Browsing by Author "Hamzaoui, Raouf"
Now showing 1 - 20 of 81
Item Open Access 3DAttGAN: A 3D attention-based generative adversarial network for joint space-time video super-resolution (IEEE, 2024-03-18)
Fu, Congrui; Yuan, Hui; Shen, Liquan; Hamzaoui, Raouf; Zhang, Hao
Joint space-time video super-resolution aims to increase both the spatial resolution and the frame rate of a video sequence. As a result, details become more apparent, leading to a better and more realistic viewing experience. This is particularly valuable for applications such as video streaming, video surveillance (object recognition and tracking), and digital entertainment. Over the last few years, several joint space-time video super-resolution methods have been proposed. While those built on deep learning have shown great potential, their performance still falls short. One major reason is that they heavily rely on two-dimensional (2D) convolutional networks, which restricts their capacity to effectively exploit spatio-temporal information. To address this limitation, we propose a novel generative adversarial network for joint space-time video super-resolution. The novelty of our network is twofold. First, we propose a three-dimensional (3D) attention mechanism instead of traditional two-dimensional attention mechanisms. Our generator uses 3D convolutions associated with the proposed 3D attention mechanism to process temporal and spatial information simultaneously and focus on the most important channel and spatial features. Second, we design two discriminator strategies to enhance the performance of the generator. The discriminative network uses a two-branch structure to handle the intra-frame texture details and inter-frame motion occlusions in parallel, making the generated results more accurate. Experimental results on the Vid4, Vimeo-90K, and REDS datasets demonstrate the effectiveness of the proposed method.
The source code is publicly available at https://github.com/FCongRui/3DAttGan.git.

Item Open Access Acoustic scene classification: from a hybrid classifier to deep learning (2017-11-16)
Vafeiadis, Anastasios; Kalatzis, Dimitrios; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf
This report describes our contribution to the 2017 Detection and Classification of Acoustic Scenes and Events (DCASE) challenge. We investigated two approaches for the acoustic scene classification task. Firstly, we used a combination of features in the time and frequency domain and a hybrid Support Vector Machines - Hidden Markov Model (SVM-HMM) classifier to achieve an average accuracy over 4 folds of 80.9% on the development dataset and 61.0% on the evaluation dataset. Secondly, by exploiting data augmentation techniques and using the whole segment (as opposed to splitting it into sub-sequences) as input, the accuracy of our CNN system was boosted to 95.9%. However, due to the small number of kernels used in the CNN and its failure to capture the global information of the audio signals, it achieved an accuracy of only 49.5% on the evaluation dataset. Both of our approaches outperformed the DCASE baseline method, which uses log-mel band energies for feature extraction and a Multi-Layer Perceptron (MLP) to achieve an average accuracy over 4 folds of 74.8%.

Item Open Access Adaptive Quantization for Predicting Transform-based Point Cloud Compression (Springer, 2021-08)
Wang, Xiaohui; Sun, Guoxia; Yuan, Hui; Hamzaoui, Raouf; Wang, Lu
The representation of three-dimensional objects with point clouds is attracting increasing interest from researchers and practitioners. Since this representation requires a huge data volume, effective point cloud compression techniques are required. One of the most powerful solutions is the emerging Moving Picture Experts Group geometry-based point cloud compression (G-PCC) standard.
In the G-PCC lifting transform coding technique, an adaptive quantization method is used to improve the coding efficiency. Instead of assigning the same quantization step size to all points, the quantization step size is increased according to the level-of-detail traversal order. In this way, the attributes of more important points receive a finer quantization and have a smaller quantization error than the attributes of less important ones. In this paper, we adapt this approach to the G-PCC predicting transform and propose a hardware-friendly weighting method for the adaptive quantization. Experimental results show that, compared to the current G-PCC test model, the proposed method can achieve an average Bjøntegaard delta rate of -6.7%, -14.7%, -15.4%, and -10.0% for the luma, chroma Cb, chroma Cr, and reflectance components, respectively, on the MPEG Cat1-A, Cat1-B, Cat3-fused, and Cat3-frame datasets.

Item Open Access Adaptive unicast video streaming with rateless codes and feedback (IEEE, 2010-02)
Ahmad, Shakeel; Hamzaoui, Raouf; Al-Akaidi, Marwan, 1959-
Video streaming over the Internet and packet-based wireless networks is sensitive to packet loss, which can severely damage the quality of the received video. To protect the transmitted video data against packet loss, application-layer forward error correction (FEC) is commonly used. Typically, for a given source block, the channel code rate is fixed in advance according to an estimate of the packet loss rate. However, since network conditions are difficult to predict, determining the right amount of redundancy introduced by the channel encoder is not obvious. To address this problem, we consider a general framework where the sender applies rateless erasure coding to every source block and keeps transmitting the encoded symbols until it receives an acknowledgment from the receiver indicating that the block was decoded successfully.
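A minimal sketch of such an acknowledgment-driven rateless scheme, using an idealised fountain code (decoding succeeds once any k distinct symbols arrive) and a hypothetical i.i.d. erasure channel; this is illustrative only, not the paper's transmission strategy:

```python
import random

def transmit_block(k, loss_rate, rng):
    """Simulate rateless transmission of one source block.

    The sender keeps emitting encoded symbols; the receiver decodes as
    soon as k symbols survive the erasure channel, at which point an
    acknowledgment stops the sender. Returns total symbols sent.
    """
    received = 0
    sent = 0
    while received < k:                # no acknowledgment yet
        sent += 1
        if rng.random() >= loss_rate:  # symbol survives the channel
            received += 1
    return sent

rng = random.Random(42)
sent = transmit_block(k=100, loss_rate=0.1, rng=rng)
# On average about k / (1 - loss_rate) symbols are needed.
```

With feedback, the redundancy automatically adapts to the actual loss rate instead of being fixed in advance, which is the point the framework above makes.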
Within this framework, we design transmission strategies that aim at minimizing the expected bandwidth usage while ensuring successful decoding, subject to an upper bound on the packet loss rate. In real simulations over the Internet, our solution outperformed standard FEC and hybrid ARQ approaches. For the QCIF Foreman sequence compressed with the H.264 video coder, the gain in average peak signal-to-noise ratio over the best previous scheme exceeded 3.5 decibels at 90 kilobits per second.

Item Open Access Audio Content Analysis for Unobtrusive Event Detection in Smart Homes (Elsevier, 2019)
Vafeiadis, Anastasios; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf
Environmental sound signals are multi-source, heterogeneous, and varying in time. Many systems have been proposed to process such signals for event detection in ambient assisted living applications. Typically, these systems use feature extraction, selection, and classification. However, despite major advances, several important questions remain unanswered, especially in real-world settings. This paper contributes to the body of knowledge in the field by addressing the following problems for ambient sounds recorded in various real-world kitchen environments: 1) which features and which classifiers are most suitable in the presence of background noise? 2) what is the effect of signal duration on recognition accuracy? 3) how do the signal-to-noise ratio and the distance between the microphone and the audio source affect the recognition accuracy in an environment in which the system was not trained? We show that for systems that use traditional classifiers, it is beneficial to combine gammatone frequency cepstral coefficients and discrete wavelet transform coefficients and to use a gradient boosting classifier.
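As an illustration of the wavelet side of such a feature set, a one-level Haar discrete wavelet transform can be sketched as follows (a generic textbook transform, not the exact feature pipeline of the paper):

```python
import math

def haar_dwt(signal):
    """One-level Haar discrete wavelet transform of an even-length signal.

    Returns approximation (low-pass) and detail (high-pass) coefficients;
    statistics of both halves can be pooled into fixed-length features and
    concatenated with cepstral coefficients for a classifier.
    """
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

a, d = haar_dwt([4.0, 2.0, 5.0, 5.0])
# a = [6/sqrt(2), 10/sqrt(2)], d = [2/sqrt(2), 0.0]
```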
For systems based on deep learning, we consider 1D and 2D Convolutional Neural Networks (CNNs) using mel-spectrogram energies and mel-spectrogram images as inputs, respectively, and show that the 2D CNN outperforms the 1D CNN. We obtained competitive classification results for two such systems. The first one, which uses a gradient boosting classifier, achieved an F1-score of 90.2% and a recognition accuracy of 91.7%. The second one, which uses a 2D CNN with mel-spectrogram images, achieved an F1-score of 92.7% and a recognition accuracy of 96%.

Item Open Access Audio-based Event Recognition System for Smart Homes (IEEE, 2017)
Vafeiadis, Anastasios; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf
Building an acoustic-based event recognition system for smart homes is a challenging task due to the lack of high-level structures in environmental sounds. In particular, the selection of effective features is still an open problem. We make an important step toward this goal by showing that the combination of Mel-Frequency Cepstral Coefficients, Zero-Crossing Rate, and Discrete Wavelet Transform features can achieve an F1 score of 96.5% and a recognition accuracy of 97.8% with a gradient boosting classifier for ambient sounds recorded in a kitchen environment.

Item Metadata only Bayesian Early Mode Decision Technique for View Synthesis Prediction-Enhanced Multiview Video Coding (IEEE, 2013-11)
Khattak, Shadan; Hamzaoui, Raouf; Maugey, Thomas; Ahmad, Shakeel; Frossard, Pascal
View synthesis prediction (VSP) is a coding mode that predicts video blocks from synthesised frames. It is particularly useful in a multi-camera setup with large inter-camera distances. Adding a VSP-based SKIP mode to a standard Multiview Video Coding (MVC) framework improves the rate-distortion (RD) performance but increases the time complexity of the encoder. This letter proposes an early mode decision technique for VSP SKIP-enhanced MVC.
Our method uses the correlation between the RD costs of the VSP SKIP mode in neighbouring views and Bayesian decision theory to reduce the number of candidate coding modes for a given macroblock. Simulation results showed that our technique can save up to 36.20% of the encoding time without any significant loss in RD performance.

Item Metadata only Branch and bound algorithms for rate-distortion optimized media streaming (IEEE, 2006-02-01)
Hamzaoui, Raouf; Cardinal, J.; Röder, M.
We consider the problem of rate-distortion optimized streaming of packetized multimedia data over a single quality-of-service network using feedback and retransmissions. For a single data unit, we prove that the problem is NP-hard and provide efficient branch and bound algorithms that are much faster than the previously best solution based on dynamic programming. For a group of interdependent data units, we show how to compute optimal solutions with branch and bound algorithms. The branch and bound algorithms for a group of data units are much slower than the current state of the art, a heuristic technique known as sensitivity adaptation. However, in many real-world situations, they provide significantly better rate-distortion performance.

Item Open Access CAS-NET: Cascade attention-based sampling neural network for point cloud simplification (IEEE, 2023-07)
Chen, Chen; Yuan, Hui; Liu, Hao; Hou, Junhui; Hamzaoui, Raouf
Point cloud sampling can reduce storage requirements and computation costs for various vision tasks. Traditional sampling methods, such as farthest point sampling, are not geared towards downstream tasks and may fail on such tasks. In this paper, we propose a cascade attention-based sampling network (CAS-Net), which is end-to-end trainable. Specifically, we propose an attention-based sampling module (ASM) to capture the semantic features and preserve the geometry of the original point cloud.
Experimental results on the ModelNet40 dataset show that CAS-Net outperforms state-of-the-art methods in a sampling-based point cloud classification task, while preserving the geometric structure of the sampled point cloud.

Item Embargo Coarse to fine rate control for region-based 3D point cloud compression (IEEE, 2020-06-09)
Liu, Qi; Yuan, Hui; Hamzaoui, Raouf; Su, Honglei
We modify the video-based point cloud compression standard (V-PCC) by mapping the patches to seven regions and encoding the geometry and color video sequences of each region. We then propose a coarse to fine rate control algorithm for this scheme. The algorithm consists of two major steps. First, we allocate the target bitrate between the geometry and color information. Then, we optimize in turn the geometry and color quantization steps for the video sequences of each region using analytical models for the rate and distortion. Experimental results for eight point clouds showed that the average percent bitrate error of our algorithm is only 3.7%, and its perceptual reconstruction quality is better than that of V-PCC.

Item Open Access Coder Source Code (2021-10)
Yuan, Hui; Hamzaoui, Raouf; Neri, Ferrante; Yang, Shengxiang
Point clouds are representations of three-dimensional (3D) objects in the form of a sample of points on their surface. Point clouds are receiving increased attention from academia and industry due to their potential for many important applications, such as real-time 3D immersive telepresence, automotive and robotic navigation, as well as medical imaging. Compared to traditional video technology, point cloud systems allow free viewpoint rendering, as well as mixing of natural and synthetic objects. However, this improved user experience comes at the cost of increased storage and bandwidth requirements, as point clouds are typically represented by the geometry and colour (texture) of millions up to billions of 3D points.
For this reason, major efforts are being made to develop efficient point cloud compression schemes. However, the task is very challenging, especially for dynamic point clouds (sequences of point clouds), due to the irregular structure of point clouds (the number of 3D points may change from frame to frame, and the points within each frame are not uniformly distributed in 3D space). To standardize point cloud compression (PCC) technologies, the Moving Picture Experts Group (MPEG) launched a call for proposals in 2017. As a result, three point cloud compression technologies were developed: surface point cloud compression (S-PCC) for static point cloud data, video-based point cloud compression (V-PCC) for dynamic content, and LIDAR point cloud compression (L-PCC) for dynamically acquired point clouds. Later, L-PCC and S-PCC were merged under the name geometry-based point cloud compression (G-PCC). The aim of the OPT-PCC project is to develop algorithms that optimise the rate-distortion performance [i.e., minimize the reconstruction error (distortion) for a given bit budget] of V-PCC. The objectives of the project are to:
1. O1: build analytical models that accurately describe the effect of the geometry and colour quantization of a point cloud on the bit rate and distortion;
2. O2: use O1 to develop fast search algorithms that optimise the allocation of the available bit budget between the geometry information and colour information;
3. O3: implement a compression scheme for dynamic point clouds that exploits O2 to outperform the state of the art in terms of rate-distortion performance; the target is to reduce the bit rate by at least 20% for the same reconstruction quality;
4. O4: provide multi-disciplinary training to the researcher in algorithm design, metaheuristic optimisation, computer graphics, media production, and leadership and management skills.
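The kind of bit-budget allocation described in O2 can be sketched as an exhaustive search over geometry/colour quantization step pairs. The rate and distortion models below are made-up monotonic placeholders standing in for the analytical models of O1; the code is purely illustrative, not the project's algorithm:

```python
def allocate(budget, steps=range(1, 33)):
    """Toy search for the geometry/colour quantization step pair.

    Coarser steps cost fewer bits but add distortion. Returns the pair
    with minimal total distortion whose total rate fits the budget.
    """
    rate = lambda q: 1000.0 / q   # hypothetical rate model
    dist = lambda q: q * q        # hypothetical distortion model
    best = None
    for qg in steps:              # geometry quantization step
        for qc in steps:          # colour quantization step
            if rate(qg) + rate(qc) <= budget:
                d = dist(qg) + dist(qc)
                if best is None or d < best[0]:
                    best = (d, qg, qc)
    return best  # (total distortion, geometry step, colour step)

d, qg, qc = allocate(budget=500.0)
# With these toy models the optimum is the symmetric pair qg = qc = 4.
```

The "fast search" of O2 would replace this quadratic grid with a smarter strategy, but the objective being minimized is the same.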
As part of O3, this deliverable gives the source code of the algorithms used in the project to optimize the rate-distortion performance of V-PCC.

Item Embargo Colored Point Cloud Quality Assessment Using Complementary Features in 3D and 2D Spaces (IEEE, 2024-08-14)
Cui, Mao; Zhang, Yun; Fan, Chunling; Hamzaoui, Raouf; Li, Qinglan
Point Cloud Quality Assessment (PCQA) plays an essential role in optimizing point cloud acquisition, encoding, transmission, and rendering for human-centric visual media applications. In this paper, we propose an objective PCQA model using Complementary Features from 3D and 2D spaces, called CF-PCQA, to measure the visual quality of colored point clouds. First, we develop four effective features in 3D space to represent the perceptual properties of colored point clouds, which include curvature, kurtosis, luminance distance, and hue features of points in 3D space. Second, we project the 3D point cloud onto 2D planes using patch projection and extract a structural similarity feature of the projected 2D images in the spatial domain, as well as a sub-band similarity feature in the wavelet domain. Finally, we propose a feature selection and learning model to fuse the high-dimensional features and predict the visual quality of the colored point clouds. Extensive experimental results show that the Pearson Linear Correlation Coefficients (PLCCs) of the proposed CF-PCQA were 0.9117, 0.9005, 0.9340, and 0.9826 on the SIAT-PCQD, SJTU-PCQA, WPC2.0, and ICIP2020 datasets, respectively.
Moreover, statistical significance tests demonstrate that CF-PCQA significantly outperforms the state-of-the-art PCQA benchmark schemes on the four datasets.

Item Open Access The community network game project: Enriching online gamers experience with user generated content (IARIA, 2010-11)
Ahmad, Shakeel; Bouras, Christos; Hamzaoui, Raouf; Papazois, Andreas; Perelman, Erez; Shani, Alex; Simon, Gwendal; Tsichritzis, George
One of the most attractive features of Massively Multiplayer Online Games (MMOGs) is the possibility for users to interact with a large number of other users in a variety of collaborative and competitive situations. Gamers within an MMOG become members of active communities with common interests and shared adventures. The EU-funded Community Network Game (CNG) project will provide MMOG players with new tools for the generation, distribution, and insertion of user generated content (UGC) without changing the game code and without adding new processing or network loads to the MMOG central servers. The UGC considered by the CNG project includes 3D objects and graphics as well as video. We present the objectives of the project, focusing on its main scientific and technological contributions.

Item Metadata only Community tools for massively multiplayer online games (2011-11)
Ahmad, Shakeel; Bouras, Christos; Hamzaoui, Raouf; Liu, Jiayi; Papazois, Andreas; Perelman, Erez; Shani, Alex; Simon, Gwendal; Tsichritzis, George
One of the most attractive features of Massively Multiplayer Online Games (MMOGs) is the possibility for users to interact with a large number of other users in a variety of collaborative and competitive situations. Gamers within an MMOG typically become members of active communities with mutual interests, shared adventures, and common objectives. We present the EU-funded Community Network Game (CNG) project.
The CNG project provides tools to enhance collaborative activities between online gamers and offers new tools for the generation, distribution, and insertion of user-generated content in MMOGs. CNG allows the addition of new engaging community services without changing the game code and without adding new processing or network loads to the MMOG central servers. The user-generated content considered by the CNG project includes 3D objects and graphics, as well as screen-captured live video of the game, which is shared using peer-to-peer technology. We survey the state of the art in all areas related to the project and present its concept, objectives, and innovations.

Item Open Access Comparing CNN and Human Crafted Features for Human Activity Recognition (IEEE, 2019-08)
Cruciani, Federico; Vafeiadis, Anastasios; Nugent, Chris; Cleland, Ian; McCullagh, Paul; Votis, Konstantinos; Giakoumis, Dimitrios; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf
Deep learning techniques such as Convolutional Neural Networks (CNNs) have shown good results in activity recognition. One of the advantages of using these methods resides in their ability to generate features automatically. This ability greatly simplifies the task of feature extraction, which usually requires domain-specific knowledge, especially when using big data, where data-driven approaches can lead to anti-patterns. Despite the advantage of this approach, very little work has been undertaken on analyzing the quality of extracted features, and more specifically on how model architecture and parameters affect the ability of those features to separate activity classes in the final feature space. This work focuses on identifying the optimal parameters for recognition of simple activities, applying this approach to signals from both inertial and audio sensors.
The paper provides the following contributions: (i) a comparison of automatically extracted CNN features with gold standard Human Crafted Features (HCF); (ii) a comprehensive analysis of how architecture and model parameters affect the separation of target classes in the feature space. Results are evaluated using publicly available datasets. In particular, we achieved a 93.38% F-score on the UCI-HAR dataset, using 1D CNNs with 3 convolutional layers and a kernel size of 32, and a 90.5% F-score on the DCASE 2017 development dataset, simplified to three classes (indoor, outdoor, and vehicle), using 2D CNNs with 2 convolutional layers and a 2x2 kernel size.

Item Open Access Crowdsourced Estimation of Collective Just Noticeable Difference for Compressed Video with the Flicker Test and QUEST+ (IEEE, 2024-05-17)
Jenadeleh, Mohsen; Hamzaoui, Raouf; Reips, Ulf-Dietrich; Saupe, Dietmar
The concept of videowise just noticeable difference (JND) was recently proposed for determining the lowest bitrate at which a source video can be compressed without perceptible quality loss with a given probability. This bitrate is usually obtained from estimates of the satisfied user ratio (SUR) at different encoding quality parameters. The SUR is the probability that the distortion corresponding to the quality parameter is not noticeable. Commonly, the SUR is computed experimentally by estimating the subjective JND threshold of each subject using a binary search, fitting a distribution model to the collected data, and creating the complementary cumulative distribution function of the distribution. The subjective tests consist of paired comparisons between the source video and compressed versions. However, as shown in this paper, this approach typically overestimates or underestimates the SUR. To address this shortcoming, we directly estimate the SUR function by considering the entire population as a collective observer.
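In its simplest empirical form, the SUR at a given quality parameter is just the complementary cumulative distribution of the per-subject JND thresholds. A minimal sketch with hypothetical thresholds (no distribution fitting, purely to fix the definition):

```python
def empirical_sur(jnd_qps, qp):
    """Empirical satisfied user ratio at quantization parameter `qp`.

    A subject is satisfied if `qp` lies strictly below their individual
    JND threshold, i.e. the distortion is not yet noticeable to them.
    This is the empirical complementary CDF of the JND samples.
    """
    return sum(1 for t in jnd_qps if qp < t) / len(jnd_qps)

# Hypothetical per-subject JND thresholds, in QP units:
thresholds = [28, 30, 30, 32, 34, 35, 37, 38, 40, 41]
sur_at_30 = empirical_sur(thresholds, 30)  # 7 of 10 subjects notice nothing
```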
In our method, the subject for each paired comparison is randomly chosen, and a state-of-the-art Bayesian adaptive psychometric method (QUEST+) is used to select the compressed video in the paired comparison. Our simulations show that this collective method yields more accurate SUR results using fewer comparisons than traditional methods. We also perform a subjective experiment to assess the JND and SUR for compressed video. In the paired comparisons, we apply a flicker test that compares a video interleaving the source video and its compressed version with the source video. Analysis of the subjective data reveals that the flicker test provides, on average, greater sensitivity and precision in the assessment of the JND threshold than the usual test, which compares compressed versions with the source video. Using crowdsourcing and the proposed approach, we build a JND dataset for 45 source video sequences that are encoded with both advanced video coding (AVC) and versatile video coding (VVC) at all available quantization parameters. Our dataset and the source code have been made publicly available at http://database.mmsp-kn.de/flickervidset-database.html.

Item Open Access Dependence-Based Coarse-to-Fine Approach for Reducing Distortion Accumulation in G-PCC Attribute Compression (IEEE, 2024-06-05)
Guo, Tian; Yuan, Hui; Hamzaoui, Raouf; Wang, Xiaohui; Wang, Lu
Geometry-based point cloud compression (G-PCC) is a state-of-the-art point cloud compression standard. While G-PCC achieves excellent performance, its reliance on the predicting transform leads to a significant dependence problem, which can easily result in distortion accumulation. This not only increases bitrate consumption but also degrades reconstruction quality. To address these challenges, we propose a dependence-based coarse-to-fine approach for reducing distortion accumulation in G-PCC attribute compression.
Our method consists of three modules: level-based adaptive quantization, point-based adaptive quantization, and Wiener filter-based refinement level quality enhancement. The level-based adaptive quantization module addresses the inter-level-of-detail (LOD) dependence problem, while the point-based adaptive quantization module tackles the inter-point dependence problem. The Wiener filter-based refinement level quality enhancement module, in turn, enhances the reconstruction quality of each point based on the dependence order among LODs. Extensive experimental results demonstrate the effectiveness of the proposed method. Notably, when the proposed method was implemented in the latest G-PCC test model (TMC13v23.0), a Bjøntegaard delta rate of −4.9%, −12.7%, and −14.0% was achieved for the Luma, Chroma Cb, and Chroma Cr components, respectively.

Item Metadata only Efficient rate-distortion optimized media streaming for tree-structured packet dependencies (IEEE, 2007-10-01)
Hamzaoui, Raouf; Cardinal, J.; Röder, M.
When streaming packetized media data over a lossy packet network, it is desirable to use transmission strategies that minimize the expected distortion subject to a constraint on the expected transmission rate. Because the computation of such optimal strategies is usually an intractable problem, fast heuristic techniques are often used. We first show that when the graph that gives the decoding dependencies between the data packets is reducible to a tree, optimal transmission strategies can be efficiently computed with dynamic programming algorithms. The proposed algorithms are much faster than other exact algorithms developed for arbitrary dependency graphs. They are slower than previous heuristic techniques but can provide much better solutions. We also show how to apply our algorithms to find high-quality approximate solutions when the dependency graph is not tree reducible. To validate our approach, we run simulations for MPEG1 and H.264 video data.
We first consider a simulated packet erasure channel. Then we implement a real video streaming system and provide experimental results for an Internet connection.

Item Open Access Energy-based decision engine for household human activity recognition (IEEE, 2018-03)
Vafeiadis, Anastasios; Vafeiadis, Thanasis; Zikos, Stelios; Krinidis, Stelios; Votis, Konstantinos; Giakoumis, Dimitrios; Ioannidis, Dimosthenis; Tzovaras, Dimitrios; Chen, Liming; Hamzaoui, Raouf
We propose a framework for energy-based human activity recognition in a household environment. We apply machine learning techniques to infer the state of household appliances from their energy consumption data and use rule-based scenarios that exploit these states to detect human activity. Our decision engine achieved a 99.1% accuracy for real-world data collected in the kitchens of two smart homes.

Item Open Access Enhanced Collision Resolution and Throughput Analysis for the 802.11 Distributed Coordination Function (Wiley, 2021-08)
Kobbaey, Thaeer; Hamzaoui, Raouf; Ahmad, Shakeel; Al-Fayoumi, Mustafa; Thomos, Nikolaos
The IEEE 802.11 standards rely on the distributed coordination function (DCF) as the fundamental medium access control method. DCF uses the binary exponential backoff (BEB) algorithm to regulate channel access. The backoff time determined by BEB depends on a contention window (CW) whose size is doubled if a station suffers a collision and reset to its minimum value after a successful transmission. Doubling the size of the CW reduces channel access time, which decreases the throughput. Resetting it to its minimum value harms fairness, since the station will have a better chance of accessing the channel compared to stations that suffered a collision. We propose an algorithm that addresses collisions without instantly increasing the CW size. Our algorithm aims to reduce the collision probability without affecting the channel access time and delay. We present extensive simulations for fixed and mobile scenarios.
The results show that, on average, our algorithm outperforms BEB in terms of throughput and fairness. Compared to exponential increase exponential decrease (EIED), our algorithm improves, on average, throughput and delay performance. We also propose analytical models for BEB, EIED, and our algorithm. Our models extend Bianchi's popular Markov chain-based model by using a collision probability that depends on the station's transmission history. Our models provide a better estimate of the probability that a station transmits in a random slot time, which allows a more accurate throughput analysis. Using our models, we show that both the saturation throughput and the maximum throughput of our algorithm are higher than those of BEB and EIED.
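The BEB rule summarised above (double the contention window on collision, reset it on success) can be sketched as follows. The CW bounds are typical 802.11 values and the code is a generic illustration of BEB itself, not the authors' enhanced algorithm or analytical model:

```python
import random

CW_MIN, CW_MAX = 15, 1023  # typical 802.11 contention window bounds

def on_collision(cw):
    """BEB: double the contention window (capped at CW_MAX) after a collision."""
    return min(2 * (cw + 1) - 1, CW_MAX)

def on_success(cw):
    """BEB: reset the contention window to its minimum after a success."""
    return CW_MIN

def backoff_slots(cw, rng):
    """Draw a uniform backoff counter from [0, cw] slot times."""
    return rng.randint(0, cw)

cw = CW_MIN
cw = on_collision(cw)  # 31
cw = on_collision(cw)  # 63
cw = on_success(cw)    # back to 15
```

The fairness issue the abstract points out is visible here: a station that just succeeded restarts from CW_MIN while a colliding station is stuck drawing from a much larger window.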