Browsing by Author "Lu, Xin"
Now showing 1 - 20 of 24
Item Embargo 3D-MSFC: A 3D multi-scale features compression method for object detection (Elsevier, 2024-11-17) Li, Zhengxin; Tian, Chongzhen; Yuan, Hui; Lu, Xin; Malekmohamadi, Hossein
As machine vision tasks rapidly evolve, a new concept of compression, namely video coding for machines (VCM), has emerged. However, current VCM methods are only suitable for 2D machine vision tasks. With the popularization of autonomous driving, the demand for 3D machine vision tasks has significantly increased, leading to an explosive growth in LiDAR data that requires efficient transmission. To address this need, we propose a machine vision-based point cloud coding paradigm inspired by VCM. Specifically, we introduce a 3D multi-scale features compression (3D-MSFC) method, tailored for 3D object detection. Experimental results demonstrate that 3D-MSFC achieves less than a 3% degradation in object detection accuracy at a compression ratio of 2796×. Furthermore, its low-profile variant, 3D-MSFC-L, achieves less than a 2% degradation in accuracy at a compression ratio of 463×. The above results indicate that our proposed method can provide an ultra-high compression ratio while ensuring no significant drop in accuracy, greatly reducing the amount of data required for transmission during each detection. This can significantly lower bandwidth consumption and save substantial costs in application scenarios such as smart cities.

Item Open Access Biopsy Needle Segmentation using Deep Networks on inhomogeneous Ultrasound Images (2022-07-11) Zhao, Yue; Lu, Yi; Lu, Xin; Jing, J.; Tao, L.; Chen, X.
In minimally invasive interventional surgery, ultrasound imaging is usually used to provide real-time feedback in order to obtain the best diagnostic results or realize treatment plans, so how to accurately obtain the position of the medical biopsy needle is a problem worthy of study.
2D ultrasound simulation images containing the medical biopsy needle are generated, with image backgrounds taken from real breast ultrasound images. Based on the deep learning network, the images containing the medical biopsy needle are used to analyze the effectiveness of different networks for needle localization for the purpose of returning needle positions in non-uniform ultrasound images. The results show that attention U-Net performed best and can accurately reflect the real position of the medical biopsy needle. The IoU and Precision can reach 90.19% and 96.25%, and the Angular Error is 0.40°.

Item Open Access Cross-correlation Full Waveform Inversion for Sound Speed Reconstruction in Ultrasound Computed Tomography (2022-07-11) Zhao, Yue; Zhang, Nuomin; Lu, Xin; Yuan, Yu; Shen, Yi
Ultrasound computed tomography (USCT) is considered to have great potential for breast cancer screening. Compared with ray-based methods, images reconstructed using full waveform inversion (FWI) methods have higher spatial resolution. However, the results of FWI are difficult to converge to the real value when cycle skipping occurs. In this paper, a cross-correlation full waveform inversion (CC-FWI) is proposed for USCT image reconstruction. In the first stage, the adjoint source is adjusted as the residual of the predicted signal and the time-shifted measured signal to avoid cycle skipping. In the remaining stage, the FWI with source encoding is employed to accelerate convergence. Simulations are conducted to demonstrate the validity of the proposed algorithm. The root mean squared error (RMSE) of the proposed algorithm is much smaller than that of conventional FWI.
The results suggest that CC-FWI is effective in avoiding cycle skipping.

Item Open Access Deep Low-Rank and Sparse Patch-Image Network for Infrared Dim and Small Target Detection (IEEE, 2023) Zhou, Xinyu; Li, Peng; Zhang, Ye; Lu, Xin; Hu, Yue
Detection of infrared dim and small targets with diverse and cluttered background plays a significant role in many applications. In this paper, we propose a deep low-rank and sparse patch-image network, termed as Deep-LSP-Net, to effectively detect small targets in a single infrared image. Specifically, by using the local patch construction scheme, we first transform the original infrared image into a patch-image, which can be decomposed as a superposition of the low-rank background component and the sparse target component. The target detection is thus formulated as an optimization problem with low-rank and sparse regularizations, which can be solved by the alternating direction method of multipliers (ADMM). We unroll the iterative algorithm into deep neural networks, where a generalized sparsifying transform and a singular value thresholding operator are learned by the convolutional neural networks (CNNs) to avoid tedious parameter tuning and improve the interpretability of the neural networks. We conduct comprehensive experiments on two public datasets. Both qualitative and quantitative experimental results demonstrate that the proposed algorithm can obtain improved performance in small infrared target detection compared with state-of-the-art algorithms.

Item Open Access Deep Magnetic Resonance Fingerprinting Based on local and global vision transformer (ISMRM & ISMRT, 2023-02-13) Li, Peng; Li, Xiaodi; Lu, Xin; Hu, Yue
Magnetic resonance fingerprinting (MRF) can achieve simultaneous imaging of multiple tissue parameters.
However, the size of the tissue fingerprint dictionary used in MRF grows exponentially as the number of tissue parameters increases, which may result in prohibitively large dictionaries that require extensive computational resources. Existing CNN-based methods reconstruct parameters patch-wise, using only local information and resulting in limited reconstruction speed. In this paper, we propose a novel end-to-end local and global vision transformer (LG-VIT) for MRF parameter reconstruction. The proposed method enables significantly faster and accurate end-to-end parameter reconstruction while avoiding the high computational cost of high-dimensional data.

Item Open Access Dynamic MRI Reconstruction Combining Tensor Nuclear Norm and Casorati Matrix Nuclear Norm (2022-05-07) Zhang, Yinghao; Hu, Yue; Lu, Xin
Low-rank tensor models have been applied in accelerating dynamic magnetic resonance imaging (dMRI). Recently, a new tensor nuclear norm based on t-SVD has been proposed and applied to tensor completion. Inspired by the different properties of the tensor nuclear norm (TNN) and the Casorati matrix nuclear norm (MNN), we introduce a novel dMRI reconstruction method combining TNN and Casorati MNN, which we term as TMNN.
Moreover, we convert the TMNN dMRI reconstruction problem into a simple tensor completion problem, which can be efficiently solved by the alternating direction method of multipliers (ADMM).

Item Open Access Editorial: Security, governance, and challenges of the new generation of cyber-physical-social systems (Frontiers, 2024-08-15) Huang, Yuanyuan; Lu, Xin

Item Embargo Enhancing Context Models for Point Cloud Geometry Compression with Context Feature Residuals and Multi-Loss (IEEE, 2024-02-20) Sun, Chang; Yuan, Hui; Li, Shuai; Lu, Xin; Hamzaoui, Raouf
In point cloud geometry compression, context models usually use the one-hot encoding of node occupancy as the label, and the cross-entropy between the one-hot encoding and the probability distribution predicted by the context model as the loss function. However, this approach has two main weaknesses. First, the differences between contexts of different nodes are not significant, making it difficult for the context model to accurately predict the probability distribution of node occupancy. Second, as the one-hot encoding is not the actual probability distribution of node occupancy, the cross-entropy loss function is inaccurate. To address these problems, we propose a general structure that can enhance existing context models. We introduce the context feature residuals into the context model to amplify the differences between contexts. We also add a multi-layer perceptron branch that uses the mean squared error between its output and node occupancy as a loss function to provide accurate gradients in backpropagation.
We validate our method by showing that it can improve the performance of an octree-based model (OctAttention) and a voxel-based model (VoxelDNN) on the object point cloud datasets MPEG 8i and MVUB, as well as the LiDAR point cloud dataset SemanticKITTI.

Item Embargo Enhancing Octree-based Context Models for Point Cloud Geometry Compression with Attention-based Child Node Number Prediction (IEEE, 2024-07-12) Sun, Chang; Yuan, Hui; Mao, Xiaolong; Lu, Xin; Hamzaoui, Raouf
In point cloud geometry compression, most octree-based context models use the cross-entropy between the one-hot encoding of node occupancy and the probability distribution predicted by the context model as the loss. This approach converts the problem of predicting the number (a regression problem) and the position (a classification problem) of occupied child nodes into a 255-dimensional classification problem. As a result, it fails to accurately measure the difference between the one-hot encoding and the predicted probability distribution. We first analyze why the cross-entropy loss function fails to accurately measure the difference between the one-hot encoding and the predicted probability distribution. Then, we propose an attention-based child node number prediction (ACNP) module to enhance the context models. The proposed module can predict the number of occupied child nodes and map it into an 8-dimensional vector to assist the context model in predicting the probability distribution of the occupancy of the current node for efficient entropy coding. Experimental results demonstrate that the proposed module enhances the coding efficiency of octree-based context models.

Item Open Access Evaluating the Impact of Point Cloud Downsampling on the Robustness of LiDAR-based Object Detection (2024-04-10) Golarits, Marcell; Rosza, Zoltan; Hamzaoui, Raouf; Allidina, Tanvir; Lu, Xin; Sziranyi, Tamas
LiDAR-based 3D object detection relies on the relatively rich information captured by LiDAR point clouds.
However, computational efficiency often requires the downsampling of these point clouds. This paper studies the impact of downsampling strategies on the robustness of a state-of-the-art object detector, namely PointPillars. We compare the performance of the approach under random sampling and farthest point sampling, evaluating the model’s accuracy in detecting objects across various downsampling ratios. The experiments were conducted on the popular KITTI dataset.

Item Metadata only Fast Coding Mode Prediction for Intra Prediction in VVC SCC (2024 IEEE International Conference on Image Processing (ICIP 2024), 2024-06-06) Wang, Dayong; Yu, Junyi; Lu, Xin; Dufaux, Frederic; Guo, Hongwei; Guo, Hui; Zhu, Ce
Currently, screen content video applications are increasingly widespread in our daily lives. The latest Screen Content Coding (SCC) standard, known as Versatile Video Coding (VVC) SCC, employs screen content Coding Modes (CMs) selection. While VVC SCC achieves high coding efficiency, its coding complexity poses a significant obstacle to the further widespread adoption of screen content video. Hence, it is crucial to enhance the coding speed of VVC SCC. In this paper, we propose a fast mode and splitting decision for Intra prediction in VVC SCC. Specifically, we initially exploit deep learning techniques to predict content types for all CUs. Subsequently, we examine CM distributions of different content types to predict candidate CMs for CUs. We then introduce early skip and early terminate CM decisions for different content types of CUs to further eliminate unlikely CMs. Finally, we develop Block-based Differential Pulse-Code Modulation (BDPCM) early termination to improve coding speed.
Experimental results demonstrate that the proposed algorithm can improve coding speed by 34.95% on average while maintaining almost the same coding efficiency.

Item Open Access Fast Coding Unit Partition Decision for Intra Prediction in Versatile Video Coding (Springer, 2021-09-30) Zhang, Menglu; Chen, Yushi; Lu, Xin; Zhang, Ye; Chen, Hao
In recent years, the state-of-the-art video coding standard, Versatile Video Coding (VVC), has been widely investigated. VVC achieves impressive performance by adopting a more flexible partitioning method compared to its predecessor, High Efficiency Video Coding (HEVC). However, the superior performance is realized at the expense of huge time consumption and increasing hardware costs, which obstructs its applications in real-time scenarios. To address this problem, we present a fast implementation for the decision process of the quad-tree with nested multi-type tree (QTMT) partitioning, and it significantly reduces the run-time of the encoder while maintaining almost the same coding performance. Firstly, the inherent texture property of the source frame is utilized to identify the prediction depth for the Coding Tree Unit (CTU). Then, the spatial correlation is used to further narrow the depth range down. Finally, we skip unnecessary partition types according to the predicted Coding Unit (CU) depth, which is determined by the above predicted CTU depth and the adjacent CU’s depth together. Experimental results demonstrate the effectiveness of our proposed method in VVC Test Model (VTM).
Compared with the original implementation of the VTM4.0 anchor, the proposed algorithm achieves an average of 49.01% encoding time savings, accompanied by only an increase of 2.18% in Bjøntegaard delta Bitrate (BDBR) and a loss of 0.138dB in Bjøntegaard delta PSNR (BDPSNR).

Item Embargo Fast intra mode prediction algorithms for SCBs in VVC SCC (IEEE, 2024-04-01) Wang, D.; Deng, Y.; Li, W.; Lu, Xin; Dufaux, F.; Hang, B.; Zhu, C.
Versatile Video Coding (VVC) now supports Screen Content Coding (SCC) with the introduction of two new coding modes: Intra Block Copy (IBC) and Palette (PLT). However, the numerous modes and the Quad-Tree Plus Multi-Type Tree (QTMT) structure inherent to VVC contribute to a very high coding complexity. To effectively reduce the computational complexity of VVC SCC, we propose a fast Intra mode prediction algorithm for VVC SCC. More specifically, we first use the different minimum Sum of Absolute Transformed Differences (SATD) value of four Directional Modes (DMs) of Intra and the SATD value of the IBC-merge mode to determine whether to early skip Intra. Subsequently, we use a decision tree to determine whether to early terminate block differential pulse coded modulation (BDPCM) checking. Finally, we employ a decision tree to determine whether to early skip multiple transform selection (MTS) and low frequency non-separable transform (LFNST) checking. The results demonstrate that our algorithm achieves an average encoding time reduction of 34.34% with a negligible increase of only 0.46% in BDBR.

Item Open Access FAST LEARNING-BASED SPLIT TYPE PREDICTION ALGORITHM FOR VVC (IEEE, 2023-10) Wang, Dayong; Chen, Liulin; Lu, Xin; Dufaux, Frederic; Li, Weisheng; Zhu, Ce
As the latest video coding standard, Versatile Video Coding (VVC) is highly efficient at the cost of very high coding complexity, which seriously hinders its widespread application. Therefore, it is very crucial to improve its coding speed.
In this paper, we propose a learning-based fast split type (ST) prediction algorithm for VVC using a deep learning approach. We first construct a large-scale database containing sufficient STs with diverse video resolution and content. Next, since the ST distributions of coding units (CUs) of different sizes are significantly distinct, we separately design neural networks for all different CU sizes. Then, we merge ambiguous STs into four merged classes (MCs) to train models to obtain probabilities of MCs and skip unlikely ones. Experimental results demonstrate that the proposed algorithm can reduce the encoding time of VVC by 67.53% with 1.89% increase in Bjøntegaard delta bit-rate (BDBR) on average.

Item Open Access Fast Mode and CU Splitting Decision for Intra Prediction in VVC SCC (IEEE, 2024-04-17) Wang, Dayong; Yu, Junyi; Lu, Xin; Dufaux, Frederic; Hang, Bo; Guo, Hui; Zhu, Ce
Currently, screen content video applications are increasingly widespread in our daily lives. The latest Screen Content Coding (SCC) standard, known as Versatile Video Coding (VVC) SCC, employs a quad-tree plus multi-type tree (QTMT) coding structure for Coding Unit (CU) partitioning and screen content Coding Modes (CMs) selection. While VVC SCC achieves high coding efficiency, its coding complexity poses a significant obstacle to the further widespread adoption of screen content video. Hence, it is crucial to enhance the coding speed of VVC SCC. In this paper, we propose a fast mode and splitting decision for Intra prediction in VVC SCC. Specifically, we initially exploit deep learning techniques to predict content types for all CUs. Subsequently, we examine CM distributions of different content types to predict candidate CMs for CUs. We then introduce early skip and early terminate CM decisions for different content types of CUs to further eliminate unlikely CMs.
Finally, we develop Block-based Differential Pulse-Code Modulation (BDPCM) early termination and CU splitting early termination to improve coding speed. Experimental results demonstrate that the proposed algorithm improves coding speed on average by 41.14%, with the BDBR increasing by 1.17%.

Item Open Access Gaussian Distribution-Based Mode Selection for Intra Prediction Of Spatial SHVC (The 29th IEEE International Conference on Image Processing (IEEE ICIP), 2022-06-20) Wang, Dayong; Wang, Xin; Sun, Yu; Weisheng, Li; Lu, Xin; Dufaux, Frederic
Due to the diversity of terminal devices, Spatial Scalable High Efficiency Video Coding (SSHVC) is an efficient solution to meet this requirement. However, its coding process is very complex, which seriously prevents its wide applications. Therefore, it is very crucial to reduce coding complexity and improve coding speed. In this paper, we propose a Gaussian Distribution-based Mode Selection for Intra Prediction of SSHVC. We show that the rate distortion costs of Inter-layer Reference (ILR) mode and Intra mode are significantly different, and both follow a Gaussian distribution. Based on this discovery, we propose to use a Bayes decision rule to determine whether ILR is the best mode so as to skip Intra mode. Experimental results demonstrate that the proposed algorithm can significantly improve coding speed with negligible coding efficiency losses.

Item Open Access Hybrid strategies for efficient intra prediction in spatial SHVC (IEEE, 2022-11-28) Wang, Dayong; Sun, Yu; Lu, Xin; Li, Weisheng; Lele, Xie; Zhu, Ce
With multi-layer encoding and Inter-layer prediction, Spatial Scalable High Efficiency Video Coding (SSHVC) has extremely high coding complexity. It is very crucial to speed up its coding to promote widespread and cost-effective SSHVC applications. Specifically, we first reveal that the average RD cost of Inter-layer Reference (ILR) mode is different from that of Intra mode, but they both follow the Gaussian distribution.
Based on this discovery, we apply the classic Gaussian Mixture Model and Expectation Maximization to determine whether ILR mode is the best mode, thus skipping Intra mode. Second, when coding units (CUs) in the enhancement layer use Intra mode, it indicates that very simple texture is present. We investigate their Directional Mode (DM) distribution, divide all DMs into three classes, and then develop different methods for each class to progressively predict the best DMs. Third, by jointly considering rate distortion costs, residual coefficients and neighboring CUs, we propose to employ the Conditional Random Fields model to early terminate depth selection. Experimental results demonstrate that the proposed algorithm can significantly improve coding speed with negligible coding efficiency losses.

Item Open Access Learned Tensor Low-CP-Rank and Bloch response manifold priors for Non-Cartesian MRF Reconstruction (ISMRM & ISMRT, 2023-02-13) Li, Peng; Li, Xiaodi; Lu, Xin; Hu, Yue
We propose a deep unrolled network for non-Cartesian MRF reconstruction by unrolling the MRF reconstruction model regularized by the tensor low-rank and the Bloch response manifold priors. To avoid computationally burdensome singular value decomposition, we propose a learned CP decomposition module to exploit the tensor low-rank priors of MRF data. Inspired by the MRF imaging mechanism, we also propose a Bloch response manifold module to learn the mapping between reconstructed MRF data and the multiple parameter maps.
Numerical experiments show that the proposed network can improve the reconstruction quality of MRF data and multi-parameter maps within significantly reduced computational time.

Item Metadata only Learning-Based Fast Splitting and Directional Mode Decision for VVC Intra Prediction (IEEE, 2024-02-19) Huang, Yuanyuan; Yu, Junyi; Wang, Dayong; Lu, Xin; Dufaux, Frederic; Guo, Hui; Zhu, Ce
As the latest video coding standard, Versatile Video Coding (VVC) is highly efficient at the cost of very high coding complexity, which seriously hinders its practical application. Therefore, it is very crucial to improve its coding speed. In this paper, we propose a learning-based fast split mode (SM) and directional mode (DM) decision algorithm for VVC intra prediction using a deep learning approach. Specifically, given the observation that the SM distributions of coding units (CUs) of different sizes are significantly distinct, we first design the neural networks separately and train the SM models for all CUs of different sizes to obtain the probability of SMs and skip the unlikely ones. Second, given a similar observation that the DM distributions of CUs of different sizes are distinct, we design neural networks to train the DM models for all CUs of different sizes separately to obtain the probabilities of DMs, and then adaptively select candidate DMs based on probabilities of their located SMs. Third, after an SM is checked, we select its probability, residual coefficients, rate-distortion (RD) cost, etc. as features, and design a lightweight neural network (LNN) model to early terminate SM selection.
Experimental results demonstrate that the proposed algorithm can reduce the encoding time of VVC by 70.73% with 2.44% increase in Bjøntegaard delta bitrate (BDBR) on average.

Item Open Access A novel mode selection-based fast intra prediction algorithm for spatial SHVC (IEEE, 2023-06) Wang, Dayong; Sun, Yu; Li, Weisheng; Xie, Lele; Lu, Xin; Dufaux, Frederic; Zhu, Ce
Due to multi-layer encoding and Inter-layer prediction, Spatial Scalable High-Efficiency Video Coding (SSHVC) has extremely high coding complexity. It is very crucial to improve its coding speed so as to promote widespread and cost-effective SSHVC applications. In this paper, we propose a novel Mode Selection-Based Fast Intra Prediction algorithm for SSHVC. We reveal that the RD costs of Inter-layer Reference (ILR) mode and Intra mode differ significantly, and that the RD costs of these two modes follow a Gaussian distribution. Based on this observation, we propose to apply the classic Gaussian Mixture Model and Expectation Maximization in machine learning to determine whether ILR is the best mode so as to skip the Intra mode. Experimental results demonstrate that the proposed algorithm can significantly improve the coding speed with negligible coding efficiency loss.
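
Illustrative note on the last entry: the abstract describes modelling RD costs with a Gaussian Mixture Model fitted by Expectation Maximization and using the result to decide whether Intra mode can be skipped. The Python sketch below is not the authors' implementation; it is a minimal, generic two-component 1-D GMM fitted by EM, under the assumption that a low RD cost for ILR mode suggests ILR is the best mode. The function names, the synthetic costs, and the 0.9 posterior threshold are all hypothetical, and the listing does not specify the features, initialisation, or thresholds actually used.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(samples, iterations=100):
    """Fit a 1-D, two-component Gaussian mixture with plain EM."""
    n = len(samples)
    ordered = sorted(samples)
    half = n // 2
    # Initialise the means from the lower and upper halves of the sorted data.
    means = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    overall_mean = sum(samples) / n
    overall_var = sum((x - overall_mean) ** 2 for x in samples) / n
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in samples:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = max(p[0] + p[1], 1e-300)  # guard against underflow
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, samples)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, samples)) / nk
            variances[k] = max(var_k, 1e-6)  # keep the variance strictly positive
    return weights, means, variances


def posterior_low_cost(x, weights, means, variances):
    """Posterior probability that cost x belongs to the lower-mean component."""
    low = 0 if means[0] <= means[1] else 1
    p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
    total = p[0] + p[1]
    return p[low] / total if total > 0 else 0.5


def should_skip_intra(ilr_cost, model, threshold=0.9):
    """Hypothetical rule: skip Intra when the ILR cost is very likely 'low'."""
    weights, means, variances = model
    return posterior_low_cost(ilr_cost, weights, means, variances) >= threshold
```

In use, one would fit the mixture on RD costs collected offline and then call `should_skip_intra(cost, model)` per coding unit; raising the threshold trades encoding speed for coding efficiency.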