Stereo disparity estimation is crucial for obtaining depth information in
robot-assisted minimally invasive surgery (RAMIS). While current deep learning
methods have made significant advancements, challenges remain in achieving an
optimal balance between accuracy, robustness, and inference speed. To address
these challenges, we propose the StereoMamba architecture, which is
specifically designed for stereo disparity estimation in RAMIS. Our approach is
based on a novel Feature Extraction Mamba (FE-Mamba) module, which enhances
long-range spatial dependencies both within and across stereo images. To
effectively integrate multi-scale features from FE-Mamba, we then introduce a
novel Multidimensional Feature Fusion (MFF) module. Experiments against the
state-of-the-art on the ex-vivo SCARED benchmark demonstrate that StereoMamba
achieves superior performance on EPE of 2.64 px and depth MAE of 2.55 mm, the
second-best performance on Bad2 of 41.49% and Bad3 of 26.99%, while maintaining
an inference speed of 21.28 FPS for a pair of high-resolution images
(1280*1024), striking the optimum balance between accuracy, robustness, and
efficiency. Furthermore, by comparing synthesized right images, generated from
warping left images using the generated disparity maps, with the actual right
image, StereoMamba achieves the best average SSIM (0.8970) and PSNR (16.0761),
exhibiting strong zero-shot generalization on the in-vivo RIS2017 and StereoMIS
datasets.
Este artículo explora los viajes en el tiempo y sus implicaciones.
Descargar PDF:



