VoxAdapt: Adaptive Multi-Scale 3D Object Detection for Real-Time Mobile Applications and Edge Systems

Daham Pathiraja; Indika Perera

doi:10.3991/ijim.v20i08.59947

Authors

Daham Pathiraja University of Moratuwa, Moratuwa, Sri Lanka https://orcid.org/0009-0001-1323-5795
Indika Perera University of Moratuwa, Moratuwa, Sri Lanka https://orcid.org/0000-0001-5660-248X

DOI:

https://doi.org/10.3991/ijim.v20i08.59947

Keywords:

mobile edge computing, real-time 3D perception, intelligent sensing systems, adaptive deep learning, resource-constrained deployment, LiDAR-based object detection, multi-scale representations, mobile robotics, interactive mobile systems

Abstract

Deploying real-time 3D perception on mobile and edge platforms, such as autonomous robots, drones, and intelligent sensing systems, requires balancing detection accuracy against strict computation and memory constraints. Existing voxel-based LiDAR perception pipelines rely on fixed voxel sizes that must be manually tuned for each dataset and sensor, which limits their adaptability across deployment environments. We introduce VoxAdapt, an adaptive multi-scale 3D object detection framework that treats voxel scales as learnable parameters updated via a surrogate gradient pathway, enabling task-driven optimization without differentiating through the discrete voxel indexing operation. Unlike prior approaches that adapt feature aggregation under a fixed discretization, VoxAdapt allows voxel resolutions to be adjusted during training in response to the detection objective and resource constraints. Experiments on the KITTI benchmark demonstrate that VoxAdapt enables robust detection of small, sparsely sampled objects that challenge fixed-scale methods, while maintaining competitive performance on larger objects with minimal computational overhead. These results highlight the potential of learning adaptive geometric representations to support efficient and deployable 3D perception systems for real-time mobile and edge applications.
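
As a rough illustration of why a surrogate gradient is needed, the toy sketch below learns a 1D voxel size against a voxel-count budget. It is not the VoxAdapt method: the budget loss, the relaxation of the occupied-voxel count to `extent / voxel_size`, and all function names and hyperparameters are assumptions made for this example. The forward pass uses hard voxel indices, whose derivative with respect to the voxel size is zero almost everywhere, and the backward pass substitutes the derivative of the smooth relaxation.

```python
import math


def occupied_voxels(points, voxel_size):
    """Hard forward pass: count distinct voxel indices (not differentiable)."""
    return len({math.floor(x / voxel_size) for x in points})


def surrogate_dcount_dsize(extent, voxel_size):
    """Assumed surrogate: derivative of the relaxed count extent / voxel_size."""
    return -extent / (voxel_size ** 2)


def fit_voxel_size(points, budget, voxel_size=0.5, lr=2e-6, steps=1000):
    """Gradient descent on (count - budget)^2 using the surrogate derivative."""
    extent = max(points) - min(points)
    for _ in range(steps):
        count = occupied_voxels(points, voxel_size)   # discrete forward pass
        dloss_dcount = 2.0 * (count - budget)         # exact for the squared loss
        grad = dloss_dcount * surrogate_dcount_dsize(extent, voxel_size)
        voxel_size -= lr * grad                       # update the learnable scale
    return voxel_size


# Hypothetical data: 1,000 evenly spaced points along one axis.
points = [i * 0.01 for i in range(1000)]
learned = fit_voxel_size(points, budget=50)
print(learned, occupied_voxels(points, learned))
```

In this toy setting the update stops once the hard voxel count matches the budget, because the loss gradient with respect to the count is then zero. A real detector would combine such a resource term with the detection loss and operate on 3D point clouds.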

Author Biographies

Daham Pathiraja, University of Moratuwa, Moratuwa, Sri Lanka

Daham Pathiraja is a researcher at the Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka. His research interests include 3D computer vision, deep learning for point cloud processing, adaptive voxelization, real-time 3D object detection, and efficient artificial intelligence models for edge and resource-constrained environments.

Indika Perera, University of Moratuwa, Moratuwa, Sri Lanka

Indika Perera is a Professor in the Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka. His research interests include software engineering, software architecture and design, enterprise software systems, software process and management, and human–computer interaction, with a focus on software engineering processes for AI-based enterprise systems.
