Exploring Backbone Network Choices in FCOS3D: Performance and Efficiency Analysis
DOI: https://doi.org/10.61173/15b9t728
Keywords: 3D object detection, FCOS3D, FCOS-Swin, FCOS-ConvNeXt, nuScenes dataset, performance comparison
Abstract
This study compares two modified object detection models, FCOS-Swin and FCOS-ConvNeXt, with the FCOS3D baseline on the nuScenes dataset. The two variants replace the FCOS3D backbone with Swin and ConvNeXt backbones, respectively, which lets us assess how the choice of backbone architecture affects 3D object detection. All models are evaluated on their per-category detection results and on multiple evaluation metrics. The results show that both modified models perform comparably to the baseline, with slight variations across all metrics, but fall short of the fine-tuned FCOS3D model. We discuss potential reasons for this gap, including model parameter count, data augmentation methods, learning rate settings, and the number of training epochs. Finally, we outline possible improvements for future work: switching to larger backbones, applying stronger data augmentation, adjusting the learning rate strategy, training for more epochs, and incorporating temporal and spatial reasoning to improve model performance.
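The abstract does not list which evaluation metrics were reported. Assuming the standard nuScenes detection benchmark, the headline number is the nuScenes Detection Score (NDS), which combines mAP with five true-positive error metrics (translation, scale, orientation, velocity, and attribute errors): NDS = (5 * mAP + sum over the five errors of (1 - min(1, error))) / 10. The following is a minimal Python sketch of that combination. The function name `nuscenes_detection_score` and the dictionary-based input are hypothetical choices for illustration, not the nuScenes devkit API.

```python
from typing import Mapping

# The five true-positive error metrics used by the nuScenes detection benchmark.
TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nuscenes_detection_score(m_ap: float, tp_errors: Mapping[str, float]) -> float:
    """Hypothetical helper: NDS = (5 * mAP + sum(1 - min(1, mTP))) / 10."""
    if not 0.0 <= m_ap <= 1.0:
        raise ValueError("mAP must lie in [0, 1]")
    missing = [k for k in TP_METRICS if k not in tp_errors]
    if missing:
        raise KeyError(f"missing TP metrics: {missing}")
    # Each error is clipped at 1, so a very poor metric contributes 0
    # to the score rather than a negative value.
    tp_score = sum(1.0 - min(1.0, tp_errors[k]) for k in TP_METRICS)
    # mAP carries half the total weight; the five TP terms share the other half.
    return (5.0 * m_ap + tp_score) / 10.0
```

Because mAP is weighted by 5 and each clipped error term by 1, a backbone change that improves localization or orientation errors can raise NDS even when mAP barely moves, which is one reason to compare models on several metrics rather than on mAP alone.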