Deep learning, Semantic segmentation, Computer vision, U-Net
Abstract
This study investigates the performance of three computer vision neural network architectures, the standard U-Net, the Deep Residual U-Net (ResU-Net), and the VGG19-Integrated U-Net (VGG19U-Net), on car view segmentation. The models are trained on 4000 images and their masks and are tested at different stages of training. The evaluation criteria include training and validation loss, Intersection over Union (IoU), and the Dice coefficient. The results demonstrate that ResU-Net outperforms the other models in segmentation accuracy while maintaining competitive prediction speeds. The VGG19U-Net shows improved performance over the standard U-Net, highlighting the benefits of deeper architectures in semantic segmentation tasks. Additionally, the research underlines the importance of architectural modifications such as residual connections and deeper convolutional layers for enhancing segmentation accuracy. This study offers valuable insights into optimizing U-Net variants for vehicle segmentation, which can be extended to other real-world applications, including autonomous driving. These findings also point toward future improvements in real-time image segmentation for complex environments.
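The two segmentation metrics named in the abstract, IoU and the Dice coefficient, can be computed directly from binary masks. The sketch below is an illustrative NumPy implementation, not the authors' evaluation code; the function names and the convention of returning 1.0 for two empty masks are assumptions.

```python
import numpy as np

def iou(pred, target):
    """Intersection over Union between two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0

def dice(pred, target):
    """Dice coefficient: 2|A∩B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    return 2 * intersection / total if total else 1.0
```

For example, a predicted mask overlapping the ground truth on 1 of 2 foreground pixels gives IoU = 0.5 and Dice ≈ 0.667; Dice weights the overlap more generously, which is why both are commonly reported together.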