The pipeline of the proposed Motion Cue Fusion Network (MCFNet)
Caption
The pipeline of the proposed Motion Cue Fusion Network (MCFNet) is as follows. Voxels generated from raw event streams are first processed by the Event Correction Module (ECM) to produce high-quality event frames. These frames, along with RGB images, are then fed into two separate CSPDarkNet backbones for modality-specific feature extraction. The Event Dynamic Upsampling Module (EDUM) takes the stage-3 features and aligns the spatial resolutions of the two modalities, after which the Cross-modal Mamba Fusion Module (CMM) performs cross-modal fusion. An FPN combined with a PANet further integrates multi-scale features, and finally the decoder outputs category and bounding box predictions for each detected object. A schematic code sketch of this data flow is given after the image details below.
Credit
Communications in Transportation Research
Usage Restrictions
News organizations may use or redistribute this image, with proper attribution, as part of news coverage of this paper only.
License
Original content
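To make the data flow in the figure concrete, the following is a minimal, hypothetical PyTorch sketch of the described pipeline. It is an illustration only: every submodule is a simplified stand-in (plain convolutions and bilinear interpolation) rather than the paper's actual ECM, CSPDarkNet, EDUM, CMM, FPN/PANet, or decoder, and all channel counts, strides, and input shapes are assumptions, not values from the paper.

# Minimal, hypothetical sketch of the MCFNet data flow; all modules,
# channel counts, strides, and shapes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MCFNetSketch(nn.Module):
    def __init__(self, event_bins=5, num_classes=3):
        super().__init__()
        # ECM stand-in: event voxels -> 3-channel event frame.
        self.ecm = nn.Conv2d(event_bins, 3, kernel_size=3, padding=1)
        # Modality-specific backbone stand-ins (roughly "stage-3" features).
        self.event_backbone = nn.Conv2d(3, 64, kernel_size=3, stride=4, padding=1)
        self.rgb_backbone = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1)
        # CMM stand-in: fuse the two spatially aligned feature maps.
        self.fusion = nn.Conv2d(128, 64, kernel_size=1)
        # Decoder stand-ins: per-location class scores and box offsets
        # (the FPN/PANet multi-scale neck is collapsed into a single scale here).
        self.cls_head = nn.Conv2d(64, num_classes, kernel_size=1)
        self.box_head = nn.Conv2d(64, 4, kernel_size=1)

    def forward(self, event_voxels, rgb_image):
        event_frame = self.ecm(event_voxels)          # ECM: corrected event frame
        ev_feat = self.event_backbone(event_frame)    # event-branch features
        rgb_feat = self.rgb_backbone(rgb_image)       # RGB-branch features
        # EDUM stand-in: upsample event features to the RGB feature resolution.
        ev_feat = F.interpolate(ev_feat, size=rgb_feat.shape[-2:],
                                mode="bilinear", align_corners=False)
        fused = self.fusion(torch.cat([ev_feat, rgb_feat], dim=1))  # CMM stand-in
        return self.cls_head(fused), self.box_head(fused)           # decoder heads

# Example with an event sensor at lower resolution than the RGB camera.
voxels = torch.randn(1, 5, 128, 160)
rgb = torch.randn(1, 3, 256, 320)
cls_logits, box_offsets = MCFNetSketch()(voxels, rgb)
print(cls_logits.shape, box_offsets.shape)  # (1, 3, 128, 160) and (1, 4, 128, 160)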