from SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis
CVPR 2025
from SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models
arXiv 2024
from TroL: Traversal of Layers for Large Language and Vision Models
EMNLP 2024
from Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
NeurIPS 2024