Large Multimodal Models (LMMs) uniformly perceive video frames, creating
computational inefficiency for videos with inherently varying temporal
information density. This paper presents \textbf{Quicksviewer}, an LMM with a new
perceiving paradigm that partitions a video of nonuniform density into varying
cubes using Gumbel Softmax, followed by a unified resampling for each cube to
achieve efficient video understanding. This simple and intuitive approach
dynamically compresses a video online based on its temporal density, significantly
reducing spatiotemporal redundancy (an overall 45$\times$ compression rate), while
enabling efficient training with a large receptive field. We train the model from
a language backbone through three progressive stages, each incorporating
lengthy videos averaging 420 seconds at 1 fps, thanks to the perceiving efficiency.
With only 0.8M total video-text samples for training, our model outperforms the
direct baseline employing a fixed partitioning strategy by up to 8.72 in accuracy,
demonstrating the effectiveness of the approach. On Video-MME, Quicksviewer
achieves state-of-the-art results under modest sequence lengths, using at most
5\% of the tokens per frame required by baselines. With this paradigm, scaling up
the number of input frames reveals a clear power law in model capabilities.
We also empirically verify that the segments generated by the cubing network
can help analyze continuous events in videos.
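
To make the cube-and-resample paradigm concrete, the following is a minimal PyTorch sketch, not the authors' implementation: a Gumbel-Softmax boundary head decides which frames open a new cube, and each variable-length cube is then pooled to a fixed number of tokens. All module names, dimensions, and the use of average pooling in place of a learned resampler are illustrative assumptions.

\begin{verbatim}
# Minimal sketch (not the authors' implementation) of Gumbel-Softmax cube
# partitioning followed by per-cube resampling to a fixed token budget.
# Module names, dimensions, and the pooling-based resampler are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CubePartitioner(nn.Module):
    """Predicts, for each frame, whether it opens a new cube (boundary = 1)."""
    def __init__(self, dim: int):
        super().__init__()
        self.boundary_head = nn.Linear(dim, 2)  # logits for {continue, new cube}

    def forward(self, frame_feats: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        # frame_feats: (T, dim) per-frame features from a lightweight encoder.
        logits = self.boundary_head(frame_feats)               # (T, 2)
        # Straight-through Gumbel-Softmax: discrete boundary decisions in the
        # forward pass, differentiable surrogate in the backward pass.
        onehot = F.gumbel_softmax(logits, tau=tau, hard=True)  # (T, 2)
        boundaries = onehot[:, 1].clone()                      # (T,) in {0, 1}
        boundaries[0] = 1.0            # first frame always opens a cube
        return boundaries

def resample_cubes(frame_feats: torch.Tensor,
                   boundaries: torch.Tensor,
                   tokens_per_cube: int = 16) -> torch.Tensor:
    """Pools each variable-length cube to a fixed number of tokens; average
    pooling stands in here for a learned resampler."""
    cube_ids = torch.cumsum(boundaries, dim=0) - 1  # (T,) cube index per frame
    cubes = []
    for c in range(int(cube_ids.max().item()) + 1):
        feats = frame_feats[cube_ids == c]          # (len_c, dim)
        pooled = F.adaptive_avg_pool1d(feats.t().unsqueeze(0), tokens_per_cube)
        cubes.append(pooled.squeeze(0).t())         # (tokens_per_cube, dim)
    return torch.cat(cubes, dim=0)                  # (num_cubes*tokens_per_cube, dim)

# Example: a 420-frame video sampled at 1 fps with 1024-dim frame features.
frame_feats = torch.randn(420, 1024)
partitioner = CubePartitioner(1024)
video_tokens = resample_cubes(frame_feats, partitioner(frame_feats))
\end{verbatim}

In this sketch the number of cubes, and hence the total token count, adapts to the predicted boundaries, which is the mechanism that lets denser video segments receive more cubes while static segments are compressed more aggressively.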