Paolo Rota

Associate Professor @ CIMeC/DISI - University of Trento

Palazzo Fedrigotti - Room 216 - Rovereto
Polo Ferrari - Room 110 - Povo 2

If you are interested in working with or visiting the MHUG group, please fill out this form.

About me

I am Paolo Rota, an associate professor at the University of Trento and an ELLIS member, working in computer vision, machine learning, and multimodal AI. My research focuses on vision-language models and activity recognition, with applications in video analytics and industrial AI.

Within these areas, I place particular emphasis on video understanding. I investigate methods that bridge vision and language, addressing challenges such as zero-shot action recognition, temporal action localization, and open-world recognition using large vision–language models. More broadly, I work on representation learning and domain adaptation to build systems capable of robust visual understanding in dynamic, real-world environments. I am also interested in evaluation protocols and benchmark design for video analytics, aiming to develop principled and reliable ways to assess multimodal AI systems.

Outside of academia, I co-founded Mountain Maps, a startup that uses AI to enhance outdoor navigation and help people explore mountain environments more safely and enjoyably.

News

I’m pleased to share that I will serve as Web Co-Chair for ACM MM 2026, which will be held in Rio de Janeiro in October. Please feel free to report anything on the website that does not appear or function as expected! :S

Current Students

Benedetta Liberatori

Co-advised with Elisa Ricci

Email Website

Vision and Language, Video Understanding, Video Differencing, Anomaly Detection

Jiaqi Liu

Co-advised with Nicu Sebe

Email Website

Image Generation, Pose Transfer, Video Generation

Yan Shu

Co-advised with Nicu Sebe

Email Website

Remote Sensing, Vision and Language, Temporal Modelling, Video Understanding

Shiyao Xu

Co-advised with Gül Varol

Email Website

3D Human Motion Understanding, Vision and Language

Ester Riccardi

Co-advised with Roberto Bottini

Biologically Inspired Media Generation, Brain Decoding

Former Students

Alessandro Conti

Giacomo Zara

Recent Publications

Zero-Shot Temporal Action Localization Through Textual Guidance

Benedetta Liberatori, Alessandro Conti, Lorenzo Vaquero, Paolo Rota, Yiming Wang, Elisa Ricci. Face and Gestures 2026.

pdf project page

TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation

Yan Shu, Bin Ren, Zhitong Xiong, Xiao Xiang Zhu, Begüm Demir, Nicu Sebe, Paolo Rota. CVPR 2026.

pdf code project page

Rank-based Geographical Regularization: Revisiting Contrastive Self-Supervised Learning for Multispectral Remote Sensing Imagery

Tom Burgert, Leonard Hackel, Paolo Rota, Begüm Demir. WACV 2026.

pdf

Vocabulary-free Image Classification and Semantic Segmentation

Alessandro Conti, Enrico Fini, Massimiliano Mancini, Paolo Rota, Yiming Wang, Elisa Ricci. T-PAMI 2026.

pdf

Dense Motion Captioning

Shiyao Xu, Benedetta Liberatori, Gül Varol, Paolo Rota. 3DV 2026.

pdf code project page

ConViS-Bench: Estimating Video Similarity Through Semantic Concepts

Benedetta Liberatori, Alessandro Conti, Lorenzo Vaquero, Yiming Wang, Elisa Ricci, Paolo Rota. NeurIPS 2025.

pdf code project page

ImageNet-trained CNNs are not biased towards texture: Revisiting feature reliance through controlled suppression

Tom Burgert, Oliver Stoll, Paolo Rota, Begüm Demir. NeurIPS (Oral) 2025.

pdf code

On Large Multimodal Models as Open-World Image Classifiers

Alessandro Conti, Massimiliano Mancini, Enrico Fini, Yiming Wang, Paolo Rota, Elisa Ricci. ICCV 2025.

pdf code

Automatic benchmarking of large multimodal models via iterative experiment programming

Alessandro Conti, Enrico Fini, Paolo Rota, Yiming Wang, Massimiliano Mancini, Elisa Ricci. ICIAP 2025.

pdf code project page

Multi-focal Conditioned Latent Diffusion for Person Image Synthesis

Jiaqi Liu, Jichao Zhang, Paolo Rota, Nicu Sebe. CVPR 2025.

pdf code project page

Simplifying Open-Set Video Domain Adaptation with Contrastive Learning

Giacomo Zara, Victor Guilherme Turrisi da Costa, Subhankar Roy, Paolo Rota, Elisa Ricci. CVIU 2024.

pdf

Text-Enhanced Zero-Shot Action Recognition: A Training-Free Approach

Massimo Bosetti, Shibingfeng Zhang, Benedetta Liberatori, Giacomo Zara, Elisa Ricci, Paolo Rota. ICPR 2024.

pdf code

Test-time zero-shot temporal action localization

Benedetta Liberatori, Alessandro Conti, Paolo Rota, Yiming Wang, Elisa Ricci. CVPR 2024.

pdf code

AutoLabel: CLIP-based framework for open-set video domain adaptation

Giacomo Zara, Subhankar Roy, Paolo Rota, Elisa Ricci. CVPR 2023.

pdf code

The unreasonable effectiveness of large language-vision models for source-free video domain adaptation

Giacomo Zara, Alessandro Conti, Subhankar Roy, Stéphane Lathuilière, Paolo Rota, Elisa Ricci. ICCV 2023.

pdf code

Vocabulary-free image classification

Alessandro Conti, Enrico Fini, Massimiliano Mancini, Paolo Rota, Yiming Wang, Elisa Ricci. NeurIPS 2023.

pdf code