CASIA · BEIJING
Multimodal AI
PhD Candidate · Institute of Automation · Chinese Academy of Sciences

Dizhan Xue

State Key Laboratory of Multimodal Artificial Intelligence Systems

Researcher at CASIA, advised by Prof. Changsheng Xu.
Building toward AGI through multimodal reasoning, with a current focus on LLM Agents and their capacity for open-world intelligence.

20+ Publications
9 First-author Papers
4 Top Conferences
5 Top Journals
About

I'm a PhD candidate at the State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences (CASIA), advised by Prof. Changsheng Xu.

My core research focuses on enabling AI to understand and reason across complex multimodal information. I believe that multimodal intelligence is the essential pathway to AGI. Recently, I've been particularly interested in LLM Agents and their capabilities in sophisticated reasoning tasks.

Research Interests
Multimodal Reasoning · LLM Agents · Explainable AI · Vision-Language Models · Social Media Analysis
Education
🔬
PhD in Pattern Recognition and Intelligent Systems
Institute of Automation, Chinese Academy of Sciences (CASIA)
📅 Sep 2021 - Jun 2026
👨‍🏫 Advisor: Prof. Changsheng Xu
📝 Thesis: Reliable Multimodal Reasoning in Complicated Scenarios
🎓
Bachelor of Computer Science and Technology
University of Chinese Academy of Sciences (UCAS), ranked #3 in China by national average admission score
📅 Sep 2017 - Jun 2021
👨‍🏫 Advisor: Prof. Changsheng Xu
📝 Thesis: Debiased Short Video Recommendation Based on Counterfactual Reasoning
Research Experience
Research Experience Timeline
First-author Papers
Integrating Neural-Symbolic Reasoning With Variational Causal Inference Network for Explanatory Visual Question Answering
Dizhan Xue, Shengsheng Qian, Changsheng Xu
Variational Causal Inference Network for Explanatory Visual Question Answering
Dizhan Xue, Shengsheng Qian, Changsheng Xu
SoMe: A Realistic Benchmark for LLM-based Social Media Agents
Dizhan Xue, Jing Cui, Shengsheng Qian, Chuanrui Hu, Changsheng Xu
Few-Shot Multimodal Explanation for Visual Question Answering
Dizhan Xue, Shengsheng Qian, Changsheng Xu
MMT: Image-guided Story Ending Generation with Multimodal Memory Transformer
Dizhan Xue, Shengsheng Qian, Quan Fang, Changsheng Xu
LININ: Logic Integrated Neural Inference Network for Explanatory Visual Question Answering
Dizhan Xue, Shengsheng Qian, Quan Fang, Changsheng Xu
Vision-Controllable Language Model for Image-guided Story Ending Generation
Dizhan Xue, Shengsheng Qian, Changsheng Xu
Short-video Propagation Influence Rating: A New Real-world Dataset and A New Large Graph Model
Dizhan Xue, Shengsheng Qian, Chuanrui Hu, Changsheng Xu
A Unified Framework for Backdoor Trigger Segmentation
Dizhan Xue, Shengsheng Qian, Xueshan Deng, Changsheng Xu
Co-authored Papers
Integrating Multi-Label Contrastive Learning with Dual Adversarial Graph Neural Networks for Cross-Modal Retrieval
Shengsheng Qian, Dizhan Xue, Quan Fang, Changsheng Xu
LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval
Zhenyu Yang, Dizhan Xue, Shengsheng Qian, Weiming Dong, Changsheng Xu
BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents
Yifei Wang, Dizhan Xue, Shengjie Zhang, Shengsheng Qian
Dual Adversarial Graph Neural Networks for Multi-Label Cross-Modal Retrieval
Shengsheng Qian, Dizhan Xue, Quan Fang, Changsheng Xu
Nonparametric Clustering-Guided Cross-View Contrastive Learning for Partially View-Aligned Representation Learning
Shengsheng Qian, Dizhan Xue, Jun Hu, Huaiwen Zhang, Changsheng Xu
Adaptive Label-Aware Graph Convolutional Networks for Cross-Modal Retrieval
Shengsheng Qian, Dizhan Xue, Quan Fang, Changsheng Xu
Code-Driven LLM Agent for One-Shot Explanatory Visual Question Answering
Zuyi Zhou, Dizhan Xue, Baoyuan Qi, Shengsheng Qian, Changsheng Xu
Open-World Social Event Classification
Shengsheng Qian, Hong Chen, Dizhan Xue, Quan Fang, Changsheng Xu
Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval
Zhenyu Yang, Shengsheng Qian, Dizhan Xue, Jiahong Wu, Fan Yang, Weiming Dong, Changsheng Xu
Learning Temporal Event Knowledge for Continual Social Event Classification
Shengsheng Qian, Shengjie Zhang, Dizhan Xue, Huaiwen Zhang, Changsheng Xu
SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding
Zhenyu Yang, Yuhang Hu, Zemin Du, Dizhan Xue, Shengsheng Qian, Jiahong Wu, Fan Yang, Weiming Dong, Changsheng Xu
Preprints
A Survey on Interpretable Cross-modal Reasoning
CoRR abs/2309.01955 (2023)
Selected Honors
Best Paper Honorable Mention
ACM SIGIR 2024, for "LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval" (second author).
First-Class Scholarship
UCAS 2018: First-Class Academic Scholarship of the University of Chinese Academy of Sciences, awarded to the top 5% of students.
Open Source
📧
MailMind
Email Agent
LLM-powered multi-step email assistant for intelligent email management.
🔗
GNN4CMR
Cross-modal Retrieval
Graph neural networks and toolkits for cross-modal retrieval tasks.
Contact
📧
xuedizhan17@mails.ucas.ac.cn
🌐
dblp.org/pid/293/9621
💻
github.com/LivXue