Chaohui Yu - Homepage

superhuiych [at] gmail.com
Google Scholar

About Me

I'm an algorithm engineer at DAMO Academy, Alibaba Group. Before this, I received my Master's and Bachelor's degrees from the Institute of Computing Technology (ICT) and Shandong University in 2020 and 2017, respectively. My research interests include Transfer Learning, Object Detection/Segmentation, Semi/Self-supervised Learning, Multimodal Learning, Image/Video/3D/4D Generation, and related applications.

News

[Nov. 2025] Two papers accepted to AAAI 2026!

[Sept. 2025] One paper accepted to NeurIPS 2025!

[Aug. 2025] Two papers accepted to Siggraph Asia 2025!

[Jul. 2025] One paper accepted to ACMMM 2025!

[Jun. 2025] Three papers accepted to ICCV 2025!

[Feb. 2025] One paper accepted to CVPR 2025!

Academic Service

Conference Reviewer: CVPR, ICCV, ECCV, BMVC, Siggraph, Siggraph Asia, NeurIPS, ACM MM, AAAI

Journal Reviewer: TCSVT, JCST

Education & Experiences

DAMO Academy, Alibaba Group

Beijing, China

July 2020 – Present

Algorithm Expert at DAMO Academy

Institute of Computing Technology (ICT), Chinese Academy of Sciences

Beijing, China

Sep. 2017 – Jun. 2020

M.S. in Computer Science.

DAMO Academy, Alibaba Group

Beijing, China

Jun. 2019 – Sep. 2019

Algorithm intern at Mind.

Face++

Beijing, China

Mar. 2018 – Dec. 2018

Algorithm intern at Detection Group.

DeepGlint

Beijing, China

Oct. 2016 – Feb. 2017

Algorithm intern at Algorithm Group.

Intel

Beijing, China

July 2016 – Oct. 2016

Development intern at Linux Kernel Dev and Test Group.

Shandong University

Shandong, China

Sep. 2013 – Jun. 2017

B.E. in Communication Engineering

Awards & Scholarships

First place in the LUAI Challenge on Learning to Understand Aerial Images, ICCV Workshop 2021. [LINK]

National Scholarship for Master students, Ministry of Education 2019.

Best Application Paper Award. IJCAI-19 Federated Machine Learning Workshop 2019. [LINK]

Talks

China3DV 2025: Exploration and Applications of 3D/4D Generation. [LINK]

Recent Publications

CamPVG: Camera-Controlled Panoramic Video Generation with Epipolar-Aware Diffusion

Chenhao Ji, Chaohui Yu, Junyao Gao, Fan Wang, Cairong Zhao.

Siggraph Asia 2025

[PDF]

Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation

Chenjie Cao, Jingkai Zhou, Shikai Li, Jingyun Liang, Chaohui Yu^*, Fan Wang, Yanwei Fu, Xiangyang Xue.

Siggraph Asia 2025

[PDF] [Project Page]

WorldVLA: Towards Autoregressive Action World Model

Jun Cen, Chaohui Yu, et al.

Preprint

[PDF] [Project Page]

3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models

Min Wei, Chaohui Yu^*, Jingkai Zhou, Fan Wang.

The ACM International Conference on Multimedia. (ACMMM-25)

[PDF] [Project Page]

AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation

Zijie Wu, Chaohui Yu, Fan Wang, Xiang Bai.

The International Conference on Computer Vision. (ICCV-25)

[PDF] [Project Page] [CODE]

LiON-LoRA: Rethinking LoRA Fusion to Unify Controllable Spatial and Temporal Generation for Video Diffusion

Yisu Zhang, Chenjie Cao, Chaohui Yu, Jianke Zhu.

The International Conference on Computer Vision. (ICCV-25)

[PDF] [Project Page]

MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model

Chenjie Cao, Chaohui Yu, Shang Liu, Fan Wang, Xiangyang Xue, Yanwei Fu.

The Conference on Computer Vision and Pattern Recognition (CVPR-25)

[PDF] [Project Page] [CODE]

LPM: Efficient 3D Content Creation from Single Image by Large-Scale Partial 3D Modeling.

Yisu Zhang, Chaohui Yu, Fan Wang, Jianke Zhu.

IEEE Transactions on Circuits and Systems for Video Technology (TCSVT-25)

[PDF]

Animate3D: Animating Any 3D Model with Multi-view Video Diffusion

Yanqin Jiang^*, Chaohui Yu^*, Chenjie Cao, Fan Wang, Weiming Hu, Jin Gao.

Neural Information Processing Systems. (NeurIPS-24)

[PDF] [Project Page]

MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing

Chenjie Cao, Chaohui Yu, Yanwei Fu, Fan Wang, Xiangyang Xue.

Neural Information Processing Systems. (NeurIPS-24)

[PDF] [Project Page]

SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer

Zijie Wu, Chaohui Yu, Yanqin Jiang, Chenjie Cao, Fan Wang, Xiang Bai.

European Conference on Computer Vision. (ECCV-24)

[PDF] [Project Page]

VCD-Texture: Variance Alignment based 3D-2D Co-Denoising for Text-Guided Texturing

Shang Liu, Chaohui Yu, Chenjie Cao, Wen Qian, Fan Wang

European Conference on Computer Vision. (ECCV-24)

[PDF]

MeshSegmenter: Zero-Shot Mesh Semantic Segmentation via Texture Synthesis

Ziming Zhong, Yanxu Xu, Jing Li, Jiale Xu, Zhengxin Li, Chaohui Yu, Shenghua Gao

European Conference on Computer Vision. (ECCV-24)

[PDF] [CODE]

Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation

Chaohui Yu, Qiang Zhou, Jingliang Li, Zhe Zhang, Zhibin Wang, Fan Wang

The 31st ACM International Conference on Multimedia. (ACMMM-23)

[PDF]

RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension

Qiang Zhou, Chaohui Yu, Shaofeng Zhang, Sitong Wu, Zhibin Wang, Fan Wang

Preprint

[PDF]

Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation

Chaohui Yu, Qiang Zhou, Jingliang Li, Jianlong Yuan, Zhibin Wang, Fan Wang

The Conference on Computer Vision and Pattern Recognition (CVPR-23)

[PDF]

LMSeg: Language-guided Multi-dataset Segmentation

Qiang Zhou, Yuang Liu, Chaohui Yu, Jingliang Li, Zhibin Wang, Fan Wang

The International Conference on Learning Representations 2023. (ICLR-23)

[PDF]

MimCo: Masked Image Modeling Pre-training with Contrastive Teacher

Qiang Zhou, Chaohui Yu, Hao Luo, Zhibin Wang, Hao Li

The 30th ACM International Conference on Multimedia. (ACMMM-22)

[PDF]