Hui Ding

I am a Research Scientist at Meta Superintelligence Labs. My work focuses on building unified multimodal foundation models that bridge understanding, generation, and editing across text, image, video, and audio.

Before that, I worked at Adobe Firefly, and at Amazon AGI Foundations and AWS AI Labs, where I was part of the team that launched Titan Image Generator (Nova Canvas). I received my PhD from the University of Maryland, College Park in 2020, advised by Professor Rama Chellappa. During my PhD, I interned at Waymo, Adobe Research, Palo Alto Research Center, and Siemens Healthineers.

Google Scholar  /  LinkedIn  /  Email

Research
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
Xi Chen, Zhifei Zhang, He Zhang, Yuqian Zhou, Soo Ye Kim, Qing Liu, Yijun Li, Jianming Zhang, Nanxuan Zhao, Yilin Wang, Hui Ding, Zhe Lin, Hengshuang Zhao
CVPR Highlight, 2025
Project Page / arXiv

As a universal framework, UniReal supports a broad spectrum of image generation and editing tasks within a single model, accommodating diverse input-output configurations and generating highly realistic results. It effectively handles challenging scenarios such as shadows, reflections, lighting effects, and object pose changes.
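A minimal, hypothetical sketch of the unifying idea (the UnifiedTask container and its field names are illustrative, not the released interface): every task, from text-to-image to instruction-based editing to customization, is expressed as a set of input images plus a text instruction, so one model can serve them all.

```python
# Toy illustration of a unified task interface; names are assumptions.
from dataclasses import dataclass, field

@dataclass
class UnifiedTask:
    instruction: str                                   # text prompt or edit instruction
    input_images: list = field(default_factory=list)   # 0..N conditioning images
    num_outputs: int = 1                               # images to produce

# Very different tasks share the same input-output configuration:
text_to_image = UnifiedTask("a cat under studio lighting")
editing       = UnifiedTask("add a soft shadow under the car", ["car.jpg"])
customization = UnifiedTask("put this subject on a beach", ["subject.jpg"])

for task in (text_to_image, editing, customization):
    print(task.instruction, "|", len(task.input_images), "input(s) ->", task.num_outputs, "output(s)")
```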

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
Jiang Liu*, Hui Ding*, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha (*equal contribution)
CVPR, 2023
Project Page / arXiv / code

Instead of directly predicting pixel-level segmentation masks, we formulate referring image segmentation as sequential polygon generation; the predicted polygons can later be converted into segmentation masks. This is enabled by a new sequence-to-sequence framework, Polygon Transformer (PolyFormer), which takes a sequence of image patches and text query tokens as input and outputs a sequence of polygon vertices autoregressively.
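A minimal sketch of the two steps described above, with a toy next-vertex function standing in for the actual Polygon Transformer decoder: vertices are emitted one at a time until an end-of-sequence token, then rasterized into a binary mask.

```python
import numpy as np
from PIL import Image, ImageDraw

def predict_next_vertex(prefix):
    # Toy stand-in for the seq2seq decoder, which in the real model is
    # conditioned on image patches and text query tokens; here we emit a
    # fixed square and then the end-of-sequence token (None).
    square = [(16, 16), (48, 16), (48, 48), (16, 48)]
    return square[len(prefix)] if len(prefix) < len(square) else None

def decode_polygon(max_len=64):
    vertices = []
    while len(vertices) < max_len:
        v = predict_next_vertex(vertices)   # autoregressive: condition on the prefix
        if v is None:                       # end-of-sequence
            break
        vertices.append(v)
    return vertices

def polygon_to_mask(vertices, size=(64, 64)):
    # Convert the predicted polygon into a segmentation mask.
    mask = Image.new("L", size, 0)
    ImageDraw.Draw(mask).polygon(vertices, fill=1)
    return np.array(mask)

mask = polygon_to_mask(decode_polygon())
print(mask.sum(), "foreground pixels")
```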

Learning Self-Consistency for Deepfake Detection
Tianchen Zhao, Xiang Xu, Mingze Xu, Hui Ding, Yuanjun Xiong, and Wei Xia
ICCV Oral, 2021
arXiv

Deepfake images are typically synthesized by blending content from different sources, which leaves subtle inconsistencies. We introduce a novel representation learning approach, called pairwise self-consistency learning (PCL), for training ConvNets to extract these source features and detect deepfake images.
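A minimal sketch of a pairwise consistency loss in this spirit (the shapes and the exact loss form are assumptions, not the paper's implementation): patch features from the same source are pushed to be similar, patches from different sources dissimilar.

```python
import torch
import torch.nn.functional as F

def consistency_loss(feat, src):
    # feat: (B, C, H, W) local features; src: (B, H*W) integer source ids.
    B, C, H, W = feat.shape
    f = F.normalize(feat.flatten(2), dim=1)                 # (B, C, H*W), unit norm
    sim = torch.einsum("bcm,bcn->bmn", f, f)                # cosine similarity, (B, HW, HW)
    target = (src[:, :, None] == src[:, None, :]).float()   # 1 iff same source
    return F.binary_cross_entropy((sim + 1) / 2, target)    # map similarity to [0, 1]

feat = torch.randn(2, 8, 4, 4)
src = torch.randint(0, 2, (2, 16))   # toy same/different-source labels per patch
print(consistency_loss(feat, src).item())
```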

Occlusion-Adaptive Deep Network for Robust Facial Expression Recognition
Hui Ding, Peng Zhou and Rama Chellappa
International Joint Conference on Biometrics (IJCB) Oral, 2020
arXiv

We propose a landmark-guided attention branch to find and discard corrupted features from occluded regions so that they are not used for recognition. To further improve robustness, we propose a facial region branch to partition the feature maps into non-overlapping facial blocks and task each block to predict the expression independently.
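A minimal sketch of the facial region branch (toy sizes; the landmark-guided attention branch that first gates the features is omitted): the feature map is split into non-overlapping blocks, each block gets its own classifier, and the per-block predictions are averaged.

```python
import torch
import torch.nn as nn

class RegionBranch(nn.Module):
    def __init__(self, channels=256, grid=2, num_classes=7):
        super().__init__()
        self.grid = grid
        # One independent classifier per non-overlapping block.
        self.heads = nn.ModuleList(
            nn.Linear(channels, num_classes) for _ in range(grid * grid)
        )

    def forward(self, feat):                        # feat: (B, C, H, W)
        B, C, H, W = feat.shape
        h, w = H // self.grid, W // self.grid
        logits = []
        for i in range(self.grid):
            for j in range(self.grid):
                block = feat[:, :, i*h:(i+1)*h, j*w:(j+1)*w]
                pooled = block.mean(dim=(2, 3))     # global-average-pool the block
                logits.append(self.heads[i * self.grid + j](pooled))
        return torch.stack(logits).mean(0)          # average the per-block predictions

branch = RegionBranch()
print(branch(torch.randn(4, 256, 14, 14)).shape)    # (4, 7)
```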

ExprGAN: Facial Expression Editing with Controllable Expression Intensity
Hui Ding, Kumar Sricharan and Rama Chellappa
AAAI Oral, 2018
arXiv / code

We propose an Expression Generative Adversarial Network (ExprGAN) for photo-realistic facial expression editing with controllable expression intensity. An expression controller module is specially designed to learn an expressive and compact expression code in addition to the encoder-decoder network. This novel architecture enables the expression intensity to be continuously adjusted from low to high.
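A minimal sketch of the expression controller idea (toy dimensions, not the paper's architecture): a discrete expression label and a scalar intensity are mapped to a continuous code that conditions the decoder, so sweeping the intensity adjusts the expression strength.

```python
import torch
import torch.nn as nn

class ExpressionController(nn.Module):
    def __init__(self, num_expr=6, code_dim=30):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(num_expr, code_dim), nn.Tanh())

    def forward(self, label, intensity):
        # label: (B, num_expr) one-hot; intensity: (B, 1) in [0, 1].
        return self.fc(label) * intensity   # scale the expression code by intensity

controller = ExpressionController()
label = torch.eye(6)[[2]]                   # one-hot expression label
for t in (0.2, 0.6, 1.0):                   # sweep intensity from low to high
    code = controller(label, torch.tensor([[t]]))
    # In the full model, `code` is combined with the identity code from the
    # encoder and fed to the decoder to synthesize the edited face.
    print(t, code.norm().item())
```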

A Deep Cascade Network for Unaligned Face Attribute Classification
Hui Ding, Hao Zhou, Shaohua Kevin Zhou and Rama Chellappa
AAAI Spotlight, 2018
arXiv

We propose a cascade network that simultaneously learns to localize face regions specific to attributes and performs attribute classification without alignment. First, a weakly-supervised face region localization network is designed to automatically detect regions (or parts) specific to attributes. Then multiple part-based networks and a whole-image-based network are separately constructed and combined by the region switch layer and attribute relation layer for final attribute classification.
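A minimal sketch of the region switch idea (the routing table and scores below are toy assumptions, not the paper's code): each attribute reads its final score from the network, part-based or whole-image, responsible for the face region it depends on.

```python
import torch

# Toy scores: one whole-image network and three part networks, each scoring
# all 5 attributes; shape (num_networks, batch, num_attrs).
scores = torch.randn(4, 2, 5)

# Switch table: for each attribute, which network is responsible
# (0 = whole image, 1 = hair region, 2 = eye region, 3 = mouth region).
switch = torch.tensor([1, 2, 3, 0, 0])      # e.g. a hair attribute -> hair net

# Route each attribute's prediction through its assigned network.
routed = scores[switch, :, torch.arange(5)]  # (num_attrs, batch)
print(routed.t().shape)                      # (batch, num_attrs) final scores
```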

FaceNet2ExpNet: Regularizing a Deep Face Recognition Net for Expression Recognition
Hui Ding, Shaohua Kevin Zhou and Rama Chellappa
IEEE International Conference on Automatic Face Gesture Recognition (FG), 2017
arXiv

The relatively small datasets available for expression recognition research make training deep networks very challenging. We present FaceNet2ExpNet, a novel method for training an expression recognition network on static images. We first propose a new distribution function to model the high-level neurons of the expression network; based on this, a two-stage training algorithm is carefully designed.
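A minimal sketch of the two stages (toy networks, not the paper's exact models or regularizer): stage one fits the expression network's convolutional features to those of a frozen, pre-trained face recognition network; stage two appends fully connected layers and trains with label supervision.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

face_net = nn.Conv2d(3, 16, 3, padding=1).eval()   # stands in for a pre-trained face net
expr_conv = nn.Conv2d(3, 16, 3, padding=1)         # expression net, convolutional part
expr_head = nn.Linear(16, 7)                       # classifier added in stage two

x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 7, (8,))

# Stage 1: regularize the expression features toward the face net's features.
with torch.no_grad():
    target = face_net(x)
stage1_loss = F.mse_loss(expr_conv(x), target)

# Stage 2: append the classifier head and train with cross-entropy.
logits = expr_head(expr_conv(x).mean(dim=(2, 3)))  # global average pooling
stage2_loss = F.cross_entropy(logits, y)
print(stage1_loss.item(), stage2_loss.item())
```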