Hui Ding

I am a Research Scientist at Meta Superintelligence Labs. My work focuses on building unified multimodal foundation models that bridge understanding, generation, and editing across text, image, video, and audio.

Before that, I worked at Adobe Firefly, and at Amazon AGI Foundations and AWS AI Labs, where I was part of the team that launched Titan Image Generator (Nova Canvas). I received my PhD from the University of Maryland, College Park in 2020, advised by Professor Rama Chellappa. During my PhD, I interned at Waymo, Adobe Research, Palo Alto Research Center, and Siemens Healthineers.

Google Scholar  /  LinkedIn  /  Email

Research
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
Xi Chen, Zhifei Zhang, He Zhang, Yuqian Zhou, Soo Ye Kim, Qing Liu, Yijun Li, Jianming Zhang, Nanxuan Zhao, Yilin Wang, Hui Ding, Zhe Lin, Hengshuang Zhao
CVPR Highlight, 2025
Project Page / arXiv

As a universal framework, UniReal supports a broad spectrum of image generation and editing tasks within a single model, accommodating diverse input-output configurations and generating highly realistic results. It effectively handles challenging scenarios such as shadows, reflections, lighting effects, and object pose changes.
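A minimal, hypothetical sketch of the unifying idea (the UnifiedTask container and its field names are illustrative, not the released interface): every task, from text-to-image to instruction-based editing to customization, is expressed as a set of input images plus a text instruction, so one model can serve them all.

```python
# Toy illustration of a unified task interface; names are assumptions.
from dataclasses import dataclass, field

@dataclass
class UnifiedTask:
    instruction: str                                   # text prompt or edit instruction
    input_images: list = field(default_factory=list)   # 0..N conditioning images
    num_outputs: int = 1                               # images to produce

# Very different tasks share the same input-output configuration:
text_to_image = UnifiedTask("a cat under studio lighting")
editing       = UnifiedTask("add a soft shadow under the car", ["car.jpg"])
customization = UnifiedTask("put this subject on a beach", ["subject.jpg"])

for task in (text_to_image, editing, customization):
    print(task.instruction, "|", len(task.input_images), "input(s) ->", task.num_outputs, "output(s)")
```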

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
Jiang Liu*, Hui Ding*, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha (*equal contribution)
CVPR, 2023
Project Page / arXiv / code

Instead of directly predicting pixel-level segmentation masks, we formulate referring image segmentation as sequential polygon generation; the predicted polygons can later be converted into segmentation masks. This is enabled by a new sequence-to-sequence framework, Polygon Transformer (PolyFormer), which takes a sequence of image patches and text query tokens as input and outputs a sequence of polygon vertices autoregressively.
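A minimal sketch of the two steps described above, with a toy next-vertex function standing in for the actual Polygon Transformer decoder: vertices are emitted one at a time until an end-of-sequence token, then rasterized into a binary mask.

```python
import numpy as np
from PIL import Image, ImageDraw

def predict_next_vertex(prefix):
    # Toy stand-in for the seq2seq decoder, which in the real model is
    # conditioned on image patches and text query tokens; here we emit a
    # fixed square and then the end-of-sequence token (None).
    square = [(16, 16), (48, 16), (48, 48), (16, 48)]
    return square[len(prefix)] if len(prefix) < len(square) else None

def decode_polygon(max_len=64):
    vertices = []
    while len(vertices) < max_len:
        v = predict_next_vertex(vertices)   # autoregressive: condition on the prefix
        if v is None:                       # end-of-sequence
            break
        vertices.append(v)
    return vertices

def polygon_to_mask(vertices, size=(64, 64)):
    # Convert the predicted polygon into a segmentation mask.
    mask = Image.new("L", size, 0)
    ImageDraw.Draw(mask).polygon(vertices, fill=1)
    return np.array(mask)

mask = polygon_to_mask(decode_polygon())
print(mask.sum(), "foreground pixels")
```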

Learning Self-Consistency for Deepfake Detection
Tianchen Zhao, Xiang Xu, Mingze Xu, Hui Ding, Yuanjun Xiong, and Wei Xia
ICCV Oral, 2021
arXiv

Deepfake images are typically synthesized by blending content from different sources, which leaves subtle inconsistencies. We introduce a novel representation learning approach, called pairwise self-consistency learning (PCL), for training ConvNets to extract these source features and detect deepfake images.
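A minimal sketch of a pairwise consistency loss in this spirit (the shapes and the exact loss form are assumptions, not the paper's implementation): patch features from the same source are pushed to be similar, patches from different sources dissimilar.

```python
import torch
import torch.nn.functional as F

def consistency_loss(feat, src):
    # feat: (B, C, H, W) local features; src: (B, H*W) integer source ids.
    B, C, H, W = feat.shape
    f = F.normalize(feat.flatten(2), dim=1)                 # (B, C, H*W), unit norm
    sim = torch.einsum("bcm,bcn->bmn", f, f)                # cosine similarity, (B, HW, HW)
    target = (src[:, :, None] == src[:, None, :]).float()   # 1 iff same source
    return F.binary_cross_entropy((sim + 1) / 2, target)    # map similarity to [0, 1]

feat = torch.randn(2, 8, 4, 4)
src = torch.randint(0, 2, (2, 16))   # toy same/different-source labels per patch
print(consistency_loss(feat, src).item())
```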

Occlusion-Adaptive Deep Network for Robust Facial Expression Recognition
Hui Ding, Peng Zhou and Rama Chellappa
International Joint Conference on Biometrics (IJCB) Oral, 2020
arXiv

We propose a landmark-guided attention branch to find and discard corrupted features from occluded regions so that they are not used for recognition. To further improve robustness, we propose a facial region branch to partition the feature maps into non-overlapping facial blocks and task each block to predict the expression independently.
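A minimal sketch of the facial region branch (toy sizes; the landmark-guided attention branch that first gates the features is omitted): the feature map is split into non-overlapping blocks, each block gets its own classifier, and the per-block predictions are averaged.

```python
import torch
import torch.nn as nn

class RegionBranch(nn.Module):
    def __init__(self, channels=256, grid=2, num_classes=7):
        super().__init__()
        self.grid = grid
        # One independent classifier per non-overlapping block.
        self.heads = nn.ModuleList(
            nn.Linear(channels, num_classes) for _ in range(grid * grid)
        )

    def forward(self, feat):                        # feat: (B, C, H, W)
        B, C, H, W = feat.shape
        h, w = H // self.grid, W // self.grid
        logits = []
        for i in range(self.grid):
            for j in range(self.grid):
                block = feat[:, :, i*h:(i+1)*h, j*w:(j+1)*w]
                pooled = block.mean(dim=(2, 3))     # global-average-pool the block
                logits.append(self.heads[i * self.grid + j](pooled))
        return torch.stack(logits).mean(0)          # average the per-block predictions

branch = RegionBranch()
print(branch(torch.randn(4, 256, 14, 14)).shape)    # (4, 7)
```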

ExprGAN: Facial Expression Editing with Controllable Expression Intensity
Hui Ding, Kumar Sricharan and Rama Chellappa
AAAI Oral, 2018
arXiv / code

We propose an Expression Generative Adversarial Network (ExprGAN) for photo-realistic facial expression editing with controllable expression intensity. An expression controller module is specially designed to learn an expressive and compact expression code in addition to the encoder-decoder network. This novel architecture enables the expression intensity to be continuously adjusted from low to high.
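A minimal sketch of the expression controller idea (toy dimensions, not the paper's architecture): a discrete expression label and a scalar intensity are mapped to a continuous code that conditions the decoder, so sweeping the intensity adjusts the expression strength.

```python
import torch
import torch.nn as nn

class ExpressionController(nn.Module):
    def __init__(self, num_expr=6, code_dim=30):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(num_expr, code_dim), nn.Tanh())

    def forward(self, label, intensity):
        # label: (B, num_expr) one-hot; intensity: (B, 1) in [0, 1].
        return self.fc(label) * intensity   # scale the expression code by intensity

controller = ExpressionController()
label = torch.eye(6)[[2]]                   # one-hot expression label
for t in (0.2, 0.6, 1.0):                   # sweep intensity from low to high
    code = controller(label, torch.tensor([[t]]))
    # In the full model, `code` is combined with the identity code from the
    # encoder and fed to the decoder to synthesize the edited face.
    print(t, code.norm().item())
```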

A Deep Cascade Network for Unaligned Face Attribute Classification
Hui Ding, Hao Zhou, Shaohua Kevin Zhou and Rama Chellappa
AAAI Spotlight, 2018
arXiv

We propose a cascade network that simultaneously learns to localize face regions specific to attributes and performs attribute classification without alignment. First, a weakly-supervised face region localization network is designed to automatically detect regions (or parts) specific to attributes. Then multiple part-based networks and a whole-image-based network are separately constructed and combined by the region switch layer and attribute relation layer for final attribute classification.
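A minimal sketch of the region switch idea (the routing table and scores below are toy assumptions, not the paper's code): each attribute reads its final score from the network, part-based or whole-image, responsible for the face region it depends on.

```python
import torch

# Toy scores: one whole-image network and three part networks, each scoring
# all 5 attributes; shape (num_networks, batch, num_attrs).
scores = torch.randn(4, 2, 5)

# Switch table: for each attribute, which network is responsible
# (0 = whole image, 1 = hair region, 2 = eye region, 3 = mouth region).
switch = torch.tensor([1, 2, 3, 0, 0])      # e.g. a hair attribute -> hair net

# Route each attribute's prediction through its assigned network.
routed = scores[switch, :, torch.arange(5)]  # (num_attrs, batch)
print(routed.t().shape)                      # (batch, num_attrs) final scores
```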

FaceNet2ExpNet: Regularizing a Deep Face Recognition Net for Expression Recognition
Hui Ding, Shaohua Kevin Zhou and Rama Chellappa
IEEE International Conference on Automatic Face Gesture Recognition (FG), 2017
arXiv

The relatively small datasets available for expression recognition research make training deep networks very challenging. We present FaceNet2ExpNet, a novel method for training an expression recognition network on static images. We first propose a new distribution function to model the high-level neurons of the expression network; based on this, a two-stage training algorithm is carefully designed.
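A minimal sketch of the two stages (toy networks, not the paper's exact models or regularizer): stage one fits the expression network's convolutional features to those of a frozen, pre-trained face recognition network; stage two appends fully connected layers and trains with label supervision.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

face_net = nn.Conv2d(3, 16, 3, padding=1).eval()   # stands in for a pre-trained face net
expr_conv = nn.Conv2d(3, 16, 3, padding=1)         # expression net, convolutional part
expr_head = nn.Linear(16, 7)                       # classifier added in stage two

x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 7, (8,))

# Stage 1: regularize the expression features toward the face net's features.
with torch.no_grad():
    target = face_net(x)
stage1_loss = F.mse_loss(expr_conv(x), target)

# Stage 2: append the classifier head and train with cross-entropy.
logits = expr_head(expr_conv(x).mean(dim=(2, 3)))  # global average pooling
stage2_loss = F.cross_entropy(logits, y)
print(stage1_loss.item(), stage2_loss.item())
```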