Jerry Yao-Chieh Hu, 胡耀傑
Department of Computer Science
Northwestern University
jhu \at\ u.northwestern.edu
I am a PhD candidate in Computer Science at Northwestern University, advised by Han Liu in the MAGICS Lab. I hold a B.S. in Physics from National Taiwan University, where I was advised by Pisin Chen.
My research focuses on developing theoretical foundations and principled methodologies for Large Language Models, Foundation Models and Generative AI. My long-term goal is to leverage machine learning to tackle important scientific and societal challenges.
Recently, I have focused on understanding inference and learning in large pretrained models through the dual lens of statistics and neuroscience. This unique (model-based) perspective allows me to explore:
Computational and statistical properties of pretrained transformer and diffusion models for pretraining, inference, fine-tuning, compression and alignment
New methodological and algorithmic designs, with theoretical guarantees to ensure their practical optimality
Open Invitation: Individual Support Office Hours
I dedicate 2 hours weekly to chatting with master's, undergraduate, and high school outreach students about research, grad school, and my transition to ML research from a non-CS/ML background.
I welcome students from underrepresented groups and will prioritize meetings with them.
Please use this link to schedule a chat :)
Research
Topics I am currently working on (Machine/Deep Learning):
Rethinking Pretrained Models as Statistical Brains via the Lens of Dense Associative Memory (DenseAM a.k.a. Modern Hopfield Models): Theory, Algorithm and Methodology
Transformer Attentions are Entropy-Regularized Dense Associative Memory Models: Unified theoretical framework for the Transformer-DenseAM correspondence (connecting to a range of transformer attentions) [NeurIPS'24a, ICML'24a, ICML'24b, ICLR'24, NeurIPS'23]
DenseAMs' Memory Retrieval Can Be as Fast as Almost Linear Time: Complete computational theory of all possible transformer-corresponded dense associative memory models [ICML'24a]
Rethinking Transformer Representation Learning from a DenseAM Perspective:
Geometric and analytical analysis (and new algorithm) of the training, representation learning, and memorization of transformers [ICML'24b]
Provably Optimal Memory Capacity for Modern Hopfield Models [NeurIPS'24a]
Robust Model Quantization with Outlier-Free DenseAM: Theoretical justifications for Softmax_1 as a quantization-strong and resource-efficient backbone model for LLMs and large foundation models [ICML'24c]
Nonparametric framework for transformer-corresponded auto- and hetero-associative memory models [arXiv]
Complete computational theory of LoRA through inherently low-rank structure of LoRA gradients [arXiv]
Unified view of the jailbreak phenomena (Safety) of LLMs and a new jailbreak method from a Bayesian (and nonparametric) probabilistic perspective [arXiv, arXiv]
Statistical and computational guarantees of Diffusion Transformers (DiTs) [NeurIPS'24b]
Improving on the previous best differentially private query algorithm for foundation models (both privacy-utility tradeoff and efficiency) [arXiv]
AI/ML for Science and Finance
Programmable feature engineering for Sequential Data [ICML'23]
Fast (even Trainless) test-time adaptation for Time Series [ICLR'24]
Sparsity-aware, multi-resolution, bi-directional Tabular Learning [ICML'24d]
Real-time Edge AI for Accelerator Controls (Particle Physics, READS Collaboration) @ Fermilab [IBIC'23, ICALEPCS'23, NeurIPS'23, FastML4Sci@ICCAD'23]