Jerry Yao-Chieh Hu, 胡耀傑
Department of Computer Science
Northwestern University
jhu [at] u.northwestern.edu
I am a PhD candidate in Computer Science at Northwestern University, advised by Han Liu in the MAGICS Lab. I received my B.S. in Physics from National Taiwan University, where I was advised by Pisin Chen.
My research focuses on theoretical foundations and principled methodologies for Large Language Models, Foundation Models and Generative AI. My long-term goal is to leverage machine learning to tackle important scientific and societal challenges.
Recently, I have focused on understanding inference and learning in large pretrained models through the dual lens of statistics and neuroscience. This unique (model-based) perspective allows me to explore:
Computational and statistical properties of pretrained transformer and diffusion models, spanning pretraining, inference, fine-tuning, compression, and alignment
New methodological and algorithmic designs with theoretical guarantees that ensure their practical optimality
I dedicate two hours weekly to chatting with master's, undergraduate, and high school outreach students about research, grad school, and my transition to ML research from a non-CS/ML background.
I welcome students from underrepresented groups and will prioritize these meetings.
Please use this link to schedule a chat :)
I will be attending ICLR 2025 in Singapore from April 23 to April 28. Let me know if you'd like to catch up!
We have 3 posters:
Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models [ICLR'25a]
Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency [ICLR'25b]
On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality [ICLR'25c]
I will also be attending the ICLR 2025 DeLTa Workshop and ICLR 2025 NFAM Workshop.
I study the statistical and computational foundations of large‑scale pretrained models and their real‑world applications.
Rethinking Pretrained Models as Statistical Brains via the Lens of Dense Associative Memory (DenseAM a.k.a. Modern Hopfield Models).
Entropy‑Regularized DenseAM ⇄ Transformer Attention — unified theory & capacity bounds [NeurIPS'24a, ICML'24a, ICML'24b, ICLR'24, NeurIPS'23] (see the sketch after this list)
Nonparametric DenseAM — auto/hetero associative memory with statistical guarantees [arXiv]
DenseAM Computational Limits — almost‑linear‑time lower bounds [ICML'24a]
Larger DenseAM Capacity for Better Transformer Representation Learning — method [NeurIPS'24a] and optimal capacity [ICML'24b]
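For visitors less familiar with this line of work, here is a minimal sketch of the DenseAM ⇄ attention correspondence, in standard modern-Hopfield notation (illustrative, not copied verbatim from any single paper above). With stored patterns $X = [x_1, \dots, x_M]$ and query $\xi$, the entropy-regularized energy and its one-step retrieval update are, up to additive constants,
$$E(\xi) \;=\; -\tfrac{1}{\beta}\log\!\sum_{\mu=1}^{M}\exp\!\big(\beta\, x_\mu^{\top}\xi\big) \;+\; \tfrac{1}{2}\,\xi^{\top}\xi,
\qquad
\xi^{\mathrm{new}} \;=\; X\,\mathrm{softmax}\!\big(\beta\, X^{\top}\xi\big).$$
Applying this update to linearly projected queries, keys, and values (with $\beta = 1/\sqrt{d}$) reproduces the softmax attention map, which is why the capacity and computational results above transfer between the associative-memory view and the transformer view.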
How far can fine-tuning methods push performance after pretraining?
Computational Limits of LoRA — hardness and fast algorithm via inherent low‑rank gradient structure [ICLR'25a] (see the sketch after this list)
Fundamental Limits of Prompt-Tuning — universal approximation & computational limits [ICLR'25b]
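As a quick reference for the setup these results are stated over (notation illustrative), LoRA freezes a pretrained weight $W_0 \in \mathbb{R}^{d \times k}$ and learns only a low-rank update
$$W \;=\; W_0 + \tfrac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k),$$
so the adapter, and the gradients it induces, live in a rank-$r$ subspace; this is the low-rank gradient structure that the hardness analysis and fast algorithm in [ICLR'25a] build on.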
Bridging diffusion models, transformers, and optimal distribution estimation.
Statistical Rates of Diffusion Transformers (DiTs) — Approximation & Minimax Rates [NeurIPS'24b, ICLR'25c]
Making large models smaller, safer and attack‑resistant.
Robust Model Quantization with Outlier-Free DenseAM — “Softmax_N” attention as a quantization-robust and resource-efficient backbone for LLMs [ICML'24c] (see the sketch after this list)
Differentially Private Query Algorithm — improved privacy-utility tradeoff and efficiency [arXiv]
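For context, a hedged sketch of the idea behind such outlier-free attention variants (the precise definition is in [ICML'24c]): add a constant to the softmax denominator,
$$\mathrm{Softmax}_n(z)_i \;=\; \frac{\exp(z_i)}{\,n + \sum_{j}\exp(z_j)\,},$$
so an attention head can place near-zero total mass on all tokens rather than being forced to concentrate on a few. This suppresses the large outlier activations that otherwise degrade post-training quantization, which is what makes the backbone quantization-robust.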
Turning theory into impact on critical domains.
Programmable Feature Engineering for Time Series [ICML'23]
Fast (Even Trainless) Test-Time Adaptation for Time Series [ICLR'24]
Sparsity-Aware, Multi-Resolution, Bi-Directional Tabular Learning [ICML'24d]
Real-time Edge AI for Accelerator Controls (Particle Physics, READS Collaboration) @ Fermilab [IBIC'23, ICALEPCS'23, ML4Phys Workshop @ NeurIPS'23, FastML4Sci@ICCAD'23]