Jerry Yao-Chieh Hu
Department of Computer Science
Northwestern University
jhu \at\ u.northwestern.edu
I am a PhD candidate in Computer Science at Northwestern University, advised by Han Liu in the MAGICS Lab. I hold a B.S. in Physics from National Taiwan University, where I was advised by Pisin Chen.
My research focuses on theoretical foundations and principled methodologies for large foundation models (e.g., large language models and generative AI). My long-term goal is to leverage machine learning to tackle important scientific and societal challenges.
Recently, I have been studying modern foundation models through four pillars: Learning, Storing, Computing, and Universality. To make progress, I draw on tools from neuroscience, statistics, information theory, and computation.
Learning – Capacity, Stability, and Efficient Adaptation.
I connect attention and modern Hopfield networks, viewing Transformers as high‑capacity associative memories. Under this unified perspective, I study stability and adaptation via outlier filtering and intrinsic low‑rank structure, clarifying when learning is stable and efficient, and where LoRA reaches its limits.
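As one concrete anchor for this view (a standard identity from the modern Hopfield literature, stated here as a sketch rather than as any particular result of mine): given stored patterns $X = [x_1, \dots, x_M]$ and a query state $\xi$, one retrieval update is $\xi^{\mathrm{new}} = X\,\operatorname{softmax}(\beta X^{\top}\xi)$, which is exactly a single attention readout with keys and values $X$, query $\xi$, and inverse temperature $\beta$ playing the role of the usual $1/\sqrt{d}$ scaling.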
Storing – Plug‑in Associative Memory for Retrieval & Editing.
I develop memory modules that write at inference time, retrieve from external knowledge, and support modular edits. They enjoy provable guarantees (convergence, retrieval accuracy, local editability), support streaming inserts and fast queries, and augment a model's parametric knowledge with reliable nonparametric storage.
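To make the idea concrete, here is a minimal, hypothetical sketch (plain NumPy, not the actual modules from my papers) of an associative memory that accepts streaming key-value writes at inference time and retrieves by a softmax-weighted readout over stored keys:

```python
import numpy as np

class PlugInMemory:
    """Toy plug-in associative memory: streaming writes, softmax retrieval.

    Illustrative sketch only; not the provably guaranteed modules described above.
    """

    def __init__(self, beta=1.0):
        self.beta = beta    # inverse temperature of the softmax readout
        self.keys = []      # stored keys, appended at inference time
        self.values = []    # stored values, aligned with keys

    def write(self, key, value):
        # Streaming insert: no retraining, just store the pair.
        self.keys.append(np.asarray(key, dtype=float))
        self.values.append(np.asarray(value, dtype=float))

    def retrieve(self, query):
        # One softmax-attention (Hopfield-style) readout over stored keys.
        K = np.stack(self.keys)                        # (M, d)
        V = np.stack(self.values)                      # (M, d_v)
        scores = self.beta * (K @ np.asarray(query, dtype=float))
        scores -= scores.max()                         # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum()
        return weights @ V                             # convex combination of values

# Usage: store two facts, then query near the first key.
mem = PlugInMemory(beta=5.0)
mem.write([1.0, 0.0, 0.0], [1.0, 0.0])
mem.write([0.0, 1.0, 0.0], [0.0, 1.0])
print(mem.retrieve([0.9, 0.1, 0.0]))  # close to [1.0, 0.0]
```

The readout is the same operation as the Hopfield/attention update above, which is what lets plug-in memory of this kind compose cleanly with Transformer backbones.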
Computing – Prompt‑Programmable Inference.
I study why in-context learning (ICL) is possible, proving that Transformers can execute learning algorithms internally and emulate a broad class of algorithms via prompting. I also analyze ICL and prompt tuning as soft external memory, and characterize their universality and efficiency as governed by prompt length and structure.
Universality – Expressivity, Algorithmic Emulation, and Learnability Limits.
I characterize the universality of minimalist Transformers and prompt‑based algorithm emulation, and study learnability limits. I also extend these results to modern generative AI (DiT, flow matching), deriving sharp rates and near‑linear training criteria without loss of expressiveness under softmax attention.
Through these pillars, I aim to build a unified scientific foundation to enable robust, efficient, and interpretable AI systems.
Beyond ML theory and methods, I also enjoy interdisciplinary research collaborations: particle physics at Fermilab, drug design at AbbVie, finance at Gamma Paradigm Capital, and NdLinear & NdLinear-LoRA at Ensemble AI.
I dedicate two hours weekly to outreach meetings with master's, undergraduate, and high school students to chat about research, grad school, and my transition to ML research from a non-CS/ML background.
I welcome students from underrepresented groups and will prioritize these meetings.
Please use this link to schedule a chat :)