Patrick Leask

Postgraduate Student

Affiliations
Postgraduate Student in the Department of Computer Science

Conference Paper

Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models

Leask, P., & Al Moubayed, N. (in press). Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models. In Proceedings of Machine Learning Research.

Sparse Autoencoders Do Not Find Canonical Units of Analysis

Leask, P., Bussmann, B., Pearce, M., Bloom, J., Tigges, C., Al Moubayed, N., Sharkey, L., & Nanda, N. (2025, April 25). Sparse Autoencoders Do Not Find Canonical Units of Analysis. Presented at ICLR 2025: The Thirteenth International Conference on Learning Representations, Singapore.

Sparse Autoencoders Do Not Find Canonical Units of Analysis

Leask, P., Bussmann, B., Pearce, M. T., Isaac Bloom, J., Tigges, C., Al Moubayed, N., Sharkey, L., & Nanda, N. (2025, January 22). Sparse Autoencoders Do Not Find Canonical Units of Analysis. Presented at The Thirteenth International Conference on Learning Representations, Singapore.
