Modeling Cognition by Pruning and Topography-Learning in Deep Neural Networks

Truong, Le Minh Nhut

Deep artificial neural networks (DNNs) now match or exceed human accuracy on many benchmarks, yet high performance alone does not imply human-like representational structure. This motivates developing algorithms that not only improve benchmark performance but also produce representations better aligned with human brain activity or behavior, so that models can serve as mechanistic accounts for "in silico" experiments in neuroscience. This thesis contributes to the intersection of cognitive neuroscience and AI by studying how biologically inspired algorithms can produce representations that better align with human cognition, and how they shape models' internal representations beyond task performance. Specifically, we focus on two complementary parts that may bring cognitive neuroscience and AI closer together: (i) explaining and improving representational alignment between pretrained models and human cognition, using behavioral similarity judgments as a proxy for mental representational geometry; and (ii) enforcing brain-like correlations in topographic networks, to assess their capacity to model high-level visual cortex and to understand how these correlations affect the network's performance and representations. Across both parts, the common aim is to test whether models can align with specific aspects of human representational structure, and to characterize how these constraints reshape model representations.

In Part I, aiming to improve and explain the alignment between model representations and human semantic knowledge, we focus on representational alignment with human similarity judgments. Rather than relying on all the information in the full embeddings extracted from deep networks, as is common in the literature, we identify the information in the model's representational space that best matches human similarity judgments for a given semantic category.
Technically, we implement this using structured pruning over learned feature maps or units in convolutional neural networks, selecting a subset that improves alignment with human judgments. This part consists of three studies. In Study 1.1, we introduce the Alignment Importance Score (AIS), a statistic quantifying how much each feature map contributes to alignment with human similarity judgments, which is used not only for improving alignment but also for explainable AI analyses. By pruning low-AIS feature maps, we improve out-of-sample prediction of human judgments while reducing the number of feature maps. Moreover, AIS-pruning selects features that produce image-space heatmaps highlighting the visual information most relevant for explaining human comparisons among objects, supporting mechanistic interpretability. In Study 1.2, we investigate whether alignment is driven by a small set of specialized units or by population-level geometry. Using numerosity as a controlled domain, we find that the units critical for capturing similarity judgments, identified via the same pruning method, do not overlap with the units identified via a traditional statistical test (ANOVA). This suggests that human-aligned representation in models is an emergent property of population-level geometry rather than the result of isolated, specialized units. Study 1.3 aims to identify the core representational geometry of an existing model by extending pruning beyond the layer-wise, explicitly supervised targets of the previous studies. We introduce Correlation Retaining Iterative Structural Pruning (CRISP), a geometry-guided procedure that removes redundant feature maps or units while aiming to approximate a target representational geometry.
This task-agnostic algorithm formalizes the pruning logic from the earlier studies into a more general framework that can be used to compress models while either retaining a model’s own geometry or aligning its geometry to external targets such as similarity judgments.

Part II focuses on correlation-based, end-to-end topographic models, in which units are arranged in a physical space and training enforces correlated activity among nearby units, so that their activations can be visualized as smooth, brain-like spatial maps. Specifically, we study the models' capacity to capture cortical organization and the computational properties of topographic regularizers that encourage correlated representations. This part includes two studies. In Study 2.1, we evaluate whether a leading topographic model can capture the fine-grained organization of the human occipitotemporal cortex. Focusing on the action dimension (the degree to which an object is associated with physical manipulation), our results show that while the model successfully captures broad divisions such as animacy, it fails to produce an action-related gradient. This finding suggests that generic spatial constraints may be insufficient, and that additional requirements are needed to account for the specialized organization of human high-level visual cortex. In the final study, Study 2.2, we investigate the computational advantages of correlation constraints in topographic models and how they shape the network’s internal representations, beyond topographic map visualization, which has been the primary goal of most previous work. We systematically compare two commonly used local constraints in end-to-end convolutional networks: Activation Similarity, which encourages nearby units to have similar activations, and Weight Similarity, which encourages nearby units to develop similar afferent weight vectors.
Our analysis shows that the two constraints can produce robustness not only to input perturbations but also to parameter noise. Moreover, the two constraints produce qualitatively different computational properties at the representational level.

Overall, this thesis investigates cognitively inspired approaches, implemented through structured pruning and topographic constraints, as methods for aligning human and DNN representations and for shaping DNNs' internal representations. Practically, our studies support an alternative approach to improving human–model alignment: instead of using full embeddings, we improve alignment by selecting the most relevant information within the model. Under this view, AIS and CRISP provide structured pruning tools that improve or preserve alignment-relevant geometry while compressing networks and enabling interpretability, including heatmaps for explaining human similarity comparisons. For topographic modeling, we demonstrate the engineering benefit of locally regularizing correlations, and show that either the weight-based or the activation-based constraint can be the preferred choice for handling certain types of noise. Theoretically, we show that human-like alignment is better characterized as an emergent property of population-level geometry than as the product of isolated expert units, and we find that current topographic models likely require additional constraints to capture the fine-grained organization of the human visual cortex. Collectively, these findings contribute toward more transparent, robust, and cognitively aligned models for both practical applications and in silico cognitive science research.
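
As an illustration of the two local constraints compared in Study 2.2, the sketch below computes one possible penalty of each kind for units laid out on a small 2D grid. The abstract does not give the exact formulations, so everything here is an assumption made for illustration: the 4-neighbor grid, the use of `1 - Pearson correlation` for Activation Similarity, the squared Euclidean distance for Weight Similarity, and all function names are hypothetical. In an actual training setup these terms would be written in a differentiable framework and added to the task loss.

```python
import math

def neighbor_pairs(rows, cols):
    """Index pairs of units adjacent (right or below) on a rows x cols grid."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                pairs.append((i, i + 1))     # right neighbor
            if r + 1 < rows:
                pairs.append((i, i + cols))  # neighbor below
    return pairs

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)

def activation_similarity_penalty(acts, pairs):
    """Assumed form: mean of (1 - r) over neighboring units,
    where acts[i] holds unit i's responses across stimuli."""
    return sum(1.0 - pearson(acts[i], acts[j]) for i, j in pairs) / len(pairs)

def weight_similarity_penalty(weights, pairs):
    """Assumed form: mean squared Euclidean distance between the
    afferent weight vectors of neighboring units."""
    total = 0.0
    for i, j in pairs:
        total += sum((a - b) ** 2 for a, b in zip(weights[i], weights[j]))
    return total / len(pairs)

# Toy example: four units on a 2 x 2 grid, three stimuli, two input weights each.
pairs = neighbor_pairs(2, 2)
acts = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
weights = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
act_penalty = activation_similarity_penalty(acts, pairs)
weight_penalty = weight_similarity_penalty(weights, pairs)
```

The two penalties act on different quantities: the activation-based term depends on the stimuli shown to the network, while the weight-based term depends only on the parameters, which is one reason the two could plausibly shape representations differently.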

2026-04-24
