MCPNet++: An Interpretable Classifier via Multi-Level Concept Prototypes

*Equal Advising
1National Yang Ming Chiao Tung University, 2Amazon
Teaser

ProtoPFormer generates explanations only from the final layer, typically capturing a single high-level concept from activated patches, whereas MCPNet++ provides multi-level explanations from low- to high-level features and further summarizes these concepts with LLMs for more human-friendly interpretation.

Abstract

Post-hoc and inherently interpretable methods have shown great success in uncovering the inner workings of black-box models, whether by examining them after training or by explicitly designing for interpretability. While these approaches effectively narrow the semantic gap between a model's latent space and human understanding, they typically extract only high-level semantics from the model's final feature map. As a result, they provide a limited perspective on the decision-making process. We argue that explanations lacking insight into both lower- and mid-level semantics cannot be considered fully faithful or genuinely useful. To address this issue, we introduce the Multi-Level Concept Prototypes Classifier (MCPNet), which offers a more holistic interpretation by drawing on information from multiple levels within the model. Rather than relying on predefined concept labels, MCPNet autonomously discovers meaningful concepts from feature maps. To increase versatility, we further propose MCPNet++, which can be seamlessly applied to both CNN and transformer backbones, allowing it to learn meaningful concepts from their respective features. Building on these learned concepts, we also introduce an LLM-based method to bridge the gap between these concepts and human perception. Experimental results show that MCPNet++ provides more comprehensive explanations without sacrificing model performance, with the discovered concepts aligning closely with human understanding.
|  | MCPNet | MCPNet++ | ProtoPNet | ProtoPFormer | BotCL | Concept Bottleneck Model | VCC | CRAFT*** | TCAV |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Explanation Type | Inherently | Inherently | Inherently | Inherently | Inherently | Inherently | Post-hoc | Post-hoc | Post-hoc |
| Explanation Scale | Multi-level | Multi-level | Single-level | Single-level | Single-level | Single-level | Multi-level | Single-level | Single-level |
| w/o Concept Labels | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | ✗** |
| Available for CNN and Transformers | ✓* | ✓ | ✓* |  | ✓* | ✓* | ✓* | ✓* |  |

* Applicable to CNN backbones only (not transformers) in original work.   ** TCAV requires user-defined concept examples.   *** CRAFT is a post-hoc method applied after training.

Methods

Overview

MCPNet++ extracts multi-level concept features from different layers, enforces diverse and class-consistent concepts, and aggregates them for interpretable classification.

Overview

Centered Kernel Alignment (CKA) Loss

Leveraging the CKA similarity metric, the CKA loss reduces similarity between concept segments within the same layer, encouraging diverse representations at a shared level of abstraction. Although it does not explicitly control what each segment learns, it serves as a constraint that discourages redundancy. This promotes multi-perspective representations, offering a more comprehensive basis for interpretation.
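As a concrete sketch, linear CKA between two centered feature matrices can be computed as below, and the per-layer loss then averages the pairwise CKA between concept segments, so minimizing it pushes segments apart. This is an illustrative NumPy version under assumed segment shapes; the actual training loss runs inside the model's framework and may differ in detail.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA similarity between two (n_samples, dim) feature matrices."""
    x = x - x.mean(axis=0, keepdims=True)  # center each feature dimension
    y = y - y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(y.T @ x, "fro") ** 2
    denom = np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")
    return hsic / (denom + 1e-12)

def cka_loss(segments):
    """Average pairwise CKA between concept segments of one layer.

    segments: list of (n_samples, dim) arrays, one per concept segment.
    Minimizing this value discourages redundancy between segments.
    """
    pairs = [(i, j) for i in range(len(segments)) for j in range(i + 1, len(segments))]
    return sum(linear_cka(segments[i], segments[j]) for i, j in pairs) / len(pairs)
```

Since linear CKA lies in [0, 1] and equals 1 for identical representations, driving the pairwise average toward 0 yields mutually dissimilar segments at the same abstraction level.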

CKA Loss

Contrastive Class-wise Concept (CCC) Loss

Motivated by the intuition that images from the same class tend to share similar concept compositions, the CCC loss encourages MCP features within the same class to be more similar while separating those from different classes. Implemented with a contrastive learning objective, it helps organize the concept space according to class-level patterns. This improves the consistency of learned representations and provides a stronger basis for classification and interpretation.
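A minimal sketch of such a class-wise contrastive objective, in the spirit of supervised contrastive learning, is shown below; the paper's exact formulation may differ, and the temperature value is an assumption.

```python
import numpy as np

def ccc_loss(features, labels, temperature=0.1):
    """Contrastive class-wise concept loss over MCP feature vectors.

    features: (n, d) array of per-image MCP features.
    labels:   (n,) integer class labels.
    Pulls same-class features together and pushes different classes apart.
    """
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    sim = f @ f.T / temperature
    np.fill_diagonal(sim, -np.inf)             # exclude self-similarity
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - m - np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    labels = np.asarray(labels)
    pos = labels[:, None] == labels[None, :]   # same-class (positive) pairs
    np.fill_diagonal(pos, False)
    mean_pos = np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return -mean_pos[pos.any(axis=1)].mean()
```

The loss shrinks when same-class MCP features cluster tightly relative to other classes, which is exactly the class-level organization described above.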

CCC Loss

Layer-wise Dropout

We introduce layer-wise dropout to reduce over-reliance on MCP features from any single layer during training. By randomly dropping the features from one layer at a time, the model is encouraged to utilize information from multiple semantic levels rather than depending only on the most discriminative one. This helps the classifier learn more balanced representations across low-, mid-, and high-level concepts.
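The mechanism can be sketched as follows; the drop probability `p` and the per-layer representation are illustrative assumptions, the only property taken from the text being that at most one layer's MCP features are dropped at a time during training.

```python
import random
import numpy as np

def layerwise_dropout(layer_features, p=0.5, training=True):
    """Randomly zero out the MCP features of ONE layer during training.

    layer_features: list of per-layer feature arrays (low- to high-level).
    p: probability that a drop happens at all (assumed value).
    """
    if not training or random.random() >= p:
        return layer_features
    drop = random.randrange(len(layer_features))   # pick one layer to drop
    return [np.zeros_like(f) if i == drop else f
            for i, f in enumerate(layer_features)]
```

At evaluation time the features pass through unchanged, so the classifier always sees all semantic levels at inference.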

Concept Captions

To bridge the gap between learned concepts and human understanding, we introduce a captioning workflow using large language models (LLMs). For each concept, we collect high-response images and their corresponding activation patches, which implicitly represent its semantic meaning. These visual cues are then provided to the LLM to generate concise descriptions of the underlying concept. Instead of assigning a single fixed label, the model outputs multiple candidate descriptions, offering a more flexible and human-friendly interpretation of learned concepts.
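The captioning step can be sketched as a prompt-construction helper; in practice the high-response images and activation patches are attached as image inputs to a multimodal LLM alongside this text. The exact prompt wording and model are not specified here, so everything below is illustrative.

```python
def build_caption_prompt(num_images, num_candidates=3):
    """Build an illustrative text prompt asking an LLM to describe a concept.

    num_images:     how many high-response images/patch crops accompany the prompt.
    num_candidates: how many candidate descriptions to request.
    """
    return (
        f"You are given {num_images} image crops that strongly activate the same "
        "learned visual concept. Identify what these crops have in common "
        "(e.g. a texture, a part, or an object) and propose "
        f"{num_candidates} short candidate descriptions of the concept, "
        "ordered from most to least likely."
    )
```

Requesting several ranked candidates rather than a single label mirrors the paper's design choice of flexible, human-friendly concept descriptions.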

Concept Captions

Experiments

Main Quantitative Results

MCPNet++ achieves competitive accuracy across both CNN (ResNet50) and transformer (DeiT-B-16) backbones while providing multi-level explanations. Compared to existing interpretable methods, it maintains strong performance across datasets, demonstrating that richer multi-level representations can be learned without degrading classification accuracy.

Performance table

Explanation Samples

MCPNet employs multi-scale concept explanations as the foundation for accurate classification. In particular, both the grizzly bear and buffalo classes respond strongly to the same high-level concepts, so classifying solely from high-level responses would lead to confusion; incorporating low-level concept responses resolves it. Moreover, even without a direct concept match in the image, MCPNet interprets the image accurately using the constructed MCP distribution, which holistically aggregates concept responses across multiple scales.
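The decision rule described above can be sketched as matching an image's multi-level concept-response distribution against per-class prototype distributions. Cosine similarity is used here purely for illustration; the paper's actual similarity measure over MCP distributions may differ.

```python
import numpy as np

def classify_by_mcp(image_responses, class_prototypes):
    """Pick the class whose prototype MCP distribution best matches the image.

    image_responses:  (k,) concatenated concept responses across all levels.
    class_prototypes: dict mapping class name -> (k,) prototype responses.
    """
    def cosine(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max(class_prototypes,
               key=lambda c: cosine(image_responses, class_prototypes[c]))
```

Because the match is over the whole multi-level distribution, two classes that tie on high-level responses can still be separated by their low- and mid-level components.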

Explanation Samples

Relation between Caption and Visualization

We present the concept response difference between the original and edited images to illustrate how visual changes are reflected in the learned concept representations. Hover over each image to highlight its corresponding concept responses.

Original image · Concept responses (original vs. edited) · Edited image

Counterfactual Result

To evaluate counterfactual reasoning, we compare concept responses between an original misclassified image and its counterfactually edited counterpart. Hover over each image to highlight its corresponding concept responses.

Original image (✗ Prediction: Fox) · Concept responses (original vs. edited) · Edit: "Turn Fur Grey" · Edited image (✓ Prediction: Wolf)

BibTeX

@ARTICLE{wang2026MCPNetPP,
  author  = {Wang, Bor-Shiun and Wang, Chien-Yi and Chiu, Wei-Chen},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title   = {MCPNet++: Interpretable Classification Models via Multi-Level Concept Prototypes},
  year    = {2026},
  pages   = {1--18},
  doi     = {10.1109/TPAMI.2026.3680506}
}