Bor-Shiun Wang

About

Ph.D. student at National Yang Ming Chiao Tung University

  • Interests: Explainable AI, Generative AI

I'm a Ph.D. student in Computer Science and Engineering at the Enriched Vision Applications Lab (EVA lab), National Yang Ming Chiao Tung University, advised by Prof. Wei-Chen Chiu and Applied Scientist Chien-Yi Wang.

My research focuses on explainable AI, with an emphasis on developing interpretable models for computer vision and multimodal systems. I am particularly interested in bridging the gap between model transparency and performance in modern deep learning architectures.

More broadly, I aim to advance reliable and interpretable AI systems and explore their applications in real-world scenarios.

Publications

Journal

MCPNet++: An Interpretable Classifier via Multi-Level Concept Prototypes

Bor-Shiun Wang, Chien-Yi Wang*, Wei-Chen Chiu* (*=equal advising)

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2026
Post-hoc and inherently interpretable methods have shown great success in uncovering the inner workings of black-box models, whether by examining them after training or by explicitly designing for interpretability. While these approaches effectively narrow the semantic gap between a model’s latent space and human understanding, they typically extract only high-level semantics from the model’s final feature map. As a result, they provide a limited perspective on the decision-making process. We argue that explanations lacking insight into both lower- and mid-level semantics cannot be considered fully faithful or genuinely useful. To address this issue, we introduce the Multi-Level Concept Prototypes Classifier (MCPNet), which offers a more holistic interpretation by drawing on information from multiple levels within the model. Rather than relying on predefined concept labels, MCPNet autonomously discovers meaningful concepts from feature maps. To increase versatility, we further propose MCPNet++, which can be seamlessly applied to both CNN and transformer backbones, allowing it to learn meaningful concepts from their respective features. Building on these learned concepts, we also introduce a large language model (LLM)-based method to bridge the gap between these concepts and human perception. Experimental results show that MCPNet++ provides more comprehensive explanations without sacrificing model performance, with the discovered concepts aligning closely with human understanding.

Conference

MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes

Bor-Shiun Wang, Chien-Yi Wang*, Wei-Chen Chiu* (*=equal advising)

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Recent advancements in post-hoc and inherently interpretable methods have markedly enhanced the explanations of black-box classifier models. These methods operate either through post-analysis or by integrating concept learning during model training. Although effective in bridging the semantic gap between a model's latent space and human interpretation, these explanation methods only partially reveal the model's decision-making process. The outcome is typically limited to high-level semantics derived from the last feature map. We argue that explanations lacking insights into the decision processes at low- and mid-level features are neither fully faithful nor useful. Addressing this gap, we introduce the Multi-Level Concept Prototypes Classifier (MCPNet), an inherently interpretable model. MCPNet autonomously learns meaningful concept prototypes across multiple feature map levels using a Centered Kernel Alignment (CKA) loss and an energy-based weighted PCA mechanism, and it does so without reliance on predefined concept labels. Further, we propose a novel classifier paradigm that learns and aligns multi-level concept prototype distributions for classification purposes via a Class-aware Concept Distribution (CCD) loss. Our experiments reveal that the proposed MCPNet, while adaptable to various model architectures, offers comprehensive multi-level explanations while maintaining classification accuracy. Additionally, its concept distribution-based classification approach shows improved generalization capabilities in few-shot classification scenarios.
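The concept-prototype learning above is driven in part by a CKA-based loss. As a reference point only, the standard linear-CKA similarity between two representations (not the paper's exact loss) can be sketched as:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two feature matrices.

    X: (n_samples, d1), Y: (n_samples, d2). Returns a similarity in [0, 1];
    the score is invariant to orthogonal rotation and isotropic scaling of
    either representation, which is why CKA is a popular way to compare
    feature maps of different widths.
    """
    X = X - X.mean(axis=0)  # center each feature dimension
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den
```

For example, a representation compared against a rotated copy of itself scores exactly 1, while unrelated random features score near 0.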

PRB-FPN+: Video Analytics for Enforcing Motorcycle Helmet Laws

Bor-Shiun Wang*, Ping-Yang Chen*, Yi-Kuan Hsieh, Jun-Wei Hsieh, Ming-Ching Chang, JiaXin He, Shin-You Teng, HaoYuan Yue, Yu-Chee Tseng (*=equal contribution)

IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW) on the AI City Challenge, 2023
We present a video analytics system for enforcing motorcycle helmet regulations as part of our participation in the AI City Challenge 2023 Track 5 contest. The advent of powerful object detectors enables real-time localization of road users and even the ability to determine whether a motorcyclist or rider is wearing a helmet. Ensuring road safety is important, as helmets effectively protect against severe injuries and fatalities. However, monitoring and enforcing helmet compliance is challenging, given the large number of motorcyclists and limited visual cues caused by occlusion. To address these challenges, we propose a novel two-step approach. First, we introduce PRB-FPN+, a state-of-the-art detector that excels in object localization. We also explore the benefits of deep supervision by incorporating auxiliary heads within the network, leading to enhanced performance of our deep learning architectures. Second, we utilize an advanced tracker named SMILEtrack to associate and refine the target tracklets. Comprehensive experimental results demonstrate that PRB-FPN+ outperforms state-of-the-art detectors on MS-COCO. Our system achieved rank 8 on the AI City Challenge 2023 Track 5 Public Leaderboard.

COFENet: Co-Feature Neural Network Model for Fine-Grained Image Classification

Bor-Shiun Wang, Jun-Wei Hsieh, Yi-Kuan Hsieh, Ping-Yang Chen

IEEE International Conference on Image Processing (ICIP), 2022
It is challenging to classify patterns with small inter-class variations but large intra-class variations, especially for textured objects with relatively small sizes and blurry boundaries. We propose the Co-Feature Network (COFENet), a novel deep learning network for fine-grained texture-based image classification. State-of-the-art (SoTA) methods for this task mostly rely on feature concatenation, merging convolutional features into fully connected layers. While some existing work has explored the variation between pair-wise features during learning, it only considered relations across feature channels, and did not explore the spatial or structural relations among the image regions from which the features are extracted. We propose to leverage such information among the features and their relative spatial layouts to capture richer pairwise, orientation-wise, and distance-wise relations among feature channels for end-to-end learning of intra-class and inter-class variations.

Learnable Discrete Wavelet Pooling (LDW-Pooling) for Convolutional Networks

Bor-Shiun Wang, Jun-Wei Hsieh, Ping-Yang Chen, Ming-Ching Chang, Lipeng Ke, Siwei Lyu

The British Machine Vision Conference (BMVC), 2021
Pooling is a simple but important layer in modern deep CNN architectures for feature aggregation and extraction. Typical CNN design focuses on the conv layers and activation functions, while leaving the pooling layers without suitable options. We introduce Learnable Discrete Wavelet Pooling (LDW-Pooling), which can be applied universally to replace standard pooling operations to better extract features with improved accuracy and efficiency. Motivated by wavelet theory, we adopt the low-pass (L) and high-pass (H) filters horizontally and vertically for pooling on a 2D feature map. Feature signals are decomposed into four (LL, LH, HL, HH) subbands to better retain features and avoid information dropping. The wavelet transform ensures features after pooling can be fully preserved and recovered. We next adopt energy-based attention learning to fine-select crucial and representative features. LDW-Pooling is effective and efficient compared with other state-of-the-art pooling techniques such as WaveletPooling and LiftPooling. Extensive experimental validation shows that LDW-Pooling can be applied to a wide range of standard CNN architectures to replace standard (max, mean, mixed, and stochastic) pooling operations, consistently outperforming them.
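The lossless four-subband decomposition described above can be illustrated with a plain Haar transform (a minimal NumPy sketch of the general idea, not the paper's learnable filters):

```python
import numpy as np

def haar_pool2d(x):
    """Decompose a 2D feature map into four Haar subbands (LL, LH, HL, HH).

    Each subband is half the spatial resolution; together the four subbands
    preserve all information in the input (the transform is invertible),
    unlike max or average pooling, which discard values.
    """
    a = x[0::2, 0::2]  # top-left of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-low: smooth approximation
    lh = (a - b + c - d) / 2.0  # detail along one axis
    hl = (a + b - c - d) / 2.0  # detail along the other axis
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh

def haar_unpool2d(ll, lh, hl, hh):
    """Invert haar_pool2d, reconstructing the original map exactly."""
    h, w = ll.shape
    x = np.zeros((2 * h, 2 * w))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x
```

Round-tripping a feature map through `haar_pool2d` and `haar_unpool2d` recovers it exactly, which is the property that lets wavelet-style pooling avoid the information loss of max or mean pooling.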

Under Review

Uncovering the Why: Interpretable CLIP Similarity via Dual Modalities Decomposition

Bor-Shiun Wang, Chien-Yi Wang*, Wei-Chen Chiu* (*=equal advising)

2026
The CLIP model has demonstrated strong capabilities in capturing the relationship between images and text through its learned high-dimensional representations. However, these dense features primarily express similarity via cosine distance, offering limited insight into the underlying causes of that similarity. Recent efforts have explored sparse decomposition techniques to extract semantically meaningful components from CLIP features as a form of interpretation. Nevertheless, we argue that these methods treat each modality independently, resulting in inconsistent decompositions that fail to reflect the cross-modal similarity from the aspect of concepts. In this paper, we introduce an explanation method for CLIP similarity via Dual Modalities Decomposition, CLIP-DMD, which employs a Sparse Autoencoder (SAE) to learn sparse decompositions of both CLIP image and text features within a shared concept space. To enhance interpretability, we propose two novel objectives: a Rate Constraint (RC) Loss, which promotes the crucial concepts to dominate the overall similarity, and a Corpus Cycle Consistency (CCC) Loss, which ensures that the most responsive features are both distinctive and accurately recognized by the encoder. To assess interpretability, we also design an evaluation protocol leveraging Large Language Models (LLMs) to provide automated and human-aligned assessments. Experimental results show that CLIP-DMD not only achieves competitive zero-shot classification, retrieval, and linear probing performance, but also delivers more human-understandable, reasonable, and preferable explanations of CLIP similarity compared to prior methods.
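The core idea of decomposing a similarity score into per-concept terms can be sketched as follows. This is an illustrative toy, not CLIP-DMD itself: `sparse_codes` stands in for a trained SAE encoder, and the clean additive split assumes a shared, near-orthonormal concept dictionary.

```python
import numpy as np

def sparse_codes(features, dictionary):
    """Toy SAE-style encoder: project features onto a shared concept
    dictionary and keep only non-negative activations (ReLU)."""
    return np.maximum(features @ dictionary.T, 0.0)

def concept_contributions(img_feat, txt_feat, dictionary):
    """Decompose image-text similarity into one additive term per concept.

    If both modalities are (approximately) reconstructed as code @ dictionary
    with a shared orthonormal dictionary, the inner product between the
    reconstructions splits into a sum over concepts; element k below is
    concept k's contribution to the overall similarity.
    """
    a = sparse_codes(img_feat, dictionary)  # image concept activations
    b = sparse_codes(txt_feat, dictionary)  # text concept activations
    return a * b
```

Sorting the returned vector then answers the "why" question directly: the largest entries name the concepts that dominate the image-text similarity.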

Education

National Yang Ming Chiao Tung University (NYCU)

Ph.D., Institute of Computer Science and Engineering

2022 - Present

National Chiao Tung University (NCTU)

M.S., Institute of Intelligent Systems

2020 - 2022

National Taiwan Ocean University (NTOU)

B.S. in Computer Science and Engineering

2016 - 2020

Projects

Literature Atlas Viewer (LitAtlas)

1/2026 - Present

  1. Developed LitAtlas (PaperGraph), an interactive research exploration tool that supports computing paper similarity from users’ own notes, enabling personalized literature mapping beyond paper metadata alone.
  2. Designed a personalized similarity framework that leverages user-written notes, titles, abstracts, and hashtags to capture relationships aligned with the user’s research perspective.
  3. Built a 2D node-edge visualization system that maps papers into an interactive graph, allowing users to explore related work through semantically meaningful connections.
  4. Implemented hybrid similarity modeling using both structured signals and LLM-based embeddings to represent explicit and latent relationships between papers.
  5. Developed an interface with adjustable similarity thresholds, enabling users to navigate between fine-grained connections and broader research clusters.
  6. Integrated support for the Hugging Face API and local models, enabling flexible, cost-efficient, and privacy-preserving deployment.
  7. Positioned LitAtlas as a tool for discovering research trends, gaps, and relevant connections in a way that adapts to each user’s own understanding of the literature.

Cassava Leaf Disease Classification

11/2020 - 2/2021

  1. Identifying cassava leaf disease is a challenging fine-grained classification task, requiring the model to distinguish subtle morphological symptoms across highly similar categories.
  2. Applied a soft-label technique that relaxes the rigid constraints of one-hot encoding by encoding rich inter-class relationships and semantic similarities within the label space.
  3. Used mix-up augmentation to increase sample variation, generating ground truth with the soft-label technique to enhance the discriminative ability of learned features.
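The soft-label and mix-up steps above can be sketched in a few lines of NumPy. This is a minimal illustration; the smoothing factor `eps` and Beta parameter `alpha` are generic defaults, not the project's actual settings.

```python
import numpy as np

def smooth_labels(y, num_classes, eps=0.1):
    """Turn integer class ids into soft labels: the true class keeps
    1 - eps of the probability mass; the rest is spread over the others."""
    soft = np.full((len(y), num_classes), eps / (num_classes - 1))
    soft[np.arange(len(y)), y] = 1.0 - eps
    return soft

def mixup(x1, x2, y1_soft, y2_soft, alpha=0.2, rng=None):
    """Blend two samples and their soft labels with a Beta-sampled weight,
    producing an interpolated training pair (image, label distribution)."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1_soft + (1 - lam) * y2_soft
```

Because mix-up takes a convex combination of two valid label distributions, the blended target still sums to 1 and can be trained against with ordinary cross-entropy.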

College/University Student Research Application -- COFENet texture model optimization

National Science and Technology Council (NSTC)

7/2019 - 2/2020

  1. Developed COFENet, a novel deep learning architecture designed for fine-grained texture-based classification in images with high intra-class and low inter-class variation.
  2. Engineered a spatial-structural relation module that captures pairwise, orientation-wise, and distance-wise relationships between feature channels, surpassing traditional concatenation methods.
  3. Addressed classification challenges for small, blurry, and textured objects by integrating relative spatial layouts into end-to-end feature learning.
  4. Published the resulting paper at ICIP 2022.

Contact