IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Recent advancements in post-hoc and inherently interpretable methods have markedly enhanced the explanations of black-box classifier models. These methods operate either through post-analysis or by integrating concept learning during model training. Although effective in bridging the semantic gap between a model's latent space and human interpretation, these explanation methods only partially reveal the model's decision-making process. The outcome is typically limited to high-level semantics derived from the last feature map. We argue that explanations lacking insight into the decision processes at low- and mid-level features are neither fully faithful nor useful. Addressing this gap, we introduce the Multi-Level Concept Prototypes Classifier (MCPNet), an inherently interpretable model. MCPNet autonomously learns meaningful concept prototypes across multiple feature map levels using a Centered Kernel Alignment (CKA) loss and an energy-based weighted PCA mechanism, and it does so without relying on predefined concept labels. Further, we propose a novel classifier paradigm that learns and aligns multi-level concept prototype distributions for classification. Our experiments reveal that MCPNet, while adaptable to various model architectures, offers comprehensive multi-level explanations while maintaining classification accuracy. Additionally, its concept distribution-based classification approach shows improved generalization in few-shot classification scenarios.
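For readers unfamiliar with CKA, the following is a minimal sketch of the standard linear CKA similarity between two sets of features, written with NumPy. It is only an illustration of the similarity measure itself: how MCPNet turns CKA into a training loss is not described in this abstract, and the function name `linear_cka` is a hypothetical choice made here.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between two feature matrices of shape (n_samples, n_features).

    Illustrative helper (hypothetical name), not the MCPNet implementation.
    Returns a value in [0, 1]; 1 means the two representations match up to
    rotation and isotropic scaling.
    """
    # Center each feature dimension over the samples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)

rng = np.random.default_rng(0)
feats_a = rng.normal(size=(64, 8))
feats_b = rng.normal(size=(64, 8))
same = linear_cka(feats_a, feats_a)        # identical features
scaled = linear_cka(feats_a, 2 * feats_a)  # invariant to isotropic scaling
other = linear_cka(feats_a, feats_b)       # unrelated random features
```

A loss that minimizes this quantity between groups of channels would push them toward distinct representations, but that usage is an assumption here rather than something the abstract states.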
PRB-FPN+: Video Analytics for Enforcing Motorcycle Helmet Laws
IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW) on the AI City Challenge, 2023
We present a video analytics system for enforcing motorcycle helmet regulations, developed as our entry to Track 5 of the AI City Challenge 2023. The advent of powerful object detectors enables real-time localization of road users and even makes it possible to determine whether a motorcyclist or a rider is wearing a helmet. Ensuring road safety is important, as helmets effectively protect against severe injuries and fatalities. However, monitoring and enforcing helmet compliance is challenging, given the large number of motorcyclists and the limited visual information caused by factors such as occlusion. To address these challenges, we propose a novel two-step approach. First, we introduce PRB-FPN+, a state-of-the-art detector that excels in object localization. We also explore the benefits of deep supervision by incorporating auxiliary heads within the network, which enhances the performance of our deep learning architectures. Second, we use an advanced tracker named SMILEtrack to associate and refine the target tracklets. Comprehensive experimental results demonstrate that PRB-FPN+ outperforms state-of-the-art detectors on MS-COCO. Our system ranked 8th on the AI City Challenge 2023 Track 5 Public Leaderboard.
COFENet: Co-Feature Neural Network Model for Fine-Grained Image Classification
IEEE International Conference on Image Processing (ICIP), 2022
It is challenging to classify patterns with small inter-class variations but large intra-class variations, especially for textured objects with relatively small sizes and blurry boundaries. We propose the Co-Feature Network (COFENet), a novel deep learning network for fine-grained texture-based image classification. State-of-the-art (SoTA) methods for this task mostly rely on feature concatenation, merging convolutional features into fully connected layers. Although some existing work has explored the variation between pairwise features during learning, it considered only the relations among feature channels and did not explore the spatial or structural relations among the image regions from which the features are extracted. We propose to leverage such information among the features and their relative spatial layouts to capture richer pairwise, orientation-wise, and distance-wise relations among feature channels for end-to-end learning of intra-class and inter-class variations.
Learnable Discrete Wavelet Pooling (LDW-Pooling) for Convolutional Networks
The British Machine Vision Conference (BMVC), 2021
Pooling is a simple but important layer in modern deep CNN architectures for feature aggregation and extraction. Typical CNN design focuses on the convolutional layers and activation functions, while leaving the pooling layers without suitable options. We introduce Learnable Discrete Wavelet Pooling (LDW-Pooling), which can be applied universally to replace standard pooling operations and extract features with improved accuracy and efficiency. Motivated by wavelet theory, we apply low-pass (L) and high-pass (H) filters horizontally and vertically to pool a 2D feature map. Feature signals are decomposed into four subbands (LL, LH, HL, HH) to better retain features and avoid information loss. The wavelet transform ensures that features can be fully preserved and recovered after pooling. We then adopt energy-based attention learning to select crucial and representative features. LDW-Pooling is effective and efficient when compared with other state-of-the-art pooling techniques such as WaveletPooling and LiftPooling. Extensive experimental validation shows that LDW-Pooling can be applied to a wide range of standard CNN architectures to replace standard (max, mean, mixed, and stochastic) pooling operations, and that it consistently outperforms them.
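The subband decomposition described above can be sketched with fixed Haar filters in NumPy. This is a minimal sketch under stated assumptions, not the LDW-Pooling layer: the filters here are fixed rather than learned, the energy-based attention step is omitted, and the function names `haar_dwt2` and `haar_idwt2` as well as the subband naming convention (first letter for the horizontal filter, second for the vertical) are choices made for this illustration.

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2D Haar decomposition of an (H, W) array with even H and W.

    Returns four half-resolution subbands (LL, LH, HL, HH). Assumed naming:
    first letter is the horizontal filter, second letter the vertical one.
    """
    s = np.sqrt(2.0)
    # Horizontal pass: low-pass (scaled sum) and high-pass (scaled difference)
    # over neighboring column pairs.
    lo = (x[:, 0::2] + x[:, 1::2]) / s
    hi = (x[:, 0::2] - x[:, 1::2]) / s
    # Vertical pass over neighboring row pairs of each horizontal result.
    ll = (lo[0::2, :] + lo[1::2, :]) / s
    lh = (lo[0::2, :] - lo[1::2, :]) / s
    hl = (hi[0::2, :] + hi[1::2, :]) / s
    hh = (hi[0::2, :] - hi[1::2, :]) / s
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Invert haar_dwt2, recovering the full-resolution array exactly."""
    s = np.sqrt(2.0)
    h, w = ll.shape
    # Undo the vertical pass.
    lo = np.empty((2 * h, w))
    hi = np.empty((2 * h, w))
    lo[0::2, :], lo[1::2, :] = (ll + lh) / s, (ll - lh) / s
    hi[0::2, :], hi[1::2, :] = (hl + hh) / s, (hl - hh) / s
    # Undo the horizontal pass.
    x = np.empty((2 * h, 2 * w))
    x[:, 0::2], x[:, 1::2] = (lo + hi) / s, (lo - hi) / s
    return x

feature_map = np.arange(16.0).reshape(4, 4)
ll, lh, hl, hh = haar_dwt2(feature_map)
restored = haar_idwt2(ll, lh, hl, hh)
```

Each subband has half the spatial resolution of the input, so keeping only LL behaves like a scaled average pooling, while the other three subbands carry the detail that ordinary pooling discards. Because the transform is orthonormal, the four subbands together preserve the input's energy and allow exact reconstruction.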
Resume
Education
Ph.D., Institute of Computer Science and Engineering