Foundation versus Domain-specific Models: Performance Comparison, Fusion, and Explainability in Face Recognition

Source: arXiv
Abstract

In this paper, we address the following question: how do generic foundation models (e.g., CLIP, BLIP, LLaVA, DINO) compare against a domain-specific face recognition (FR) model (viz., AdaFace or ArcFace) on the face recognition task? Through a series of experiments involving several foundation models and benchmark datasets, we report the following findings: (a) On all datasets considered, domain-specific models outperformed zero-shot foundation models. (b) Zero-shot generic foundation models perform better on over-segmented face images than on tightly cropped faces, suggesting the importance of contextual clues. For example, at a False Match Rate (FMR) of 0.01%, the True Match Rate (TMR) of OpenCLIP improved from 64.97% to 81.73% on the LFW dataset as the face crop increased from 112x112 to 250x250, while the TMR of domain-specific AdaFace dropped from 99.09% to 77.31%. (c) A simple score-level fusion of a foundation model with a domain-specific FR model improved the accuracy at low FMRs. For example, the TMR of AdaFace when fused with BLIP improved from 72.64% to 83.31% at an FMR of 0.0001% on the IJB-B dataset and from 73.17% to 85.81% on the IJB-C dataset. (d) Foundation models, such as ChatGPT, can be used to impart explainability to the FR pipeline (e.g., "Despite minor lighting and head tilt differences, the two left-profile images show high consistency in forehead slope, nose shape, chin contour..."). In some instances, foundation models are even able to resolve low-confidence decisions made by AdaFace (e.g., "Although AdaFace assigns a low similarity score of 0.21, both images exhibit visual similarity...and the pair is likely of the same person"), thereby reiterating the importance of judiciously combining domain-specific FR models with generic foundation models.
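The abstract reports TMR at a fixed FMR and a "simple score-level fusion" but does not spell out either computation. The following is a minimal sketch of how such quantities are commonly computed, not the authors' implementation. The function names (`tmr_at_fmr`, `min_max`, `fuse`), the min-max normalization, and the weighted-sum rule with `weight=0.5` are all assumptions made for illustration.

```python
import numpy as np


def tmr_at_fmr(genuine, impostor, fmr):
    """True Match Rate at a threshold that admits at most `fmr` of impostor scores.

    Assumes higher scores mean "more similar". Hypothetical helper, not from the paper.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.sort(np.asarray(impostor, dtype=float))[::-1]  # descending order
    # Number of impostor comparisons allowed to be (falsely) accepted.
    k = int(np.floor(fmr * impostor.size))
    # A pair matches only if its score is strictly above this threshold,
    # so at most k impostor scores can exceed it.
    threshold = impostor[min(k, impostor.size - 1)]
    return float(np.mean(genuine > threshold))


def min_max(scores, lo, hi):
    """Map scores to roughly [0, 1] using bounds estimated elsewhere (assumed normalization)."""
    return (np.asarray(scores, dtype=float) - lo) / (hi - lo)


def fuse(scores_a, scores_b, weight=0.5):
    """Weighted-sum fusion of two normalized score arrays (assumed fusion rule)."""
    return weight * np.asarray(scores_a, dtype=float) + (1.0 - weight) * np.asarray(scores_b, dtype=float)


if __name__ == "__main__":
    # Toy similarity scores for the same comparison pairs from two hypothetical matchers.
    gen_a, imp_a = [0.9, 0.8, 0.3, 0.7], [0.1, 0.2, 0.4, 0.05]
    gen_b, imp_b = [0.6, 0.7, 0.8, 0.5], [0.3, 0.1, 0.2, 0.4]
    fused_gen = fuse(min_max(gen_a, 0.0, 1.0), min_max(gen_b, 0.0, 1.0))
    fused_imp = fuse(min_max(imp_a, 0.0, 1.0), min_max(imp_b, 0.0, 1.0))
    print("TMR (matcher A):", tmr_at_fmr(gen_a, imp_a, fmr=0.0))
    print("TMR (fused):    ", tmr_at_fmr(fused_gen, fused_imp, fmr=0.0))
```

At very low operating points such as an FMR of 0.0001%, a meaningful estimate needs on the order of millions of impostor comparisons, so the toy arrays above only illustrate the mechanics.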

Redwan Sony, Parisa Farmanifard, Arun Ross, Anil K. Jain

Computing technology, computer technology

Redwan Sony, Parisa Farmanifard, Arun Ross, Anil K. Jain. Foundation versus Domain-specific Models: Performance Comparison, Fusion, and Explainability in Face Recognition [EB/OL]. (2025-07-04) [2025-07-16]. https://arxiv.org/abs/2507.03541.
