Towards flexible perception with visual memory
Training a neural network is a monolithic endeavor, akin to carving knowledge into stone: once the process is completed, editing the knowledge in a network is hard, since all information is distributed across the network's weights. We here explore a simple, compelling alternative by marrying the representational power of deep neural networks with the flexibility of a database. Decomposing the task of image classification into image similarity (from a pre-trained embedding) and search (via fast nearest neighbor retrieval from a knowledge database), we build on well-established components to construct a simple and flexible visual memory that has the following key capabilities: (1.) The ability to flexibly add data across scales: from individual samples all the way to entire classes and billion-scale data; (2.) The ability to remove data through unlearning and memory pruning; (3.) An interpretable decision-mechanism on which we can intervene to control its behavior. Taken together, these capabilities comprehensively demonstrate the benefits of an explicit visual memory. We hope that it might contribute to a conversation on how knowledge should be represented in deep vision models -- beyond carving it in "stone" weights.
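The three capabilities above can be sketched as a tiny embedding-plus-retrieval classifier. This is a minimal illustration, not the paper's actual implementation: the class name `VisualMemory` and its methods are hypothetical, embeddings are assumed to be precomputed vectors from some pre-trained model, and similarity search is done with a brute-force cosine-similarity scan rather than the fast billion-scale nearest-neighbor index the paper relies on.

```python
import numpy as np

class VisualMemory:
    """Illustrative sketch of an explicit visual memory: a database of
    (embedding, label) pairs queried by nearest-neighbor search.
    All names here are assumptions, not the paper's API."""

    def __init__(self, dim):
        self.embeddings = np.empty((0, dim), dtype=np.float32)
        self.labels = []

    def add(self, embedding, label):
        # (1.) Flexibly add data: a single sample, a class, or more.
        e = embedding / np.linalg.norm(embedding)  # unit norm -> cosine similarity
        self.embeddings = np.vstack([self.embeddings, e[None, :]])
        self.labels.append(label)

    def remove(self, label):
        # (2.) Unlearning / memory pruning: drop every entry with this label.
        keep = [i for i, l in enumerate(self.labels) if l != label]
        self.embeddings = self.embeddings[keep]
        self.labels = [self.labels[i] for i in keep]

    def classify(self, embedding, k=3):
        # (3.) Interpretable decision: a similarity-weighted vote among the
        # k nearest stored neighbors -- the retrieved neighbors themselves
        # explain (and let us intervene on) the prediction.
        q = embedding / np.linalg.norm(embedding)
        sims = self.embeddings @ q          # cosine similarity to every entry
        nearest = np.argsort(-sims)[:k]     # indices of the k most similar
        votes = {}
        for i in nearest:
            votes[self.labels[i]] = votes.get(self.labels[i], 0.0) + sims[i]
        return max(votes, key=votes.get)
```

Because the "knowledge" lives in the database rather than in network weights, editing it is just an `add` or `remove` call; no retraining is needed.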
Robert Geirhos, Xi Yi, Sourabh Medapati, Priyank Jaini, Abhijit Ogale, George Toderici, Austin Stone, Jonathon Shlens
Computing technology; computer technology
Robert Geirhos, Xi Yi, Sourabh Medapati, Priyank Jaini, Abhijit Ogale, George Toderici, Austin Stone, Jonathon Shlens. Towards flexible perception with visual memory [EB/OL]. (2025-08-13) [2025-08-24]. https://arxiv.org/abs/2408.08172.