Published: September 25, 2024
Author(s)
Sunny Shree (UTA), Yu Lei (UTA), Raghu Kacker (NIST), Richard Kuhn (NIST)
Conference
Name: The 6th IEEE International Conference on Artificial Intelligence Testing
Dates: 07/15/2024 - 07/18/2024
Location: Shanghai, China
Citation: 2024 IEEE International Conference on Artificial Intelligence Testing (AITest), pp. 64-72
Abstract
Machine learning (ML)-based Artificial Intelligence (AI) systems rely on training data to perform optimally, but the internal workings of how ML models learn from and use this data are often a black box. Influence analysis provides valuable insights into a model's behavior by evaluating the effect of individual training instances on the model's predictions. However, calculating the influence of each training instance can be computationally expensive. In this paper, we propose a proxy model-based approach to influence analysis called Proxima. The main idea of our approach is to use a subset of training instances to create a proxy model that is simpler than the original model, and then use the proxy model and the subset to perform influence analysis. We evaluate Proxima on eight ML models trained on real-world datasets, comparing it to two state-of-the-art influence analysis tools, FastIF and Scaling-Up. Our experimental results suggest that the proposed approach can substantially speed up the influence analysis process and, in most cases, performs better than FastIF and Scaling-Up.
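The abstract does not give Proxima's implementation details, so the sketch below is only a rough illustration of the general proxy-model idea, not the authors' method: fit a simple, convex proxy (here, an L2-regularized logistic regression, an assumption) on a subset of the training data, then score each subset instance with the classical influence-function formula of Koh and Liang (2017). All function and variable names are illustrative.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_proxy(X, y, lam=1e-2, lr=0.1, steps=2000):
    # Fit an L2-regularized logistic-regression proxy by gradient descent.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * (X.T @ (p - y) / len(y) + lam * w)
    return w

def influence(X_sub, y_sub, w, x_test, y_test, lam=1e-2):
    # Influence of each subset instance on the proxy's loss at (x_test, y_test):
    #   I(z_i, z_test) = -grad L(z_test)^T  H^{-1}  grad L(z_i)
    p = sigmoid(X_sub @ w)
    # Hessian of the regularized logistic loss over the subset.
    H = (X_sub * (p * (1 - p))[:, None]).T @ X_sub / len(y_sub) + lam * np.eye(len(w))
    # Per-instance training-loss gradients and the test-loss gradient.
    grads = (p - y_sub)[:, None] * X_sub
    g_test = (sigmoid(x_test @ w) - y_test) * x_test
    # Negative score => upweighting z_i would reduce the test loss.
    return -grads @ np.linalg.solve(H, g_test)

# Toy usage on synthetic data: analyze only a 100-instance subset,
# which is what makes the proxy approach cheaper than full influence analysis.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (sigmoid(X @ rng.normal(size=5)) > 0.5).astype(float)
idx = rng.choice(500, size=100, replace=False)
w = fit_proxy(X[idx], y[idx])
scores = influence(X[idx], y[idx], w, X[0], y[0])
print("Subset instances whose upweighting most reduces the test loss:",
      idx[np.argsort(scores)[:5]])

Because the proxy is convex, its Hessian is small and well conditioned, so the inverse-Hessian-vector product is exact and cheap; for the original large model this product is the dominant cost that tools like FastIF and Scaling-Up approximate.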
Keywords
Datasets; Machine Learning; Neural Networks; Influence Analysis; Influence Functions; Training Data Similarity