Towards the first principles of explaining DNNs: interactions explain the learning dynamics
Huilin ZHOU, Qihan REN, Junpeng ZHANG, Quanshi ZHANG
Front. Inform. Technol. Electron. Eng., 2025, Vol. 26, Issue 7: 1017-1026.
Most explanation methods are designed in an empirical manner, so exploring whether there exists a first-principles explanation of a deep neural network (DNN) has become the next core scientific problem in explainable artificial intelligence (XAI). Although this remains an open problem, in this paper we discuss whether the interaction-based explanation can serve as the first-principles explanation of a DNN. The strong explanatory power of interaction theory comes from the following aspects: (1) it establishes a new axiomatic system that quantifies the decision-making logic of a DNN as a set of symbolic interaction concepts; (2) it simultaneously explains various deep learning phenomena, such as generalization power, adversarial sensitivity, the representation bottleneck, and learning dynamics; (3) it provides mathematical tools that uniformly explain the mechanisms of various empirical attribution methods and empirical adversarial-transferability-boosting methods; (4) it explains the extremely complex learning dynamics of a DNN by analyzing the two-phase dynamics of interaction complexity, which further reveals the internal mechanism of why and how the generalization power and adversarial sensitivity of a DNN change during the learning process.
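The symbolic interaction concepts mentioned above are, in this line of work, typically formalized as Harsanyi interactions: for a set of input variables S, the interaction effect is I(S) = Σ_{T⊆S} (−1)^{|S|−|T|} v(T), where v(T) is the model output when only the variables in T are present. As a hedged illustration (not the paper's exact implementation; `v` here is an assumed black-box scoring function supplied by the user), the quantity can be sketched as:

```python
from itertools import combinations

def harsanyi_interaction(v, S):
    """Compute the Harsanyi interaction I(S) = sum_{T subseteq S} (-1)^{|S|-|T|} v(T).

    v : callable mapping a frozenset of variable indices to the model output
        with only those variables present (an assumed interface).
    S : iterable of variable indices defining the concept.
    """
    S = tuple(S)
    total = 0.0
    for k in range(len(S) + 1):  # enumerate all subsets T of S by size
        for T in combinations(S, k):
            total += (-1) ** (len(S) - len(T)) * v(frozenset(T))
    return total

# Toy two-variable model: outputs chosen so the variables interact positively.
vals = {frozenset(): 0.0, frozenset({0}): 1.0,
        frozenset({1}): 2.0, frozenset({0, 1}): 5.0}
v = lambda T: vals[T]

print(harsanyi_interaction(v, {0, 1}))  # 5 - 2 - 1 + 0 = 2.0
```

A key property making this a faithful decomposition is efficiency: summing I(S) over all subsets S of the full variable set recovers the model output v(N) exactly, so the interactions partition the DNN's inference into additive symbolic effects.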
First-principles explanation / Theory of equivalent interactions / Two-phase dynamics of interactions / Learning dynamics
Zhejiang University Press