Beyond performance: Explaining generalisation failures of Robotic Foundation Models in industrial simulation
David Kube, Simon Hadwiger, Tobias Meisen
Biomimetic Intelligence and Robotics, 2025, Vol. 5, Issue (4): 100249
This study investigates the generalisation and explainability challenges of Robotic Foundation Models (RFMs) in industrial applications, using Octo as a representative case study. Motivated by the scarcity of domain-specific data and the need for safe evaluation environments, we adopt a simulation-first approach: instead of transitioning from simulation to real-world scenarios, we aim to adapt real-world-trained RFMs to synthetic, simulated environments, a critical step towards their safe and effective industrial deployment. While Octo promises zero-shot generalisation, our experiments reveal significant performance degradation when the model is applied in simulation, even though the shifts in task and observation domain are minimal. To explain this behaviour, we introduce a modified Grad-CAM technique that provides insight into Octo’s internal reasoning and focus areas. Our results highlight key limitations in Octo’s visual generalisation and language grounding capabilities under distribution shifts. We further identify architectural and benchmarking challenges across the broader RFM landscape. Based on our findings, we propose concrete guidelines for future RFM development, with an emphasis on explainability, modularity, and robust benchmarking, which are critical enablers for applying RFMs in safety-critical and data-scarce industrial environments.
Explainability / Industrial robotics / Robotic Foundation Models / Reasoning visualisation / Zero-shot generalisation