Multi-agent deep reinforcement learning for end–edge orchestrated resource allocation in industrial wireless networks