Accelerating BERT inference with GPU-efficient exit prediction