Fine-grained sequence-to-sequence lip reading based on self-attention and self-distillation