https://www.isca-speech.org/archive/interspeech_2022/udagawa22b_interspeech.html
Unidirectional LLMs (GPT-2) are not as helpful for rescoring as bidirectional ones (BERT, RoBERTa), as shown by IBM.
In this study, we have re-examined the effect of the fundamental LLM rescoring approach [9, 10] on a competitive Conformer-Transducer baseline and conducted a detailed analysis. Based on our experiments, we have demonstrated consistent improvement in ASR accuracy using bidirectional (but not unidirectional) LLM rescoring. We also observed additional gains from general-domain pretraining, in-domain finetuning, and context augmentation when using the bidirectional LLMs.
Lastly, we conducted a simple lexical analysis to examine the effect of LLM rescoring. We showed that error reduction from rescoring can differ across (and be characterized by) word frequency and error type. Based on our analysis, we shed light on how each variant of LLM contributes to WER reduction and explain the failure mode of unidirectional LLMs: they are unable to balance both error types.
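The rescoring setup the paper builds on can be sketched as follows: each N-best hypothesis from the ASR decoder gets a combined score of its acoustic/decoder log-probability plus a weighted external LM score (a BERT pseudo-log-likelihood or a GPT-2 log-likelihood in the paper). This is a minimal illustrative sketch, not the paper's implementation; the interpolation weight, the `Hypothesis` structure, and the toy bigram LM standing in for the real LLM are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    asr_score: float  # log-probability from the ASR decoder (e.g. a transducer)

def rescore(hyps, lm_score_fn, lm_weight=0.5):
    """Re-rank N-best hypotheses by asr_score + lm_weight * LM score.

    lm_weight is an illustrative interpolation weight; in practice it is
    tuned on a dev set.
    """
    return sorted(
        hyps,
        key=lambda h: h.asr_score + lm_weight * lm_score_fn(h.text),
        reverse=True,
    )

# Toy stand-in for the external LLM score: count known-good bigrams.
# A real system would use a GPT-2 log-likelihood or a BERT
# pseudo-log-likelihood here.
GOOD_BIGRAMS = {("i", "want"), ("want", "to"), ("to", "book"),
                ("book", "a"), ("a", "flight")}

def toy_lm_score(text):
    words = text.split()
    return sum(1.0 for bg in zip(words, words[1:]) if bg in GOOD_BIGRAMS)

# The decoder slightly prefers the wrong hypothesis; the LM flips the ranking.
hyps = [Hypothesis("i want to flight a book", -1.0),
        Hypothesis("i want to book a flight", -1.2)]
best = rescore(hyps, toy_lm_score)[0]
# best.text == "i want to book a flight"
```

The same skeleton covers both LLM variants: only `lm_score_fn` changes, which is why the paper can compare unidirectional and bidirectional scoring on identical N-best lists.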