Graduation project

Paper walkthrough | End-to-end speech recognition models

SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training

OpenAI Whisper test

Highlights of Interspeech 2022

ESPnet-ONNX: Bridging a Gap Between Research and Production

Interspeech 2022: September 20 sessions

Unidirectional LMs (GPT-2) are not as helpful as bidirectional ones (BERT, RoBERTa), as shown by IBM

Comparison and Analysis of New Curriculum Criteria for End-to-End ASR

VQ-T: RNN Transducers using Vector-Quantized Prediction Network States

Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM

Modeling Dependent Structure for Utterances in ASR Evaluation

ASR2K: Speech Recognition for Around 2000 Languages without Audio

naab: A ready-to-use plug-and-play corpus for Farsi

ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge

Low-Level Physiological Implications of End-to-End Learning of Speech Recognition

Who spoke when: Choosing the right speaker diarization tool

Wav2vec 2.0: Learning the structure of speech from raw audio

The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines

How k2 calculates the transducer loss quickly

Uconv-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition

CUSIDE: A New Framework for Streaming Speech Recognition, Refreshing SOTA

Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning

musikalkemist/Deep-Learning-Audio-Application-From-Design-to-Deployment

facebook/wav2vec2-large-robust-ft-swbd-300h

torchaudio also has speech recognition models

Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition