Graduation project
Paper walkthrough | End-to-end speech recognition models
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training
OpenAI Whisper test
Highlights of Interspeech 2022
ESPnet-ONNX: Bridging a Gap Between Research and Production
OpenAI-whisper
Interspeech 2022: September 20 content
Unidirectional LLMs (GPT2) are not as helpful as bidirectional ones (BERT, RoBERTa), as shown by IBM
Comparison and Analysis of New Curriculum Criteria for End-to-End ASR
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States
Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM
Modeling Dependent Structure for Utterances in ASR Evaluation
ASR2K: Speech Recognition for Around 2000 Languages without Audio
naab: A ready-to-use plug-and-play corpus for Farsi
ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge
Low-Level Physiological Implications of End-to-End Learning of Speech Recognition
Who spoke when: Choosing the right speaker diarization tool
Wav2vec 2.0: Learning the structure of speech from raw audio
The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines
How k2 calculates the transducer loss quickly
Uconv-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition
CUSIDE: A New Framework for Streaming Speech Recognition, Refreshing SOTA
Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning
musikalkemist/Deep-Learning-Audio-Application-From-Design-to-Deployment
facebook/wav2vec2-large-robust-ft-swbd-300h
torchaudio also has speech recognition models
Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition