wav2vec2-large-xlsr-53-english

我要开发同款
匿名用户2024年07月31日
24阅读
所属分类aiPytorch
开源地址https://modelscope.cn/models/AI-ModelScope/wav2vec2-large-xlsr-53-english
授权协议Apache License 2.0

作品详情

Fine-tuned XLSR-53 large model for speech recognition in English

Fine-tuned facebook/wav2vec2-large-xlsr-53 on English using the train and validation splits of Common Voice 6.1. When using this model, make sure that your speech input is sampled at 16kHz.

This model has been fine-tuned thanks to the GPU credits generously given by the OVHcloud :)

The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint

Usage

The model can be used directly (without a language model) as follows…

Using the HuggingSound library:

from modelscope import snapshot_download
from huggingsound import SpeechRecognitionModel

local_model = snapshot_download("AI-ModelScope/wav2vec2-large-xlsr-53-english",revision='master')
model = SpeechRecognitionModel(local_model)
audio_paths = ["/path/to/file.mp3", "/path/to/another_file.wav"]

transcriptions = model.transcribe(audio_paths)

Writing your own inference script:

import torch
import librosa
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
from modelscope import snapshot_download

LANG_ID = "en"
MODEL_ID = "AI-ModelScope/wav2vec2-large-xlsr-53-english"
SAMPLES = 10

test_dataset = load_dataset("common_voice", LANG_ID, split=f"test[:{SAMPLES}]")
local_model = snapshot_download(MODEL_ID,revision='master')
processor = Wav2Vec2Processor.from_pretrained(local_model)
model = Wav2Vec2ForCTC.from_pretrained(local_model)

# Preprocessing the datasets.
# We need to read the audio files as arrays
def speech_file_to_array_fn(batch):
    speech_array, sampling_rate = librosa.load(batch["path"], sr=16_000)
    batch["speech"] = speech_array
    batch["sentence"] = batch["sentence"].upper()
    return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)
inputs = processor(test_dataset["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

predicted_ids = torch.argmax(logits, dim=-1)
predicted_sentences = processor.batch_decode(predicted_ids)

for i, predicted_sentence in enumerate(predicted_sentences):
    print("-" * 100)
    print("Reference:", test_dataset[i]["sentence"])
    print("Prediction:", predicted_sentence)
Reference Prediction
"SHE'LL BE ALL RIGHT." SHE'LL BE ALL RIGHT
SIX SIX
"ALL'S WELL THAT ENDS WELL." ALL AS WELL THAT ENDS WELL
DO YOU MEAN IT? DO YOU MEAN IT
THE NEW PATCH IS LESS INVASIVE THAN THE OLD ONE, BUT STILL CAUSES REGRESSIONS. THE NEW PATCH IS LESS INVASIVE THAN THE OLD ONE BUT STILL CAUSES REGRESSION
HOW IS MOZILLA GOING TO HANDLE AMBIGUITIES LIKE QUEUE AND CUE? HOW IS MOSLILLAR GOING TO HANDLE ANDBEWOOTH HIS LIKE Q AND Q
"I GUESS YOU MUST THINK I'M KINDA BATTY." RUSTIAN WASTIN PAN ONTE BATTLY
NO ONE NEAR THE REMOTE MACHINE YOU COULD RING? NO ONE NEAR THE REMOTE MACHINE YOU COULD RING
SAUCE FOR THE GOOSE IS SAUCE FOR THE GANDER. SAUCE FOR THE GUICE IS SAUCE FOR THE GONDER
GROVES STARTED WRITING SONGS WHEN SHE WAS FOUR YEARS OLD. GRAFS STARTED WRITING SONGS WHEN SHE WAS FOUR YEARS OLD

Evaluation

  1. To evaluate on mozilla-foundation/common_voice_6_0 with split test
python eval.py --model_id jonatasgrosman/wav2vec2-large-xlsr-53-english --dataset mozilla-foundation/common_voice_6_0 --config en --split test
  1. To evaluate on speech-recognition-community-v2/dev_data
python eval.py --model_id jonatasgrosman/wav2vec2-large-xlsr-53-english --dataset speech-recognition-community-v2/dev_data --config en --split validation --chunk_length_s 5.0 --stride_length_s 1.0

Citation

If you want to cite this model you can use this:

@misc{grosman2021xlsr53-large-english,
  title={Fine-tuned {XLSR}-53 large model for speech recognition in {E}nglish},
  author={Grosman, Jonatas},
  howpublished={\url{https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english}},
  year={2021}
}
声明:本文仅代表作者观点,不代表本站立场。如果侵犯到您的合法权益,请联系我们删除侵权资源!如果遇到资源链接失效,请您通过评论或工单的方式通知管理员。未经允许,不得转载,本站所有资源文章禁止商业使用运营!
下载安装【程序员客栈】APP
实时对接需求、及时收发消息、丰富的开放项目需求、随时随地查看项目状态

评论