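BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing
Chen Wang, Minpeng Liao, Zhongqiang Huang, Jinliang Lu, Junhong Wu, Yuchen Liu, Chengqing Zong, Jiajun Zhang

Introduction
BLSP is a large-scale speech-language model that understands both speech and text and supports cross-modal interaction between the two. It can be applied to spoken dialogue and question answering, speech recognition, speech translation, and speech emotion analysis, and it can automatically generate high-quality multilingual text, which facilitates cross-modal and cross-lingual communication.

Examples
More examples with video presentations can be found on the project page.

Usage
Beyond the setup and inference steps below, you can try out our demo locally with blsp/chat_demo.py. The training of BLSP contains two stages: the first takes about 2 hours on 8 A100 GPUs, and the second takes about 2.5 days on 8 A100 GPUs.

Environment Preparation
All experiments are carried out in the following environment.

Prepare the pretrained BLSP checkpoint
Download the pretrained BLSP model (link). The commands below refer to its local path as $blsp_path.

Inference & Evaluation
We release the inference code for evaluation. The supported input file is in .jsonl format. Here is an example for the speech translation (ST) task. Each line of the input file looks like:
{"audio": "/home/data/eval/1.wav"}
Then run the generation code: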
python3 blsp/generate.py \
    --input_file "test.jsonl" \
    --output_file "test_out.jsonl" \
    --blsp_model $blsp_path \
    --instruction "Please translate the following audio into German text."
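The input file described above can be produced with a few lines of Python. The following is a minimal sketch, not part of the BLSP code base: the helper names `write_audio_manifest` and `read_jsonl` and the second audio path are hypothetical, and the only field assumed is the `"audio"` key shown in the example line.

```python
import json
from pathlib import Path


def write_audio_manifest(audio_paths, manifest_path):
    """Write one {"audio": <path>} JSON object per line (the .jsonl input format)."""
    with open(manifest_path, "w", encoding="utf-8") as f:
        for audio_path in audio_paths:
            f.write(json.dumps({"audio": str(audio_path)}) + "\n")


def read_jsonl(path):
    """Parse a .jsonl file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Hypothetical evaluation clips; replace with your own audio files.
    clips = ["/home/data/eval/1.wav", "/home/data/eval/2.wav"]
    write_audio_manifest(clips, Path("test.jsonl"))
    print(f"wrote {len(read_jsonl('test.jsonl'))} entries to test.jsonl")
```

The same `read_jsonl` helper can be pointed at `test_out.jsonl` to inspect the generated records; the sketch makes no assumption about which keys the output contains.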
Launching Demo Locally
export CUDA_VISIBLE_DEVICES=0
python blsp/chat_demo.py \
    --blsp_model $blsp_path
Training from Scratch
Stage 1: Finetune LLM with text instruction data
Place the Alpaca text instruction data at ~/data/alpaca_data.json and run the process script.
mkdir -p ~/data/stage1
python data_process/prepare_alpaca.py \
    --input_file ~/data/alpaca_data.json \
    --output_file ~/data/stage1/train_alpaca.jsonl
Place the Llama2-7B model at ~/pretrained_models/llama2-7b-hf. Then run the training script to perform text instruction tuning.
export llama_path=~/pretrained_models/llama2-7b-hf
export DATA_ROOT=~/data/stage1
export SAVE_ROOT=~/checkpoints/stage1
bash blsp/scripts/train_stage1_ddp.sh
Stage 2: Align speech and text via behavior alignment of continuation writing
Download the GigaSpeech, LibriSpeech, and Common Voice datasets to ~/data/gigaspeech, ~/data/librispeech, and ~/data/common_voice respectively. Then run the process scripts.
mkdir -p ~/data/stage2
python data_process/prepare_gigaspeech.py \
    --input_dir ~/data/gigaspeech \
    --output_file ~/data/stage2/train_gigaspeech.jsonl
python data_process/prepare_librispeech.py \
    --input_dir ~/data/librispeech \
    --output_file ~/data/stage2/train_librispeech.jsonl
python data_process/prepare_common_voice.py \
    --input_dir ~/data/common_voice \
    --output_file ~/data/stage2/train_common_voice.jsonl
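Before moving on, it can help to confirm that each process script actually produced a manifest. The sketch below is an illustrative check, not part of the BLSP repository: the function name `count_records` is hypothetical, and it assumes only that each manifest holds one record per non-empty line.

```python
from pathlib import Path


def count_records(manifest_paths):
    """Map each manifest path to its number of non-empty lines (None if missing)."""
    counts = {}
    for path in map(Path, manifest_paths):
        if not path.is_file():
            counts[str(path)] = None
            continue
        with path.open(encoding="utf-8") as f:
            counts[str(path)] = sum(1 for line in f if line.strip())
    return counts


if __name__ == "__main__":
    stage2 = Path.home() / "data" / "stage2"
    names = ["train_gigaspeech.jsonl", "train_librispeech.jsonl", "train_common_voice.jsonl"]
    for path, n in count_records(stage2 / name for name in names).items():
        print(path, "MISSING" if n is None else f"{n} records")
```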
Use the Stage 1 model to continue writing from the transcripts, shown here for GigaSpeech with 8 shards, one per GPU.
mkdir -p ~/data/stage2/labels
export CUDA_VISIBLE_DEVICES=0
python3 -u data_process/asr_text_generation.py continue_writing \
    --llm_path ~/checkpoints/stage1 \
    --manifest ~/data/stage2/train_gigaspeech.jsonl \
    --lab_dir ~/data/stage2/labels \
    --nshard 8 \
    --rank 0 &
# ... repeat with CUDA_VISIBLE_DEVICES and --rank set to 1 through 6 ...
export CUDA_VISIBLE_DEVICES=7
python3 -u data_process/asr_text_generation.py continue_writing \
    --llm_path ~/checkpoints/stage1 \
    --manifest ~/data/stage2/train_gigaspeech.jsonl \
    --lab_dir ~/data/stage2/labels \
    --nshard 8 \
    --rank 7 &
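The per-GPU commands above differ only in the device index and the rank, so they can be generated rather than typed out. The following is a rough sketch under stated assumptions: `shard_jobs` and `run_all` are hypothetical helpers, the script path and flags are copied from the commands above, and rank i is assumed to be pinned to GPU i as in the two commands shown.

```python
import os
import subprocess


def shard_jobs(manifest, llm_path, lab_dir, nshard=8):
    """Return one (env, argv) pair per rank, pinning rank i to GPU i."""
    jobs = []
    for rank in range(nshard):
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(rank))
        argv = [
            "python3", "-u", "data_process/asr_text_generation.py", "continue_writing",
            "--llm_path", llm_path,
            "--manifest", manifest,
            "--lab_dir", lab_dir,
            "--nshard", str(nshard),
            "--rank", str(rank),
        ]
        jobs.append((env, argv))
    return jobs


def run_all(jobs):
    """Start every job in the background, wait for all, and return the exit codes."""
    procs = [subprocess.Popen(argv, env=env) for env, argv in jobs]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    home = os.path.expanduser("~")
    jobs = shard_jobs(
        manifest=f"{home}/data/stage2/train_gigaspeech.jsonl",
        llm_path=f"{home}/checkpoints/stage1",
        lab_dir=f"{home}/data/stage2/labels",
    )
    for env, argv in jobs:  # dry run: print the commands instead of launching them
        print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']}", " ".join(argv))
    # Call run_all(jobs) from the repository root to launch the eight processes.
```

Printing the commands first makes it easy to compare them against the hand-written versions before committing eight GPUs to the run.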
Then process the resulting data offline.
python blsp/src/speech_text_paired_dataset.py offline \
    --dataroot ~/data/stage2/labels \
    --manifest_files *.jsonl \
    --lm_path ~/checkpoints/stage1 \
    --instruction "Continue the following text in a coherent and engaging style with less than 40 words." \
    --num_proc 32
Place the whisper-small model at ~/pretrained_models/whisper-small and then run the training script.
export llama_path=~/checkpoints/stage1
export whisper_path=~/pretrained_models/whisper-small
export DATA_ROOT=~/data/stage2/labels
export SAVE_ROOT=~/checkpoints/stage2
bash blsp/scripts/train_stage2_ddp.sh
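Since the training script is configured through exported variables, a quick check that none of them is unset can save a failed launch. This is a small sketch, assuming the script reads the four variables exported above; the helper name `missing_settings` is hypothetical.

```python
import os

# Variable names taken from the export lines above.
REQUIRED = ("llama_path", "whisper_path", "DATA_ROOT", "SAVE_ROOT")


def missing_settings(environ, required=REQUIRED):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_settings(os.environ)
    if missing:
        print("unset variables:", ", ".join(missing))
    else:
        print("all Stage 2 variables are set")
```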
License