Zhao, Shengkui, Ma, Bin, et al., "MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation", submitted to ICASSP 2024.

Main Changes
This model is MossFormer2, originally a speaker-separation model. During training we found that, in addition to separating speakers, it also has a strong denoising effect, so the denoising model has now been split out on its own.
GPU inference is recommended; in our tests, inference on a GPU is tens of times faster than on a CPU.
Code Example
```python
import onnx
import onnxruntime as ort
import numpy as np
import soundfile as sf

def save_result(est_source):
    # Take the first batch item and the first output channel.
    signal = est_source[0, :, 0]
    # Peak-normalize to 0.5 to leave headroom, then convert to 16-bit PCM.
    signal = signal / np.abs(signal).max() * 0.5
    signal = signal[np.newaxis, :]
    output = (signal * 32768).astype(np.int16).tobytes()
    save_file = 'output_spk0.wav'
    sf.write(save_file, np.frombuffer(output, dtype=np.int16), 16000)

# Load and validate the exported ONNX model.
onnx_model_path = 'simple_model.onnx'
onnx_model = onnx.load(onnx_model_path)
onnx.checker.check_model(onnx_model)

# Run inference: the model expects 16 kHz float32 audio shaped (1, num_samples).
ort_session = ort.InferenceSession(onnx_model_path)
input_data, sr = sf.read('output_16000.wav')
input_data = np.expand_dims(input_data, axis=0).astype(np.float32)
input_name = ort_session.get_inputs()[0].name
outputs = ort_session.run(None, {input_name: input_data})
output_data = outputs[0]
print(output_data.shape)
save_result(output_data)
```
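The example above reads its input from a 16 kHz wav file. If you just want to check the expected input layout, a synthetic test signal works too; the sketch below (an assumption about the input contract, inferred from the example: float32, mono, shaped `(1, num_samples)`) builds one second of a 440 Hz tone in that shape:

```python
import numpy as np

sr = 16000
t = np.arange(sr) / sr                    # one second of sample times
mono = np.sin(2 * np.pi * 440.0 * t)      # 440 Hz test tone instead of a real recording
# Add the batch dimension and cast, matching the example above.
input_data = np.expand_dims(mono, axis=0).astype(np.float32)
print(input_data.shape)                   # (1, 16000)
```

A stereo file read by `soundfile` comes back shaped `(num_samples, 2)`; average the channels (e.g. `data.mean(axis=1)`) to get mono before adding the batch dimension.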