vocos-mel-24khz

Anonymous user, July 31, 2024

Technical information

Source URL
https://modelscope.cn/models/mirror013/vocos-mel-24khz
License
MIT

Project details

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Audio samples | Paper [abs] [pdf]

Vocos is a fast neural vocoder designed to synthesize audio waveforms from acoustic features. Trained using a Generative Adversarial Network (GAN) objective, Vocos can generate waveforms in a single forward pass. Unlike other typical GAN-based vocoders, Vocos does not model audio samples in the time domain. Instead, it generates spectral coefficients, facilitating rapid audio reconstruction through inverse Fourier transform.
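
To make the idea of going from spectral coefficients back to a waveform concrete, here is a toy, single-frame sketch in plain Python. It is not the Vocos implementation: it assumes the coefficients are given as per-bin magnitude and phase, uses a naive DFT instead of an optimized inverse STFT with overlap-add, and all function names are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Naive inverse DFT; keeps the real part of each time-domain sample."""
    n = len(coeffs)
    return [
        (sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def from_magnitude_and_phase(magnitudes, phases):
    """Combine per-bin magnitude and phase into complex spectral coefficients."""
    return [cmath.rect(m, p) for m, p in zip(magnitudes, phases)]


# A toy 16-sample frame standing in for one window of audio.
frame = [
    math.sin(2 * math.pi * 3 * t / 16) + 0.5 * math.cos(2 * math.pi * 5 * t / 16)
    for t in range(16)
]

# Pretend these magnitudes and phases came from a model (assumption for illustration).
coeffs = dft(frame)
magnitudes = [abs(c) for c in coeffs]
phases = [cmath.phase(c) for c in coeffs]

# One inverse transform recovers the time-domain samples for the frame.
reconstructed = inverse_dft(from_magnitude_and_phase(magnitudes, phases))
max_error = max(abs(a - b) for a, b in zip(frame, reconstructed))
```

Because the waveform comes out of one inverse transform per frame rather than a chain of time-domain upsampling layers, the reconstruction step itself is cheap.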

Installation

To use Vocos only in inference mode, install it using:

pip install vocos

If you wish to train the model, install it with additional dependencies:

pip install vocos[train]

Usage

Reconstruct audio from mel-spectrogram

import torch

from vocos import Vocos

vocos = Vocos.from_pretrained("charactr/vocos-mel-24khz")

mel = torch.randn(1, 100, 256)  # B, C, T
audio = vocos.decode(mel)
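
As a rough guide to how the `T` dimension of the mel tensor relates to output length, the sketch below estimates the decoded duration. The hop length of 256 samples is an assumption not stated on this page and should be checked against the model's configuration; the helper names are hypothetical, and the exact sample count can differ by about one frame depending on how the inverse transform pads the edges.

```python
# Assumed values for the 24 kHz mel model; verify them against the model config.
SAMPLE_RATE = 24_000  # Hz, matches the resampling target used on this page
HOP_LENGTH = 256      # samples per mel frame (assumption, not stated on this page)


def expected_num_samples(num_frames: int, hop_length: int = HOP_LENGTH) -> int:
    """Approximate number of waveform samples decoded from `num_frames` mel frames."""
    return num_frames * hop_length


def expected_duration_seconds(
    num_frames: int,
    sample_rate: int = SAMPLE_RATE,
    hop_length: int = HOP_LENGTH,
) -> float:
    """Approximate duration in seconds of the decoded waveform."""
    return expected_num_samples(num_frames, hop_length) / sample_rate


# For the random input above, T = 256 frames.
num_samples = expected_num_samples(256)
duration = expected_duration_seconds(256)
```

Under these assumptions, the 256-frame example corresponds to roughly 2.7 seconds of audio.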

Copy-synthesis from a file:

import torchaudio

y, sr = torchaudio.load(YOUR_AUDIO_FILE)
if y.size(0) > 1:  # mix to mono
    y = y.mean(dim=0, keepdim=True)
y = torchaudio.functional.resample(y, orig_freq=sr, new_freq=24000)
y_hat = vocos(y)
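
To listen to the result, the samples need to be written to an audio file. The following is one possible way to do that with only the Python standard library, assuming the output has first been converted to a flat list of mono floats in the range [-1.0, 1.0]. The helper `write_wav_16bit` is hypothetical and not part of Vocos or torchaudio.

```python
import io
import struct
import wave


def write_wav_16bit(samples, sample_rate=24_000, target=None):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM WAV.

    `target` may be a file path or a binary file-like object; when omitted,
    an in-memory buffer is created and returned.
    """
    if target is None:
        target = io.BytesIO()
    # Clip to the valid range, then scale to the signed 16-bit integer range.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return target
```

With the tensor from the copy-synthesis example, a call such as `write_wav_16bit(y_hat.squeeze(0).tolist(), target="out.wav")` would be one way to use it, assuming `y_hat` has shape (1, T).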

Citation

If this code contributes to your research, please cite our work:

@article{siuzdak2023vocos,
  title={Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis},
  author={Siuzdak, Hubert},
  journal={arXiv preprint arXiv:2306.00814},
  year={2023}
}

License

The code in this repository is released under the MIT license.

Feature overview

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio
