novelai3-text2image-dataset

Anonymous user, July 31, 2024

Technical Information

Official website
https://huggingface.co/shareAI
Open-source repository
https://modelscope.cn/models/shareAI/novelai3-text2image-dataset
License
apache-2.0

Project Details

Novelai3 Images

The Novelai3 text-to-image distillation dataset contains over 30GB of anime-related (text, image) pairs and is intended solely for educational and research purposes. It must not be used for any illicit activities.

Production Method

The dataset was created with automated browser operations that repeatedly clicked the "generate image" button and saved the resulting images. Over the course of a month, approximately 38GB of (image, text instruction) pairs were collected.
It has not yet been finely filtered by humans; we plan to filter and refine it when time and energy permit, and a reduced version of the dataset, novelai3-filtered, will be released separately.

Use & Citation

Feel free to use this dataset to train an open-source image generation model (open-novelai3) on top of existing anime models, but please remember to cite a link to our dataset when you use it for training; collecting this data took considerable time and manpower. We hope to contribute to the better development of open-source artificial intelligence!

Some Training Suggestions

  1. It is not recommended to train on the entire dataset all at once.
  2. Adjust the proportions and repetition counts of the different subcategories in the dataset according to the style you want your model to learn (this can be done by directly adding or deleting data).
  3. Check whether the current prompts are what you want; if not, write Python scripts to perform batch replacements (you can even use models such as GPT4-V, Qwen-VL, BLIP2 or Deepbooru to replace the current tags as needed).
  4. For the categories you are particularly interested in, manually review the actual image content and delete poorly generated samples before training (this is akin to human preference selection and will improve the final quality of the model).
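Suggestions 1 and 2 can be sketched in Python as follows. This is a minimal sketch, not the authors' tooling: it assumes (hypothetically) that each subcategory is a sub-directory of the dataset root holding its images, and it expresses "proportion and repetition" as one per-category factor. The names `list_samples`, `rebalance` and `IMAGE_SUFFIXES` are illustrative.

```python
import math
import random
from pathlib import Path

# ASSUMPTION: image extensions used by the dataset; adjust to the real files.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def list_samples(root: Path) -> dict[str, list[Path]]:
    """Group images by sub-directory name (hypothetical layout: root/<category>/<image>)."""
    groups: dict[str, list[Path]] = {}
    for path in sorted(root.glob("*/*")):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            groups.setdefault(path.parent.name, []).append(path)
    return groups


def rebalance(groups: dict[str, list], factors: dict[str, float],
              default: float = 1.0, seed: int = 0) -> list:
    """Build a training list in which each category appears `factor` times.

    2.0 -> every sample twice, 0.25 -> a random quarter,
    1.5 -> every sample once plus a random half, 0.0 -> category dropped.
    Categories missing from `factors` use `default`.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    selected: list = []
    for name, items in sorted(groups.items()):
        factor = factors.get(name, default)
        if factor < 0:
            raise ValueError(f"negative factor for {name!r}")
        whole = math.floor(factor)
        selected.extend(items * whole)                # full repetitions
        extra = round(len(items) * (factor - whole))  # fractional part -> random subset
        selected.extend(rng.sample(items, extra))
    return selected
```

For example, `rebalance(list_samples(Path("novelai3")), {"category_a": 2.0, "category_b": 0.3})` (hypothetical folder names) doubles one category and keeps roughly 30% of another; the resulting paths can then be written to whatever file list your trainer reads. Deleting or copying files on disk, as the card suggests, achieves the same effect.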
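A rule-based batch replacement of prompts (suggestion 3) might look like the sketch below. It rests on assumptions the dataset card does not state: each prompt is stored as a comma-separated tag string in a `.txt` file, and `rewrite_tags` / `rewrite_caption_files` are hypothetical helper names. Swapping the rule function for a captioning model such as BLIP2 would keep the same file loop.

```python
from pathlib import Path


def rewrite_tags(prompt: str, replace: dict[str, str], drop: set[str]) -> str:
    """Drop unwanted tags, rename others, and remove duplicates (order kept).

    Mapping a tag to "" in `replace` also deletes it.
    """
    seen: set[str] = set()
    out: list[str] = []
    for raw in prompt.split(","):
        tag = raw.strip()
        if not tag or tag in drop:
            continue
        tag = replace.get(tag, tag)
        if tag and tag not in seen:
            seen.add(tag)
            out.append(tag)
    return ", ".join(out)


def rewrite_caption_files(root: Path, replace: dict[str, str], drop: set[str],
                          dry_run: bool = True) -> int:
    """Apply `rewrite_tags` to every .txt under `root`; return how many would change.

    With dry_run=True (the default) nothing is written, so you can inspect first.
    """
    changed = 0
    for txt in sorted(root.rglob("*.txt")):
        old = txt.read_text(encoding="utf-8").strip()
        new = rewrite_tags(old, replace, drop)
        if new != old:
            changed += 1
            if not dry_run:
                txt.write_text(new + "\n", encoding="utf-8")
    return changed
```

For instance, `rewrite_tags("1girl, solo, masterpiece, blue hair, solo", {"blue hair": "aqua hair"}, {"masterpiece"})` returns `"1girl, solo, aqua hair"`. Running `rewrite_caption_files` in dry-run mode first is a cheap way to check the rules before touching the data.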

Download Method

Aistudio
https://aistudio.baidu.com/datasetdetail/257868

Hugging Face
https://huggingface.co/datasets/shareAI/novelai3
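For scripted downloads from Hugging Face, the `huggingface_hub` library's `snapshot_download` can fetch a dataset repository. This sketch assumes the repo id `shareAI/novelai3` taken from the Hugging Face link above, and that categories are top-level folders (the card does not describe the file layout); the hypothetical `category_patterns` helper relies on that to fetch only part of the data, in keeping with the advice not to train on everything at once.

```python
from pathlib import Path
from typing import Optional

REPO_ID = "shareAI/novelai3"  # ASSUMPTION: repo id read off the Hugging Face link


def category_patterns(categories: list[str]) -> list[str]:
    """Glob patterns for hypothetical top-level category folders."""
    return [f"{name.strip('/')}/*" for name in categories]


def download_novelai3(local_dir: str = "novelai3",
                      categories: Optional[list[str]] = None) -> Path:
    """Download the whole dataset, or only the listed categories, into `local_dir`."""
    # Imported here so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=category_patterns(categories) if categories else None,
    )
    return Path(path)
```

Calling `download_novelai3(categories=["some_category"])` (a placeholder name) limits the transfer to matching files; calling it with no arguments pulls the full dataset, which the card puts at roughly 38GB.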

Tip: If you find that some categories you are interested in are not currently included, feel free to make suggestions and we will consider expanding the dataset.

