ControlNet-v1-1-nightly

Anonymous user, 2024-07-31

Technical Information

Open-source address
https://modelscope.cn/models/camenduru/ControlNet-v1-1-nightly

Project Details

ControlNet 1.1

This is the official release of ControlNet 1.1.

ControlNet 1.1 has exactly the same architecture as ControlNet 1.0.

We promise that we will not change the neural network architecture before ControlNet 1.5 (at least, and hopefully we will never change the network architecture). Perhaps this is the best news in ControlNet 1.1.

ControlNet 1.1 includes all previous models with improved robustness and result quality. Several new models are added.

Note that we are still working on updating this to A1111.

This repo will be merged into ControlNet after we make sure that everything is OK.

Note that we are actively editing this page now. The information on this page will be more detailed and finalized when ControlNet 1.1 is ready.

This Project is NOT an A1111 Extension

Please do not copy the URL of this project into your A1111.

If you want to use ControlNet 1.1 in A1111, you only need to install https://github.com/Mikubill/sd-webui-controlnet and follow the instructions there.

This project is for research use and academic experiments. Again, do NOT install ControlNet-v1-1-nightly into your A1111.

How to use ControlNet 1.1 in A1111?

The beta test for A1111 has started.

The A1111 plugin is: https://github.com/Mikubill/sd-webui-controlnet

Discussion and bug reports: https://github.com/Mikubill/sd-webui-controlnet/issues/736

For researchers who are not familiar with A1111: the A1111 plugin supports arbitrary combinations of any number of ControlNets, arbitrary community models, arbitrary LoRAs, and arbitrary sampling methods. We should definitely try it!

Model Specification

Starting from ControlNet 1.1, we begin to use the Standard ControlNet Naming Rules (SCNNRs) to name all models. We hope that this naming rule can improve the user experience.

img

ControlNet 1.1 includes 14 models (11 production-ready models and 3 experimental models):

control_v11p_sd15_canny
control_v11p_sd15_mlsd
control_v11f1p_sd15_depth
control_v11p_sd15_normalbae
control_v11p_sd15_seg
control_v11p_sd15_inpaint
control_v11p_sd15_lineart
control_v11p_sd15s2_lineart_anime
control_v11p_sd15_openpose
control_v11p_sd15_scribble
control_v11p_sd15_softedge
control_v11e_sd15_shuffle
control_v11e_sd15_ip2p
control_v11f1e_sd15_tile

You can download all those models from our HuggingFace Model Page. All these models should be put in the folder "models".

You need to download the Stable Diffusion 1.5 model "v1-5-pruned.ckpt" and put it in the folder "models".

Our python codes will automatically download other annotator models like HED and OpenPose. Nevertheless, if you want to download these manually, you can download all other annotator models from here. All these models should be put in the folder "annotator/ckpts".

To install:

conda env create -f environment.yaml
conda activate control-v11

Note that if you use an 8GB GPU, you need to set "save_memory = True" in "config.py".

ControlNet 1.1 Depth

Control Stable Diffusion with Depth Maps.

Model file: control_v11f1p_sd15_depth.pth

Config file: control_v11f1p_sd15_depth.yaml

Training data: Midas depth (resolution 256/384/512) + Leres depth (resolution 256/384/512) + Zoe depth (resolution 256/384/512). Multiple depth map generators at multiple resolutions are used as data augmentation.

Acceptable Preprocessors: Depth_Midas, Depth_Leres, Depth_Zoe. This model is highly robust and can work on real depth maps from rendering engines.

python gradio_depth.py
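Whatever estimator produces the depth map, the control input ends up as a normalized greyscale image. As an illustration only (the repo's actual preprocessing is in its annotator code and is not shown here), a minimal min-max normalization from a raw depth array to a uint8 control image could look like this; `depth_to_control` is a hypothetical helper name:

```python
import numpy as np

def depth_to_control(depth: np.ndarray) -> np.ndarray:
    """Min-max normalize a raw 2-D depth map to a uint8 control image.

    `depth` is any float array (e.g. a Midas/Leres/Zoe output); near/far
    conventions are left to the estimator. This is an illustrative sketch,
    not the repo's exact preprocessing.
    """
    d = depth.astype(np.float64)
    d = (d - d.min()) / max(d.max() - d.min(), 1e-8)  # scale to [0, 1]
    return (d * 255.0).round().astype(np.uint8)
```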

Non-cherry-picked batch test with random seed 12345 ("a handsome man"):

img

Update

2023/04/14: 72 hours ago we uploaded a wrong model "control_v11p_sd15_depth" by mistake. That model is an intermediate checkpoint from training. That model has not converged and may cause distortion in results. We uploaded the correct depth model as "control_v11f1p_sd15_depth". The "f1" means bug fix 1. The incorrect model has been removed. Sorry for the inconvenience.

Improvements in Depth 1.1:

  1. The training dataset of the previous cnet 1.0 had several problems, including (1) a small group of greyscale human images duplicated thousands of times (!!), making the previous model somewhat likely to generate grayscale human images; (2) some low-quality, very blurry images and images with significant JPEG artifacts; (3) a small group of images with wrong paired prompts caused by a mistake in our data processing scripts. The new model fixed all problems of the training dataset and should be more reasonable in many cases.
  2. The new depth model is a relatively unbiased model. It is not trained with one specific type of depth from one specific depth estimation method. It is not over-fitted to one preprocessor. This means this model will work better with different depth estimations, different preprocessor resolutions, or even with real depth created by 3D engines.
  3. Some reasonable data augmentations are applied during training, like random left-right flipping.
  4. The model is resumed from depth 1.0, and it should work well in all cases where depth 1.0 works well. If not, please open an issue with an image, and we will take a look at your case. Depth 1.1 works well in many failure cases of depth 1.0.
  5. If you use Midas depth (the "depth" option in the webui plugin) with 384 preprocessor resolution, the difference between depth 1.0 and 1.1 should be minimal. However, if you try other preprocessor resolutions or other preprocessors (like leres and zoe), depth 1.1 is expected to be a bit better than 1.0.

ControlNet 1.1 Normal

Control Stable Diffusion with Normal Maps.

Model file: control_v11p_sd15_normalbae.pth

Config file: control_v11p_sd15_normalbae.yaml

Training data: Bae's normal map estimation method.

Acceptable Preprocessors: Normal BAE. This model can accept normal maps from rendering engines as long as the normal map follows ScanNet's protocol. That is to say, the color of your normal map should look like the second column of this image.

Note that this method is much more reasonable than the normal-from-midas method of ControlNet 1.0. The previous method will be abandoned.

python gradio_normalbae.py

Non-cherry-picked batch test with random seed 12345 ("a man made of flowers"):

img

Non-cherry-picked batch test with random seed 12345 ("room"):

img

Improvements in Normal 1.1:

  1. The normal-from-midas method in Normal 1.0 is neither reasonable nor physically correct. That method does not work very well on many images. The Normal 1.0 model cannot interpret real normal maps created by rendering engines.
  2. Normal 1.1 is much more reasonable because the preprocessor is trained to estimate normal maps with a relatively correct protocol (NYU-V2's visualization method). This means Normal 1.1 can interpret real normal maps from rendering engines as long as the colors are correct (blue is front, red is left, green is top).
  3. In our test, this model is robust and can achieve performance similar to the depth model. In the previous CNET 1.0, Normal 1.0 was not used very frequently. But this Normal 1.1 is much improved and has the potential to be used much more frequently.

ControlNet 1.1 Canny

Control Stable Diffusion with Canny Maps.

Model file: control_v11p_sd15_canny.pth

Config file: control_v11p_sd15_canny.yaml

Training data: Canny with random thresholds.

Acceptable Preprocessors: Canny.

We fixed several problems in the previous training datasets.

python gradio_canny.py
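"Canny with random thresholds" means each training sample gets a freshly sampled hysteresis threshold pair. The exact sampling ranges used in training are not stated, so the ranges below are illustrative defaults; the edge detection itself (e.g. OpenCV's `cv2.Canny(img, low, high)`) is omitted to keep the sketch dependency-free:

```python
import random

def sample_canny_thresholds(lo_range=(50, 150), hi_range=(150, 300)):
    """Sample a random (low, high) Canny threshold pair with low < high.

    The ranges are illustrative defaults, not the ones used in training.
    """
    low = random.randint(*lo_range)
    high = max(random.randint(*hi_range), low + 1)  # enforce low < high
    return low, high
```

Randomizing the thresholds exposes the model to both sparse and dense edge maps, which is why the released Canny model tolerates a wide range of preprocessor settings.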

Non-cherry-picked batch test with random seed 12345 ("dog in a room"):

img

Improvements in Canny 1.1:

  1. The training dataset of the previous cnet 1.0 had several problems, including (1) a small group of greyscale human images duplicated thousands of times (!!), making the previous model somewhat likely to generate grayscale human images; (2) some low-quality, very blurry images and images with significant JPEG artifacts; (3) a small group of images with wrong paired prompts caused by a mistake in our data processing scripts. The new model fixed all problems of the training dataset and should be more reasonable in many cases.
  2. Because the Canny model is one of the most important (perhaps the most frequently used) ControlNets, we used a fund to train it on a machine with 8 Nvidia A100 80G GPUs at batch size 8×32=256 for 3 days, spending 72×30=2160 USD (8 A100 80G at 30 USD per hour). The model is resumed from Canny 1.0.
  3. Some reasonable data augmentations are applied during training, like random left-right flipping.
  4. Although it is difficult to evaluate a ControlNet, we find Canny 1.1 to be a bit more robust and of a bit higher visual quality than Canny 1.0.

ControlNet 1.1 MLSD

Control Stable Diffusion with M-LSD straight lines.

Model file: control_v11p_sd15_mlsd.pth

Config file: control_v11p_sd15_mlsd.yaml

Training data: M-LSD Lines.

Acceptable Preprocessors: MLSD.

We fixed several problems in the previous training datasets. The model is resumed from ControlNet 1.0 and trained with 200 GPU hours of A100 80G.

python gradio_mlsd.py

Non-cherry-picked batch test with random seed 12345 ("room"):

img

Improvements in MLSD 1.1:

  1. The training dataset of the previous cnet 1.0 had several problems, including (1) a small group of greyscale human images duplicated thousands of times (!!), making the previous model somewhat likely to generate grayscale human images; (2) some low-quality, very blurry images and images with significant JPEG artifacts; (3) a small group of images with wrong paired prompts caused by a mistake in our data processing scripts. The new model fixed all problems of the training dataset and should be more reasonable in many cases.
  2. We enlarged the training dataset by adding 300K more images, using MLSD to find images with more than 16 straight lines in them.
  3. Some reasonable data augmentations are applied during training, like random left-right flipping.
  4. Resumed from MLSD 1.0, with continued training for 200 GPU hours of A100 80G.

ControlNet 1.1 Scribble

Control Stable Diffusion with Scribbles.

Model file: control_v11p_sd15_scribble.pth

Config file: control_v11p_sd15_scribble.yaml

Training data: Synthesized scribbles.

Acceptable Preprocessors: Synthesized scribbles (Scribble_HED, Scribble_PIDI, etc.) or hand-drawn scribbles.

We fixed several problems in the previous training datasets. The model is resumed from ControlNet 1.0 and trained with 200 GPU hours of A100 80G.

# To test synthesized scribbles
python gradio_scribble.py
# To test hand-drawn scribbles in an interactive demo
python gradio_interactive.py

Non-cherry-picked batch test with random seed 12345 ("man in library"):

img

Non-cherry-picked batch test with random seed 12345 (interactive, "the beautiful landscape"):

img

Improvements in Scribble 1.1:

  1. The training dataset of the previous cnet 1.0 had several problems, including (1) a small group of greyscale human images duplicated thousands of times (!!), making the previous model somewhat likely to generate grayscale human images; (2) some low-quality, very blurry images and images with significant JPEG artifacts; (3) a small group of images with wrong paired prompts caused by a mistake in our data processing scripts. The new model fixed all problems of the training dataset and should be more reasonable in many cases.
  2. We found that users sometimes like to draw very thick scribbles. Because of that, we used more aggressive random morphological transforms to synthesize scribbles. This model should work well even when the scribbles are relatively thick (the maximum width in the training data is a 24-pixel-wide scribble on a 512 canvas, but it seems to work well even for somewhat wider scribbles; the minimum width is 1 pixel).
  3. Resumed from Scribble 1.0, continued with 200 GPU hours of A100 80G.
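The "random morphological transforms" used to thicken synthesized scribbles are not spelled out in this page; one plausible building block is binary dilation applied a random number of times. A minimal pure-NumPy sketch (hypothetical helper names, no SciPy/OpenCV required):

```python
import numpy as np

def dilate(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Binary dilation with a 3x3 square element, implemented by OR-ing
    shifted copies of the mask (pure NumPy)."""
    out = mask.astype(bool)
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant")
        shifted = [padded[1 + dy : padded.shape[0] - 1 + dy,
                          1 + dx : padded.shape[1] - 1 + dx]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
        out = np.logical_or.reduce(shifted)
    return out

def random_thicken(scribble: np.ndarray, rng=np.random) -> np.ndarray:
    """Randomly thicken a thin scribble, mimicking thick hand-drawn input."""
    return dilate(scribble, iterations=rng.randint(0, 4))
```

Each dilation pass grows a 1-pixel stroke by roughly 2 pixels of width, so a few random passes cover the thin-to-thick range the model was trained on.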

ControlNet 1.1 Soft Edge

Control Stable Diffusion with Soft Edges.

Model file: control_v11p_sd15_softedge.pth

Config file: control_v11p_sd15_softedge.yaml

Training data: SoftEdge_PIDI, SoftEdge_PIDI_safe, SoftEdge_HED, SoftEdge_HED_safe.

Acceptable Preprocessors: SoftEdge_PIDI, SoftEdge_PIDI_safe, SoftEdge_HED, SoftEdge_HED_safe.

This model is significantly improved compared to the previous model. All users should update as soon as possible.

New in ControlNet 1.1: we added a new type of soft edge called "SoftEdge_safe". This is motivated by the fact that HED or PIDI tends to hide a corrupted greyscale version of the original image inside the soft estimation, and such hidden patterns can distract ControlNet, leading to bad results. The solution is to use pre-processing to quantize the edge maps into several levels so that the hidden patterns can be completely removed. The implementation is in the 78th line of annotator/util.py.
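The quantization idea is simple: snap a continuous soft edge map to a handful of intensity levels so that no fine-grained greyscale pattern can survive. The repo points to annotator/util.py for the real code; the sketch below is one plausible implementation of that idea, not necessarily the exact one in the repo:

```python
import numpy as np

def safe_step(x: np.ndarray, step: int = 2) -> np.ndarray:
    """Quantize a [0, 1] soft edge map into `step + 1` discrete levels,
    destroying fine-grained intensity patterns hidden in the estimation."""
    y = x.astype(np.float64) * float(step + 1)
    y = y.astype(int).astype(np.float64) / float(step)
    return np.clip(y, 0.0, 1.0)
```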

The performance can be roughly noted as:

Robustness: SoftEdge_PIDI_safe > SoftEdge_HED_safe >> SoftEdge_PIDI > SoftEdge_HED

Maximum result quality: SoftEdge_HED > SoftEdge_PIDI > SoftEdge_HED_safe > SoftEdge_PIDI_safe

Considering the trade-off, we recommend using SoftEdge_PIDI by default. In most cases it works very well.

python gradio_softedge.py

Non-cherry-picked batch test with random seed 12345 ("a handsome man"):

img

Improvements in Soft Edge 1.1:

  1. Soft Edge 1.1 was called HED 1.0 in the previous ControlNet.
  2. The training dataset of the previous cnet 1.0 had several problems, including (1) a small group of greyscale human images duplicated thousands of times (!!), making the previous model somewhat likely to generate grayscale human images; (2) some low-quality, very blurry images and images with significant JPEG artifacts; (3) a small group of images with wrong paired prompts caused by a mistake in our data processing scripts. The new model fixed all problems of the training dataset and should be more reasonable in many cases.
  3. Soft Edge 1.1 is significantly (in nearly 100% of cases) better than HED 1.0. This is mainly because the HED or PIDI estimator tends to hide a corrupted greyscale version of the original image inside the soft edge map, and the previous model HED 1.0 was over-fitted to restore that hidden corrupted image rather than perform boundary-aware diffusion. The training of Soft Edge 1.1 used 75% "safe" filtering to remove such hidden corrupted greyscale images inside control maps. This makes Soft Edge 1.1 very robust. In our test, Soft Edge 1.1 is as usable as the depth model and has the potential to be more frequently used.

ControlNet 1.1 Segmentation

Control Stable Diffusion with Semantic Segmentation.

Model file: control_v11p_sd15_seg.pth

Config file: control_v11p_sd15_seg.yaml

Training data: COCO + ADE20K.

Acceptable Preprocessors: Seg_OFADE20K (Oneformer ADE20K), Seg_OFCOCO (Oneformer COCO), Seg_UFADE20K (Uniformer ADE20K), or manually created masks.

Now the model can receive both ADE20K and COCO annotations. We find that recognizing the segmentation protocol is trivial for the ControlNet encoder, and training the model on multiple segmentation protocols leads to better performance.

python gradio_seg.py

Non-cherry-picked batch test with random seed 12345 (ADE20k protocol, "house"):

img

Non-cherry-picked batch test with random seed 12345 (COCO protocol, "house"):

img

Improvements in Segmentation 1.1:

  1. The COCO protocol is supported. The previous Segmentation 1.0 supports about 150 colors, but Segmentation 1.1 supports another 182 colors from COCO.
  2. Resumed from Segmentation 1.0. All previous inputs should still work.

ControlNet 1.1 Openpose

Control Stable Diffusion with Openpose.

Model file: control_v11p_sd15_openpose.pth

Config file: control_v11p_sd15_openpose.yaml

The model is trained and can accept the following combinations:

  • Openpose body
  • Openpose hand
  • Openpose face
  • Openpose body + Openpose hand
  • Openpose body + Openpose face
  • Openpose hand + Openpose face
  • Openpose body + Openpose hand + Openpose face

However, providing all those combinations is too complicated. We recommend providing users with only two choices:

  • "Openpose" = Openpose body
  • "Openpose Full" = Openpose body + Openpose hand + Openpose face
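In a UI, the two recommended choices reduce to a small mapping from the choice name to which detector parts to run. A minimal sketch (the dictionary keys mirror the two choices above; the flag names are hypothetical, not an actual plugin API):

```python
# Hypothetical mapping from UI choice to which OpenPose parts to detect.
OPENPOSE_CHOICES = {
    "Openpose": {"body": True, "hand": False, "face": False},
    "Openpose Full": {"body": True, "hand": True, "face": True},
}

def parts_for(choice: str) -> dict:
    """Return the detector flags for a UI choice, defaulting to body-only."""
    return OPENPOSE_CHOICES.get(choice, OPENPOSE_CHOICES["Openpose"])
```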

You can try it with the demo:

python gradio_openpose.py

Non-cherry-picked batch test with random seed 12345 ("man in suit"):

img

Non-cherry-picked batch test with random seed 12345 (multiple people in the wild, "handsome boys in the party"):

img

Improvements in Openpose 1.1:

  1. The improvement of this model is mainly based on our improved implementation of OpenPose. We carefully reviewed the differences between the pytorch OpenPose and CMU's c++ openpose. Now the processor should be more accurate, especially for hands. The improvement of the processor leads to the improvement of Openpose 1.1.
  2. More inputs are supported (hand and face).
  3. The training dataset of the previous cnet 1.0 had several problems, including (1) a small group of greyscale human images duplicated thousands of times (!!), making the previous model somewhat likely to generate grayscale human images; (2) some low-quality, very blurry images and images with significant JPEG artifacts; (3) a small group of images with wrong paired prompts caused by a mistake in our data processing scripts. The new model fixed all problems of the training dataset and should be more reasonable in many cases.

ControlNet 1.1 Lineart

Control Stable Diffusion with Linearts.

Model file: control_v11p_sd15_lineart.pth

Config file: control_v11p_sd15_lineart.yaml

This model is trained on awacke1/Image-to-Line-Drawings. The preprocessor can generate detailed or coarse linearts from images (Lineart and Lineart_Coarse). The model is trained with sufficient data augmentation and can receive manually drawn linearts.

python gradio_lineart.py

Non-cherry-picked batch test with random seed 12345 (detailed lineart extractor, "bag"):

img

Non-cherry-picked batch test with random seed 12345 (coarse lineart extractor, "Michael Jackson's concert"):

img

Non-cherry-picked batch test with random seed 12345 (using manually drawn linearts, "wolf"):

img

ControlNet 1.1 Anime Lineart

Control Stable Diffusion with Anime Linearts.

Model file: control_v11p_sd15s2_lineart_anime.pth

Config file: control_v11p_sd15s2_lineart_anime.yaml

Training data and implementation details: (description removed).

This model can take real anime line drawings or extracted line drawings as inputs.

Some important notices:

  1. You need a file "anything-v3-full.safetensors" to run the demo. We will not provide the file. Please find it on the Internet on your own.
  2. This model is trained with 3x token length and clip skip 2.
  3. This is a long prompt model. Unless you use LoRAs, results are better with long prompts.
  4. This model does not support Guess Mode.

Demo:

python gradio_lineart_anime.py

Non-cherry-picked batch test with random seed 12345 ("1girl, in classroom, skirt, uniform, red hair, bag, green eyes"):

img

Non-cherry-picked batch test with random seed 12345 ("1girl, saber, at night, sword, green eyes, golden hair, stocking"):

img

Non-cherry-picked batch test with random seed 12345 (extracted line drawing, "1girl, Castle, silver hair, dress, Gemstone, cinematic lighting, mechanical hand, 4k, 8k, extremely detailed, Gothic, green eye"):

img

ControlNet 1.1 Shuffle

Control Stable Diffusion with Content Shuffle.

Model file: control_v11e_sd15_shuffle.pth

Config file: control_v11e_sd15_shuffle.yaml

Demo:

python gradio_shuffle.py

The model is trained to reorganize images. We use a random flow to shuffle the image and control Stable Diffusion to recompose the image.
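"Random flow" here means a random displacement field that remaps pixel coordinates. The repo's actual shuffle transform is not reproduced on this page; the following is a deliberately simple NumPy illustration of the idea (each output pixel samples the input at a randomly displaced, clipped coordinate; a smoother field would look closer to the repo's examples):

```python
import numpy as np

def content_shuffle(img: np.ndarray, strength: int = 32,
                    seed: int = 12345) -> np.ndarray:
    """Shuffle image content with a random per-pixel displacement field.

    Illustrative sketch only: real content-shuffle preprocessors typically
    use a smoother random flow than independent per-pixel offsets.
    """
    rng = np.random.RandomState(seed)
    h, w = img.shape[:2]
    dy = rng.randint(-strength, strength + 1, size=(h, w))
    dx = rng.randint(-strength, strength + 1, size=(h, w))
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    ys = np.clip(ys + dy, 0, h - 1)   # displaced row coordinates
    xs = np.clip(xs + dx, 0, w - 1)   # displaced column coordinates
    return img[ys, xs]
```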

Non-cherry-picked batch test with random seed 12345 ("hong kong"):

img

In the 6 images on the right, the top-left one is the "shuffled" image. All others are outputs.

In fact, since the ControlNet is trained to recompose images, we do not even need to shuffle the input; sometimes we can just use the original image as input.

In this way, this ControlNet can be guided by prompts or other ControlNets to change the image style.

Note that this method has nothing to do with CLIP vision or other such models.

This is a pure ControlNet.

Non-cherry-picked batch test with random seed 12345 ("iron man"):

img

Non-cherry-picked batch test with random seed 12345 ("spider man"):

img

Important If You Implement Your Own Inference:

Note that this ControlNet requires adding a global average pooling "x = torch.mean(x, dim=(2, 3), keepdim=True)" between the ControlNet encoder outputs and the SD U-Net layers. And the ControlNet must be put only on the conditional side of the cfg scale. We recommend using the "global_average_pooling" item in the yaml file to control such behaviors.
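The pooling operation quoted above collapses each (N, C, H, W) feature map to one value per channel, so the shuffle ControlNet injects only channel-wise statistics (a global "style" signal) rather than spatial structure. A NumPy equivalent of that one PyTorch line, for readers implementing inference without torch at hand:

```python
import numpy as np

def global_average_pool(x: np.ndarray) -> np.ndarray:
    """NumPy equivalent of `torch.mean(x, dim=(2, 3), keepdim=True)`:
    average an (N, C, H, W) feature map over its spatial axes."""
    return x.mean(axis=(2, 3), keepdims=True)
```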

Note that this ControlNet Shuffle will be the one and only image stylization method that we will maintain for robustness with long-term support. We have tested other CLIP image encoders, Unclip, image tokenization, and image-based prompts, but it seems that those methods do not work very well with user prompts or additional/multiple U-Net injections. See also the evidence here, here, and some other related issues.

ControlNet 1.1 Instruct Pix2Pix

Control Stable Diffusion with Instruct Pix2Pix.

Model file: control_v11e_sd15_ip2p.pth

Config file: control_v11e_sd15_ip2p.yaml

Demo:

python gradio_ip2p.py

This is a controlnet trained on the Instruct Pix2Pix dataset.

Different from the official Instruct Pix2Pix, this model is trained with 50% instruction prompts and 50% description prompts. For example, "a cute boy" is a description prompt, while "make the boy cute" is an instruction prompt.

Because this is a ControlNet, you do not need to worry about the original IP2P's double cfg tuning. Moreover, this model can be applied to any base model.

Also, it seems that instructions like "make it into X" work better than "make Y into X".

Non-cherry-picked batch test with random seed 12345 ("make it on fire"):

img

Non-cherry-picked batch test with random seed 12345 ("make it winter"):

img

We mark this model as "experimental" because it sometimes needs cherry-picking. For example, here is a non-cherry-picked batch test with random seed 12345 ("make he iron man"):

img

ControlNet 1.1 Inpaint

Control Stable Diffusion with Inpaint.

Model file: control_v11p_sd15_inpaint.pth

Config file: control_v11p_sd15_inpaint.yaml

Demo:

python gradio_inpaint.py

Some notices:

  1. This inpainting ControlNet is trained with 50% random masks and 50% random optical-flow occlusion masks. This means the model can not only support the inpainting application but can also work on video optical-flow warping. Perhaps we will provide some examples in the future (depending on our workloads).
  2. This gradio demo does not include post-processing. Ideally, you need to post-process the latent image in each diffusion iteration and post-process the image after VAE decoding, so that the unmasked area stays unchanged. However, this is complicated to implement, and perhaps a better idea is to do it in a1111. In this gradio example, the outputs are just the raw outputs from diffusion, and the unmasked area in your image may change because of the VAE or the diffusion process.
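The post-processing described in notice 2 is, at its core, a mask-weighted composite: keep the original pixels outside the mask and the generated pixels inside it. A minimal sketch of that composite after VAE decoding, assuming the convention that mask value 1 marks the region to inpaint (the function name and convention are illustrative, not the repo's API):

```python
import numpy as np

def keep_unmasked(original: np.ndarray, generated: np.ndarray,
                  mask: np.ndarray) -> np.ndarray:
    """Composite so only the masked region comes from the generated image.

    `mask` is float in [0, 1] with 1 = region to inpaint (assumed
    convention). The same blend can also be applied to latents at each
    diffusion step, as the notice above suggests.
    """
    m = mask.astype(np.float64)
    if m.ndim == 2 and original.ndim == 3:
        m = m[..., None]  # broadcast the mask over color channels
    out = (m * generated.astype(np.float64)
           + (1.0 - m) * original.astype(np.float64))
    return out.round().astype(original.dtype)
```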

Update 2023/May/03: ControlNet's inpaint without changing unmasked areas is implemented in a1111. It supports arbitrary base models/LoRAs, and can work together with an arbitrary number of other ControlNets.

Non-cherry-picked batch test with random seed 12345 ("a handsome man"):

img

ControlNet 1.1 Tile

Update 2023 April 25: The previously unfinished tile model is finished now. The new name is "control_v11f1e_sd15_tile". The "f1e" means 1st bug fix ("f1"), experimental ("e"). The previous "control_v11u_sd15_tile" is removed. Please update if your model name is "v11u".

Control Stable Diffusion with Tiles.

Model file: control_v11f1e_sd15_tile.pth

Config file: control_v11f1e_sd15_tile.yaml

Demo:

python gradio_tile.py

The model can be used in many ways. Overall, the model has two behaviors:

  • Ignore the details in an image and generate new details.
  • Ignore global prompts if local tile semantics and prompts mismatch, and guide diffusion with local context.

Because the model can generate new details and ignore existing image details, we can use this model to remove bad details and add refined details, for example, to remove blurring caused by image resizing.

Below is an example of 8x super resolution. This is a 64x64 dog image.

img

Non-cherry-picked batch test with random seed 12345 ("dog on grassland"):

img

Note that this model is not a super resolution model. It ignores the details in an image and generates new details. This means you can use it to fix bad details in an image.

For example, below is a dog image corrupted by Real-ESRGAN. This is a typical example where super resolution methods sometimes fail to upscale images when the source context is too small.

img

Non-cherry-picked batch test with random seed 12345 ("dog on grassland"):

img

If your image already has good details, you can still use this model to replace image details. Note that Stable Diffusion's I2I can achieve similar effects, but this model makes it much easier for you to maintain the overall structure and change only the details, even with denoising strength 1.0.

Non-cherry-picked batch test with random seed 12345 ("Silver Armor"):

img

More and more people are thinking about different methods to diffuse at the tile level so that images can be very big (4k or 8k).

The problem is that, in Stable Diffusion, your prompts will always influence each tile.

For example, if your prompt is "a beautiful girl" and you split an image into 4×4=16 blocks and do diffusion in each block, then you will get 16 "beautiful girls" rather than "a beautiful girl". This is a well-known problem.

Right now, people's solution is to use some meaningless prompts like "clear, clear, super clear" to diffuse blocks. But you can expect that the results will be bad if the denoising strength is high. And because the prompts are bad, the contents are pretty random.

ControlNet Tile can solve this problem. For a given tile, it recognizes what is inside the tile and increases the influence of those recognized semantics, and it also decreases the influence of global prompts if the contents do not match.
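The 4×4=16-block splitting described above is straightforward to express in code; a minimal NumPy sketch (illustrative helper, no overlap or seam blending, which practical tiled-diffusion pipelines usually add):

```python
import numpy as np

def split_tiles(img: np.ndarray, n: int = 4):
    """Split an (H, W, C) image into an n x n grid of tiles.

    Assumes H and W are divisible by n; practical tiled-diffusion pipelines
    usually add overlap between tiles to hide seams.
    """
    h, w = img.shape[:2]
    th, tw = h // n, w // n
    return [img[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(n) for c in range(n)]
```

Each tile is then diffused independently, which is exactly where the per-tile prompt problem above comes from, and where ControlNet Tile's local-semantics recognition helps.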

Non-cherry-picked batch test with random seed 12345 ("a handsome man"):

img

You can see that the prompt is "a handsome man", but the model does not paint "a handsome man" on the tree leaves. Instead, it recognizes the tree leaves and paints accordingly.

In this way, ControlNet is able to change the behavior of any Stable Diffusion model to perform diffusion in tiles.

Annotate Your Own Data

We provide simple python scripts to process images.

See a gradio example here.
