/upscale
best quality, high rating, best quality, high quality, masterpiece, definition lines, detailed face, c.cu, (solo:1.1), (betterwithsalt (artist)), 1girl, anthro, grey fur, (isabelle (animal crossing)), hoodie, pants, ((SSBBW)), ((hyper bottomheavy)), fat spilling out of clothes, fat belly, exposed belly, wide eyes, ((embarrassed)), shy, outside, park, standing, ass visible from front, (hyper thighs, blob:1.3), eating, apples, fruit trees
This version of hyperfusion was trained on 3.3 million images over 10 months, and is a v_prediction + zero_terminal_snr model based on SD1.5.
This version was trained on SD 1.5, so there is no NovelAI influence in this checkpoint.
More image classifiers were trained, and existing classifiers were improved (see the list of classified tags under the Training Data section).
Training Notes:
~3.3m images
LR 4e-6
TE_LR 1e-6, dropped to 1e-7 (after epoch 10)
batch 8
GA 16
2x 3090s, so 2x the base batch size; total virtual batch = 8 * 16 * 2 = 256
total images seen: 190_000 steps * 256 = ~48_640_000
AdamW-8bit (ADOPT for the last epoch as a test)
scheduler: linear
base model SD1.5
No custom VAE; I usually use the original SD1.5 VAE
flip aug
clip skip 2
525 token length (appending captions + tags made this necessary)
bucketing at 768, max bucket resolution 1024
bucket resolution steps 32 for more buckets
trained at 768 for the first 10 epochs, and 1024 for the last 6
tag drop chance 0.15
caption_dropout 0.1
tag shuffling
--min_snr_gamma 3
--ip_noise_gamma 0.02
--zero_terminal_snr
about 10 months training time
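For reference, here is roughly what the notes above look like as a kohya-ss/sd-scripts invocation. This is a reconstruction, not the exact command: paths are placeholders, the separate TE learning rate and the 525 token length need the modified fork (stock sd-scripts caps --max_token_length at 225), the resolution was switched to 1024 for the later epochs, and the custom flags described below don't exist in stock sd-scripts.

    accelerate launch fine_tune.py \
      --pretrained_model_name_or_path /path/to/sd15.safetensors \
      --v_parameterization --zero_terminal_snr \
      --train_text_encoder \
      --learning_rate 4e-6 \
      --train_batch_size 8 --gradient_accumulation_steps 16 \
      --optimizer_type AdamW8bit --lr_scheduler linear \
      --resolution 768,768 --enable_bucket --max_bucket_reso 1024 --bucket_reso_steps 32 \
      --flip_aug --clip_skip 2 --max_token_length 225 \
      --shuffle_caption --caption_tag_dropout_rate 0.15 --caption_dropout_rate 0.1 \
      --min_snr_gamma 3 --ip_noise_gamma 0.02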
Custom training configs:
I have implemented a number of things in Kohya's training code that have been suggested to improve training, and kept the ones that seemed to make improvements.
drop out 75% of tags 5% of the time, to hopefully improve results with short tag lists
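A minimal sketch of what that looks like (function and parameter names are mine, not from the actual fork):

    import random

    def heavy_tag_dropout(tags: list[str], p_apply: float = 0.05, drop_frac: float = 0.75) -> list[str]:
        # 5% of the time, keep only a random 25% of the tags to mimic short prompts
        if not tags or random.random() >= p_apply:
            return tags
        keep_n = max(1, round(len(tags) * (1.0 - drop_frac)))
        kept = set(random.sample(tags, keep_n))
        return [t for t in tags if t in kept]  # preserve the original tag order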
soft_min_snr instead of min_snr
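For context: min-SNR weights the loss by min(snr, gamma), which has a hard corner at snr = gamma; the soft variant replaces it with a smooth blend. A sketch assuming the harmonic-mean form of soft-min-SNR (I believe this is Crowson's formulation, but treat it as an assumption):

    import torch

    def soft_min_snr_weight(snr: torch.Tensor, gamma: float = 3.0, v_prediction: bool = True) -> torch.Tensor:
        # soft_min(snr, gamma) = snr * gamma / (snr + gamma), a smooth stand-in for min(snr, gamma)
        soft_min = snr * gamma / (snr + gamma)
        # min-SNR convention: divide by snr for eps-prediction, by snr + 1 for v-prediction
        return soft_min / (snr + 1) if v_prediction else soft_min / snr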
--no_flip_when_cap_matches: Prevent flipping images when certain tags exist, like "sequence, asymmetrical, before and after, text on*, written, speech bubble", etc. This should help with text, and with characters that have asymmetrical features.
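Something along these lines (pattern list copied from the description above; the helper name is made up):

    import fnmatch

    NO_FLIP_PATTERNS = ["sequence", "asymmetrical", "before and after", "text on*", "written", "speech bubble"]

    def allow_flip(tags: list[str]) -> bool:
        # Disable the horizontal flip augmentation when any tag matches a no-flip pattern
        return not any(fnmatch.fnmatch(tag, pat) for tag in tags for pat in NO_FLIP_PATTERNS)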
--important_tags: move important tags to the beginning of the list, and sort them separately from the unimportant ones (suggested by NovelAI, if I remember correctly).
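Assuming "sort them separately" means each group is shuffled independently (tag shuffling is enabled), a sketch with a made-up important-tag set:

    import random

    IMPORTANT_TAGS = {"1girl", "solo", "anthro"}  # illustrative; the real list is user-supplied

    def order_tags(tags: list[str]) -> list[str]:
        # Important tags go first; each group is shuffled on its own
        important = [t for t in tags if t in IMPORTANT_TAGS]
        rest = [t for t in tags if t not in IMPORTANT_TAGS]
        random.shuffle(important)
        random.shuffle(rest)
        return important + rest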
--tag_implication_dropout: Drop out implied tags to prevent the model from requiring both to be present when generating. For example, with "breasts, big breasts", "breasts" will be dropped out 30-50% of the time. I used the tag implications CSV from e621 as a base and added tags as needed. Even with 10-15% tag dropout, some tag pairs were still being associated too often, so this definitely made a difference. I think there were about 5k tags in total on the dropout list.
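A sketch of the idea, with a toy implication map (the real one comes from e621's tag implications CSV):

    import random

    IMPLICATIONS = {"big breasts": "breasts", "huge ass": "ass"}  # antecedent -> implied general tag

    def tag_implication_dropout(tags: list[str], p_drop: float = 0.4) -> list[str]:
        # If a specific tag implies a general one, drop the general tag 30-50% of the time
        implied = {IMPLICATIONS[t] for t in tags if t in IMPLICATIONS}
        return [t for t in tags if not (t in implied and random.random() < p_drop)]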
12% of the dataset is captioned with CogVLM, and many of those captions were cleaned up with custom scripts that correct common problems.
Tags vs Captions: 70% of the time use tags, ~20% of the time use captions (if they exist), 10% of the time combine tags with captions in different orders.
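As a sketch, the per-sample choice could look like this (the 70/20/10 split from above; the combining order is randomized):

    import random

    def build_training_caption(tags: list[str], caption: str | None) -> str:
        # 70% tags only, 20% caption only (when one exists), 10% both in a random order
        r = random.random()
        if caption is None or r < 0.70:
            return ", ".join(tags)
        if r < 0.90:
            return caption
        parts = [", ".join(tags), caption]
        random.shuffle(parts)
        return " ".join(parts)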
If I remember more custom changes, I'll add them later.