The Ultimate Guide - Illustrious XL 2.0 created with SeaArt AI

Welcome!

This guide provides a detailed exploration of crafting prompts for Illustrious XL 2.0, an advanced AI model for illustration and anime-style image generation, based on research and community insights. The model, developed by OnomaAI Research, is built on Stable Diffusion XL and fine-tuned on datasets like Danbooru2023, making it particularly suited for artistic and character-focused outputs. Specific documentation for version 2.0 is limited, but because v2.0 builds on earlier releases, the principles derived from those versions (e.g., v0.1, v1.0) and from general Stable Diffusion practice are likely to apply.

Model Overview and Context

Illustrious XL 2.0 is designed for high-quality image generation, with a focus on illustrations and animations. It supports both tag-based and natural language prompts and uses multi-level captions for better text-image correspondence, which suggests it can handle complex prompts with hierarchical descriptions. The research paper Illustrious: an Open Advanced Illustration Model details training at a resolution of 1536 x 1536 with a batch size of 512, learning rates of 4e-5 for the U-Net and 3e-6 for the text encoder, and 2 epochs. Because v2.0 specifics are limited, they are inferred here from the v0.1 and v1.0 guidelines.

Step-by-Step Prompt Crafting Guide

To craft the perfect prompt, follow these steps, designed for simplicity and accessibility, even for children:

Define the Subject: Start by deciding the main focus, such as a character, animal, or object. Specify the count, e.g., "1girl", "2boys", and include character names if applicable, like "hatsune miku" or "naruto uzumaki". This aligns with the model’s capability for character separation, as noted in the arXiv paper.

Describe Details: Add details about appearance, actions, and attire using simple tags or natural language. Examples include "smiling", "wearing a blue dress", "holding a sword", or "standing". The Civitai article suggests using a hybrid approach, combining tags (e.g., "long hair, blue eyes") with descriptive sentences for clarity.

Incorporate the Background: Specify the setting, such as "indoors, bedroom", "outdoors, city", or "fantasy forest, trees". The Civitai article provides more detailed background structures.

Add Quality and Style Tags: Conclude with quality tags like "masterpiece", "best quality", "highres", and optionally "absurdres", "newest" for enhanced output. For realism, add "ambient occlusion, raytracing". The Civitai article recommends helper tags at the start or end, like "(masterwork, portrait, princess midna, fan no hitori, award-winning, masterpiece, best quality, hyper-detailed, 8k uhd::1.4)", to boost quality.

Include Negative Prompts: Negative prompts are essential to avoid unwanted features. The Hugging Face page and the Civitai article provide examples; the versions identified as optimal are listed in The Perfect Negative Prompt section below.

Structure for Multiple Characters: For multiple subjects, include the count and positions, e.g., "2girls, holding hands, side by side". The arXiv paper shows examples like "2girls, otonose kanade, hatsune miku, side-by-side, masterpiece", highlighting the model's ability to handle multi-character separation.

Specify Angles and Lighting: Add tags after quality tags for angles, like "from above, close-up, portrait, POV", and for lighting, like "Cinematic Light, Backlighting, Rim lighting". The Civitai article suggests placing lighting tags at the beginning or end for emphasis.

Optimize Settings: Use a CFG scale between 4.5 and 7.5 (sweet spot 5.5), the Euler A sampler, and 20+ steps (24 recommended). Community guidance also mentions DPM-based schedulers for aesthetic setups, followed by an img2img pass with Euler discrete.
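The step order above can be sketched in Python. This is a minimal illustration, assuming plain comma-separated tags; `build_prompt`, `count_tag`, and `QUALITY_TAGS` are hypothetical helper names, not part of any Illustrious XL or SeaArt tool, and the order simply follows the steps in this guide (subject, details, background, quality tags, then angles and lighting).

```python
# Hypothetical helpers illustrating the prompt order described in this guide.
QUALITY_TAGS = ("masterpiece", "best quality", "highres", "absurdres", "newest")


def count_tag(count: int, noun: str) -> str:
    """Build a Danbooru-style count tag such as "1girl" or "2boys".

    Naive pluralization, intended only for small counts.
    """
    return f"{count}{noun}" if count == 1 else f"{count}{noun}s"


def build_prompt(subject, details=(), background=(), quality=QUALITY_TAGS,
                 angles=(), lighting=()):
    """Join prompt parts in the guide's order: subject, details,
    background, quality tags, then angles and lighting."""
    parts = [*subject, *details, *background, *quality, *angles, *lighting]
    # Drop empty entries and stray whitespace so the result stays clean.
    return ", ".join(p.strip() for p in parts if p.strip())


prompt = build_prompt(
    [count_tag(1, "girl"), "hatsune miku"],       # step 1: subject
    ["smiling", "wearing a blue dress"],          # step 2: details
    ["outdoors", "city"],                         # step 3: background
    angles=["close-up"],                          # step 6: angle
    lighting=["rim lighting"],                    # step 6: lighting
)
print(prompt)
# 1girl, hatsune miku, smiling, wearing a blue dress, outdoors, city, masterpiece, best quality, highres, absurdres, newest, close-up, rim lighting
```

Because each part is just a list of strings, short tags like "long hair" and phrases like "wearing a blue dress" are handled the same way, which matches the hybrid approach.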

Examples of Prompts

To illustrate, here are examples tailored for Illustrious XL 2.0, based on community insights:

Simple Girl: 1girl, long hair, blue eyes, smiling, wearing a red dress, outdoors, park, masterpiece, best quality, highres, absurdres, newest

Specific Character: 1girl, asuka langley, evangelion, pilot suit, holding a gun, action pose, futuristic city background, cyberpunk, masterpiece, best quality, highres, absurdres, newest

Multiple Characters: 2girls, holding hands, smiling, wearing school uniforms, standing in front of a school building, sunny day, masterpiece, best quality, highres, absurdres, newest

Fantasy Scene: 1boy, adventurer, holding a sword, standing in a fantasy forest, trees, river, mountains in the background, epic, masterpiece, best quality, highres, absurdres, newest

These examples demonstrate the hybrid approach, combining tags and natural language, aligning with the model's capabilities.
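Since all four examples end with the same quality suffix, it can help to keep that suffix in one place and clean up the joined string. The sketch below uses a hypothetical `normalize_prompt` helper (not from any official tool) that trims whitespace and drops repeated tags case-insensitively. It assumes plain comma-separated tags; parenthesized groups that contain commas would be split at those inner commas, so leave them out or handle them separately.

```python
# Shared suffix used by every example above.
QUALITY_SUFFIX = "masterpiece, best quality, highres, absurdres, newest"


def normalize_prompt(prompt: str) -> str:
    """Trim each tag, collapse inner whitespace, and drop repeats (case-insensitive)."""
    seen = set()
    tags = []
    for raw in prompt.split(","):
        tag = " ".join(raw.split())  # collapse runs of spaces inside a tag
        key = tag.lower()
        if tag and key not in seen:
            seen.add(key)
            tags.append(tag)
    return ", ".join(tags)


# The duplicated "masterpiece" is removed, reproducing the Simple Girl example.
simple_girl = normalize_prompt(
    "1girl, long hair, blue eyes, smiling, wearing a red dress, outdoors, park, "
    "masterpiece, " + QUALITY_SUFFIX
)
print(simple_girl)
```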

The Perfect Negative Prompt

The best negative prompt to avoid unwanted features is:

Long version (most thorough): "lowres, (bad), bad anatomy, bad hands, extra digits, multiple views, fewer, extra, missing, text, error, worst quality, jpeg artifacts, low quality, watermark, unfinished, displeasing, oldest, early, chromatic aberration, signature, artistic error, username, scan"

Short version (easier): "lowres, worst quality, bad quality, bad anatomy, sketch, jpeg artifacts, signature, watermark, artist name, old, oldest"

Use the long version for more control, especially for detailed images.

Technical Details and Best Practices

The model's training on Danbooru2023, as noted in Illustrious Xl Early Release V0 · Models · Dataloop, suggests it excels with tag-based prompts, while natural language adds flexibility, especially for backgrounds and lighting. For composition, avoid combining tags that conflict with each other, such as "close-up", "upside-down", and "cowboy shot", and choose framing tags suited to the image, such as "upper body", "portrait", or "full body".
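A rough way to catch conflicting composition tags is to flag prompts that contain more than one shot-distance tag. The grouping in `SHOT_TAGS` and the `shot_conflicts` function are assumptions for illustration, not an official Danbooru taxonomy. The set leaves out "upside-down" (an orientation rather than a distance) and "portrait", since the angle example earlier in this guide pairs "portrait" with "close-up".

```python
# Hypothetical grouping of mutually exclusive shot distances.
SHOT_TAGS = {"close-up", "upper body", "cowboy shot", "full body"}


def shot_conflicts(prompt: str) -> list:
    """Return the shot-distance tags found if there is more than one, else []."""
    tags = [t.strip().lower() for t in prompt.split(",") if t.strip()]
    found = [t for t in tags if t in SHOT_TAGS]
    return found if len(found) > 1 else []


print(shot_conflicts("1girl, close-up, cowboy shot, smiling"))
# ['close-up', 'cowboy shot']
```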

Danbooru tag groups provide additional resources:

Composition: image composition
Focus: focus tags
Backgrounds: backgrounds
Lighting: lighting
Colors: colors

These resources can enhance prompt detail, especially for complex scenes.

Table: Recommended Settings and Tags

Aspect	Details
Recommended Sampler	Euler A, or DPM-based for aesthetic setups followed by img2img with Euler
CFG Range	4.5–7.5, sweet spot 5.5
Steps	20+ (24 recommended)
Quality Tags	masterpiece, best quality, highres, absurdres, newest
Composition Tags to Avoid Combining	close-up, upside-down, cowboy shot (can conflict with each other)
Recommended Composition Tags	upper body, cowboy shot, portrait, full body (use case dependent)
Negative Prompt (Short)	lowres, worst quality, bad quality, bad anatomy, sketch, jpeg artifacts, etc.
Negative Prompt (Long)	lowres, (bad), bad anatomy, bad hands, extra digits, multiple views, etc.
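The sampling settings in the table can be captured as a small config object with a sanity check. This is a sketch: `SamplingSettings` and its `warnings` method are hypothetical, the sampler is stored as a plain label, and the ranges are the community recommendations above rather than hard limits of the model. Map the fields onto whichever UI or library you actually use.

```python
from dataclasses import dataclass


@dataclass
class SamplingSettings:
    """Hypothetical container for the recommended settings in the table."""
    sampler: str = "Euler A"  # plain label; exact naming varies by tool
    cfg_scale: float = 5.5    # sweet spot within the 4.5-7.5 range
    steps: int = 24           # recommended; 20+ suggested

    def warnings(self) -> list:
        """Return notes for values outside the recommended ranges."""
        notes = []
        if not 4.5 <= self.cfg_scale <= 7.5:
            notes.append(f"CFG {self.cfg_scale} is outside the 4.5-7.5 range")
        if self.steps < 20:
            notes.append(f"{self.steps} steps is below the suggested minimum of 20")
        return notes


print(SamplingSettings().warnings())                         # []
print(SamplingSettings(cfg_scale=9.0, steps=12).warnings())
```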

Conclusion

Crafting the perfect prompt for Illustrious XL 2.0 involves a structured approach, combining subject details, backgrounds, and quality tags, with negative prompts to refine outputs. The provided examples and settings ensure accessibility, while community resources and technical details offer depth for advanced users. The long negative prompt is recommended for comprehensive control, aligning with the model's capabilities for high-quality, detailed illustrations.

Cya!

Comment below if you liked the step-by-step guide!