Big news for the Z-Image community—Z-Image Omni Base is officially on the horizon!
The Tongyi-MAI team (Alibaba) is pivoting from separate generation and editing models to an "Omni" pre-training paradigm. This means a single model handles both Text-to-Image (T2I) generation and Image-to-Image (I2I) editing in one stream, using a 6B-parameter Scalable Single-Stream Diffusion Transformer (S3-DiT).
Why this matters:
Unified Workflow: No more switching between specialized models for generation and editing.
Strategic Upgrade: It breaks the barriers between T2I and I2I, making LoRA adapters more versatile across tasks.
Community Proof: Recent commits in DiffSynth-Studio and official GitHub updates (now marked as "to be released") show everything is ready for the weight drop.
Bilingual Base: Native support for both English and Chinese prompts.

We’ve put together a deep dive into the architecture, naming strategy, and the latest evidence from GitHub/ModelScope:
Full Article: https://z-image.me/en/blog/z-image-omni-base-coming-soon-en
Excited to see how this performs on consumer GPUs (6B parameters is the sweet spot!). What are your thoughts on the "Omni" vs. separate-model approach?
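
For anyone wondering what 6B parameters means for VRAM, here is a rough back-of-the-envelope sketch. It only counts the diffusion transformer's weights at a few common precisions. The helper name `weight_memory_gib` is made up for illustration, and the numbers ignore activations, the text encoder, and the VAE, so treat them as a lower bound rather than a measured requirement.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumption: 6e9 parameters, taken from the "6B" figure in the post.
PARAMS = 6e9

# Bytes per parameter for a few common storage precisions.
for name, nbytes in [("bf16/fp16", 2), ("fp8/int8", 1), ("4-bit", 0.5)]:
    print(f"{name}: {weight_memory_gib(PARAMS, nbytes):.1f} GiB")
# bf16/fp16: 11.2 GiB
# fp8/int8: 5.6 GiB
# 4-bit: 2.8 GiB
```

So at bf16 the weights alone are around 11 GiB, which is why quantized variants tend to matter on 8 to 12 GB cards.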