Authors:
(1) Anton Razzhigaev, AIRI and Skoltech;
(2) Arseniy Shakhmatov, Sber AI;
(3) Anastasia Maltseva, Sber AI;
(4) Vladimir Arkhipkin, Sber AI;
(5) Igor Pavlov, Sber AI;
(6) Ilya Ryabov, Sber AI;
(7) Angelina Kuts, Sber AI;
(8) Alexander Panchenko, AIRI and Skoltech;
(9) Andrey Kuznetsov, AIRI and Sber AI;
(10) Denis Dimitrov, AIRI and Sber AI.

Editor's Note: This is Part 7 of 8 of a study detailing the development of Kandinsky, the first text-to-image architecture designed using a combination of image prior and latent diffusion. Read the rest below.

Table of Links
Abstract and Introduction
Related Work
Demo System
Kandinsky Architecture
Experiments
Results
Conclusion & Limitations
Ethical Considerations, Acknowledgements and References

7 Conclusion

We presented Kandinsky, a system for various image generation and processing tasks based on a novel latent diffusion model. Our model achieved state-of-the-art (SotA) results among open-source systems. Additionally, we provided an extensive ablation study of the design choices for the image prior. Our system is equipped with free-to-use interfaces in the form of a web application and a Telegram messenger bot. The pre-trained models are available on Hugging Face, and the source code is released under a permissive license that enables a variety of applications of the developed technology, including commercial ones.

In future research, our goal is to investigate the potential of the latest image encoders. We plan to explore the development of more efficient UNet architectures for text-to-image tasks and to focus on improving the understanding of textual prompts. Additionally, we aim to experiment with generating images at higher resolutions and to investigate new features extending the model: local image editing by a text prompt, attention reweighting, physics-based generation control, etc. Robustness against generating abusive content remains a crucial concern, warranting the exploration of real-time moderation layers or robust classifiers to mitigate undesirable (e.g. toxic or abusive) outputs.

8 Limitations

The current system produces images that appear natural; however, additional research can be conducted to (1) enhance the semantic coherence between the input text and the generated image, and (2) improve the absolute values of FID and image quality based on human evaluations.

This paper is available on arXiv under the CC BY 4.0 DEED license.
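Limitation (2) above refers to FID (Fréchet Inception Distance). As a reminder of what that number measures, the sketch below computes the Fréchet distance between two Gaussians fitted to feature vectors, where lower means the two feature distributions are closer. It is an illustration only, not the evaluation code used in the paper. It makes two simplifying assumptions that are not part of standard FID: covariances are treated as diagonal (the full metric uses complete covariance matrices and a matrix square root), and the feature vectors are supplied by the caller rather than extracted with an Inception network. The function names frechet_distance_diag and mean_and_variance are hypothetical.

```python
import math


def mean_and_variance(features):
    """Per-dimension mean and population variance of a list of feature vectors."""
    n = len(features)
    dim = len(features[0])
    mu = [sum(f[d] for f in features) / n for d in range(dim)]
    var = [sum((f[d] - mu[d]) ** 2 for f in features) / n for d in range(dim)]
    return mu, var


def frechet_distance_diag(mu1, var1, mu2, var2):
    """Frechet distance between two Gaussians, ASSUMING diagonal covariances.

    General form: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * (S1 S2)^(1/2)).
    With diagonal S1 and S2 the trace term reduces to a per-dimension sum.
    """
    if not (len(mu1) == len(var1) == len(mu2) == len(var2)):
        raise ValueError("all inputs must have the same length")
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(
        v1 + v2 - 2.0 * math.sqrt(v1 * v2) for v1, v2 in zip(var1, var2)
    )
    return mean_term + cov_term


# Toy usage: same spread, means shifted by (3, 4), so only the mean term contributes.
print(frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0]))  # 25.0
```

In a real evaluation the means and covariances would be estimated from Inception features of reference images and generated images, and the full covariance form would be used.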

Part of HackerNoon's growing list of open-source research papers, promoting free access to academic material.

Russian Scientists Unveil Open-Source Image Generator With a Groundbreaking Diffusion Method
