DALL-E is one of the innovative generative AI platforms blurring the lines between human- and computer-generated creativity. Here’s an overview of DALL-E, how to use it, and what you should know to make it work for you.
Table of contents
- What is DALL-E?
- Who created DALL-E?
- How has DALL-E evolved?
- How DALL-E works
- Is DALL-E free to use?
- Tips for using DALL-E
- DALL-E use cases and applications
- Benefits of using DALL-E
- Shortcomings of DALL-E
- What's next for DALL-E and AI image generation?
What is DALL-E?
DALL-E is a generative AI platform that turns text prompts into images. DALL-E can process natural language, so you don’t need any special coding or image-editing abilities to use it. You can enter prompts that describe your desired image’s subject, style, framing, and other characteristics, and DALL-E will produce a visual representation that matches your description. It can also edit existing images.
The name DALL-E was inspired by a combination of the names of two well-known figures: the Spanish surrealist artist Salvador Dalí and WALL-E, the robot in the 2008 Pixar movie of the same name.
Who created DALL-E?
OpenAI, the same company behind ChatGPT, created DALL-E. OpenAI is an AI research company founded in 2015.
OpenAI released DALL-E in January 2021, followed by DALL-E 2 in September 2022 and DALL-E 3 in October 2023.
How has DALL-E evolved?
OpenAI announced its first image generation tool in 2020, and DALL-E has evolved from there. OpenAI’s first foray into image generation was called Image GPT. Image GPT provided the first proof that the GPT model could create images.
Then came DALL-E. The first iteration of DALL-E was based on a version of GPT-3—the large language model (LLM) that OpenAI released in 2020—adapted for image generation.
DALL-E creates believable images and can handle a range of tasks, including:
- Modifying several characteristics of an object, such as the color and texture of a sphere
- Understanding framing, such as close-ups and wide angles
- Creating images of the same object from multiple angles
- Understanding geographic information and periods in history
What is DALL-E 2?
The next version, DALL-E 2, generates images with four times higher resolution than images generated by DALL-E. It handles composition and object placement more effectively, making elements like shadows and lighting appear more realistic. DALL-E 2 also introduced two new features for modifying existing images: inpainting and outpainting.
- Inpainting is when you erase a portion of an image and use AI to fill in the empty space with something else. For instance, you can remove a building from the background of a photo and replace it with a tree.
- Outpainting is when you expand the borders of an image with AI. For example, if you have a close-up image of your dog in a park and want to expand it to show the city skyline in the distance, DALL-E 2 does that with outpainting.
What is DALL-E 3?
DALL-E 3 is a significant improvement over its predecessor in several ways. For starters, it’s better at interpreting prompts. Previous versions would skip over words and descriptions. You had to become good at prompt engineering to get the image you wanted. DALL-E 3 understands nuance and context better and can follow more complex prompts. Its responses are more accurate, and its images are more coherent. Ultimately, its output better aligns with what people want.
DALL-E 3 also includes more sophisticated safety measures. For example, it refuses to generate explicit, violent, or discriminatory images. To prevent people from creating images that infringe on copyrights or violate intellectual property rights, DALL-E 3 won't generate images that resemble living public figures or mimic the style of popular artists and brands. DALL-E 3 also allows creators to opt out of having their images used to train future models.
Inclusion with existing AI tools
DALL-E 3 is included natively with ChatGPT and Image Creator from Microsoft Designer (formerly Bing Image Generator).
This means that if you have a premium ChatGPT subscription, you can generate images as part of your conversation with the chatbot. You aren't limited to writing straightforward prompts: you can ask questions or give directions, and ChatGPT will hand them to DALL-E to generate an image.
For example, you might say, “I just moved to Arizona, and everyone keeps talking about something called a haboob. What does that look like?” ChatGPT can process your question and generate a prompt for DALL-E. DALL-E will then create images of a haboob, which is a dust storm that occurs in dry areas like Arizona.
ChatGPT will also elaborate on your prompts to provide DALL-E with more detail. If you write a prompt that says “Create an image of two cats sitting on a chair, in a vintage photographic style,” ChatGPT may refine your prompt to this: “Create a black-and-white vintage photograph of two cats sitting on a green sofa chair. One cat is a tabby, and the other is gray all over. The two cats are sitting side by side.”
How DALL-E works
At a basic level, DALL-E uses deep learning to understand the relationships between images and text, allowing the model to output new images for a text prompt. The specific generative AI models behind DALL-E are constantly evolving.
DALL-E 1
DALL-E 1 (also simply called DALL-E) uses a version of GPT-3, OpenAI's LLM, trained to generate images from text descriptions. This model is based on a transformer architecture. Just as ChatGPT generates text by predicting each word one by one, the original version of DALL-E generates images by predicting one image token at a time. A token here is a small compressed patch of the image rather than an individual pixel.
DALL-E 1 generates many candidate outputs for a single prompt, and a second AI system, called CLIP (Contrastive Language-Image Pretraining), selects the best one. Like DALL-E 1, CLIP is trained on a large dataset of images and captions, but its goal is to judge how closely a given image and text caption are related.
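Conceptually, that reranking step looks like the sketch below. This is an illustrative Python toy, not OpenAI's code: the hand-written embedding vectors stand in for what CLIP's image and text encoders would actually produce, and `pick_best_image` is an invented name for this sketch.

```python
import math

def cosine_similarity(a, b):
    """Measure how closely two embedding vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def pick_best_image(candidate_embeddings, prompt_embedding):
    """Return the index of the candidate image whose embedding best matches the prompt."""
    scores = [cosine_similarity(e, prompt_embedding) for e in candidate_embeddings]
    return scores.index(max(scores))

# Toy vectors: in practice, these would come from CLIP's image and text encoders.
candidates = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.2], [0.4, 0.4, 0.8]]
prompt = [0.2, 0.95, 0.1]
print(pick_best_image(candidates, prompt))  # → 1 (the second candidate aligns best)
```

The idea is simply that images and captions live in a shared embedding space, so the candidate whose embedding lies closest to the prompt's embedding wins.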
DALL-E 2
DALL-E 2 generates images using a diffusion model rather than an LLM for improved image quality and accuracy.
This approach trains a model to take noisy images, where pixels have been randomly distorted, and incrementally remove the noise to reveal a clear image. To generate something new, the model starts from pure noise and, guided by a representation of the prompt's features (such as "a cat in a top hat"), removes noise step by step until a brand-new image emerges.
DALL-E 2 uses CLIP to understand the text in a user’s prompt and map it to image features. This information is passed to the diffusion model, allowing it to generate an output that fits the user’s prompt.
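The denoising loop can be pictured with a toy sketch. This is not DALL-E's real model: the hypothetical `denoise_step` below simply nudges values toward a set of target "image features," standing in for the learned neural network that predicts and removes a bit of noise at each step.

```python
import random

def denoise_step(noisy, target, strength=0.2):
    """One reverse-diffusion step: a stand-in for the learned denoiser,
    which in a real model predicts and subtracts a little noise."""
    return [n + strength * (t - n) for n, t in zip(noisy, target)]

random.seed(0)
target = [0.1, 0.5, 0.9, 0.3]                  # stands in for features derived from the prompt
image = [random.gauss(0, 1) for _ in target]   # start from pure noise

for step in range(40):                         # repeatedly remove a little noise
    image = denoise_step(image, target)

print([round(v, 2) for v in image])            # values settle near the target features
```

The real model, of course, doesn't know the "answer" in advance; it has learned from millions of examples how to turn noise plus a text description into a plausible image.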
DALL-E 3
Little is known about the architectural differences between DALL-E 2 and DALL-E 3 because OpenAI hasn't shared the details publicly. However, DALL-E 3 almost certainly uses a diffusion model, as diffusion is widely accepted as the state-of-the-art technique for image generation.
There is speculation that DALL-E 3 uses more advanced diffusion techniques and may be using an LLM (rather than a smaller model like CLIP) to understand relationships between images and text.
Is DALL-E free to use?
DALL-E is available with a paid ChatGPT subscription, which is offered in several tiers for individuals and businesses.
You can access DALL-E for free with Image Creator from Microsoft Designer (formerly Bing Image Generator). Image Creator is also available through Copilot, Microsoft's chatbot.
Tips for using DALL-E
Here are some tips for getting the best results with DALL-E:
Be descriptive
The more precise your prompt, the better DALL-E’s output will be.
- Provide a clear description of the main subject; for example, “a blue microfiber couch” instead of just “a couch.”
- Explain the setting, such as “on a tropical beach,” “in a 1970s house,” or “inside an elementary school gym.”
- Detail any action, like “the sun is setting,” “a dog is napping,” or “a kite is flying.”
- Describe the image format, such as “photorealistic,” “painting,” or “pencil sketch.”
- Tell DALL-E which style you want; for example, “black and white,” “abstract,” or “art deco.”
- Include camera angle and focal distance, like “aerial view,” “close-up,” or “wide-angle.”
- Provide lighting details, such as “deep shadows,” “flash,” or “backlit.”
- Describe the mood; for example, “romantic,” “gritty,” or “dreamy.”
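The checklist above can be treated as a template. Here's an illustrative helper that assembles the pieces into a single prompt (the function and field names are invented for this sketch; they aren't part of any DALL-E API):

```python
def build_prompt(subject, setting=None, action=None, fmt=None,
                 style=None, camera=None, lighting=None, mood=None):
    """Assemble a descriptive image prompt from the checklist categories."""
    parts = [subject]
    if setting:
        parts.append(setting)
    if action:
        parts.append(action)
    # Format, style, camera, lighting, and mood all read naturally as a trailing list.
    details = [d for d in (fmt, style, camera, lighting, mood) if d]
    prompt = ", ".join(parts)
    if details:
        prompt += ". Style: " + ", ".join(details)
    return prompt

print(build_prompt(
    "a blue microfiber couch",
    setting="in a 1970s house",
    fmt="photorealistic",
    camera="wide-angle",
    lighting="backlit",
    mood="dreamy",
))
# → a blue microfiber couch, in a 1970s house. Style: photorealistic, wide-angle, backlit, dreamy
```

Even if you never write a line of code, thinking in these slots (subject, setting, action, format, style, camera, lighting, mood) is a reliable way to make prompts more specific.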
Be experimental
There’s no textbook or perfect way to use DALL-E. The best way to get the results you want is to take an experimental approach to using it.
- Make minor tweaks to your prompts to see if you get better results. Try using variations of the same words to see if it alters your results.
- Find the right balance of details. If your prompts are too detailed, DALL-E may not know which ones are most important. Play around with the complexity of your prompts to find your sweet spot.
- Brace for mistakes and failures. DALL-E can get offtrack. Take each failed response as a learning opportunity. Finding out what doesn’t work is just as important as finding out what does.
DALL-E use cases and applications
People use DALL-E for many applications in business and personal settings.
Marketing and business communications
- Creating images for blogs, social media posts, and websites
- Designing advertisements, such as fliers and posters
- Designing logos and brand elements
- Creating one-of-a-kind stock photos
- Designing product packaging
Conceptualization
- Designing physical products
- Rendering architectural models
- Ideating other creative projects, such as animation, storyboards, and interior design
- Testing out creative ideas in different styles
Educational content
- Creating visual aids like infographics and diagrams
- Depicting historical events
- Visualizing scientific processes that you can’t see with the naked eye, such as chemical reactions
- Creating images tailored to an individual student’s specific needs, interests, or learning style
Art and design
- Creating custom artwork for your home or party decor
- Designing cover art for books, albums, or movies
- Creating art to sell on products like T-shirts, bookmarks, and prints
- Creating reference images to use as inspiration for other art mediums, like fashion design
- Designing elements, such as background textures, to incorporate into other forms of artwork
Modifying existing images
- Adding more subjects to an image
- Adjusting the background
- Changing the aspect ratio
- Emphasizing certain objects
- Removing an object and replacing it with something else
Benefits of using DALL-E
DALL-E offers numerous advantages, including the ability to choose from multiple responses, use the platform alongside other AI tools, and remove barriers to art and design.
Generates multiple images per prompt
DALL-E generates four images per prompt, so you can choose the one that best suits your preferences. It modifies the prompt slightly for each image and expands on it to add more detail.
For example, if you enter a generic prompt like “A comic-book-style image of a dark alley,” DALL-E will rephrase your prompt and add details like the style of buildings in the scene, the framing of the image, or the predominant colors. You can see DALL-E’s prompt variations by clicking on each image.
Integrates with ChatGPT and Microsoft Copilot
You can access DALL-E through chatbots that you may already be using, such as ChatGPT and Microsoft Copilot. It’s convenient to generate text and images all inside of one tool. Also, since these are chatbots, the images you generate can be part of a longer conversation.
For example, if you've been using ChatGPT to create an agenda for a baby shower, you can also use DALL-E to make the images for the invitations. Since it's all part of one conversation, ChatGPT can incorporate details from your agenda into the invite.
Makes design more accessible
Design software and photography equipment can be expensive and challenging to learn. DALL-E makes image generation more accessible for the average person.
- A small business owner can create custom brand assets, like photos and product images that would have previously been out of reach.
- Hobbyists in areas like woodworking and sculpting can draft visualizations of their concepts without investing in costly software.
- People and organizations from underrepresented groups or with niche hobbies can create images that speak to their interests.
Shortcomings of DALL-E
Despite its capabilities, DALL-E does have some limitations.
Unpredictability
Since DALL-E generates every image from scratch, it can be unpredictable. If you have specific requirements for object placement or brand standards, DALL-E may not always incorporate them in its results.
Also, slightly adjusting your prompt may result in a significantly different output. This is especially challenging when changing an image DALL-E has already created.
Biases
All generative AI is susceptible to bias, and DALL-E is no exception. It can generate responses that reflect biases about race, gender, class, and even certain languages or countries. Because DALL-E was trained primarily on data from the US, it often reflects American culture, values, and biases.
Certain adjectives can lead to stereotypical results. For instance, if a prompt contains words like emotional or sensitive, the output may feature a woman, while words like tough or intellectual may lead to results that feature men.
Cost
Unless you use Microsoft Image Creator, DALL-E comes at a cost.
If you prefer ChatGPT over Microsoft's AI platforms, you'll have to pay for a subscription to access DALL-E.
What’s next for DALL-E and AI image generation?
You can use DALL-E to fuel creative brainstorming, streamline design processes, or simply have fun. It’s one of the many generative AI platforms that allows you to create in new ways. Because it’s integrated with existing AI platforms like ChatGPT and Microsoft Image Creator, you can create images and generate text all within a single tool.
When using DALL-E, it’s important to be mindful that all generative AI is prone to producing biased responses. Knowing the limitations of DALL-E allows you to find the best ways to use it and get the images you want.
New capabilities, features, and competitors are constantly emerging. Anyone who wants to use generative AI—whether for business, personal, or educational purposes—should keep tabs on the latest developments. We’ll keep covering the significant changes in generative AI, so keep up with the Grammarly blog to stay in the loop.