First impressions of DALL-E, generating images from text
I made it off the DALL-E waiting list a few days ago and I’ve been having an enormous amount of fun experimenting with it. Here are some notes on what I’ve learned so far (and a bunch of example images too).
(For those not familiar with it, DALL-E is OpenAI’s advanced text-to-image generator: you feed it a prompt, it generates images. It’s extraordinarily good at it.)
First, a warning: DALL-E only allows you to generate up to 50 images a day. I found this out only when I tried to generate image number 51. So there’s a budget to watch out for.
I’ve usually run out by lunchtime!
How to use DALL-E
DALL-E is even simpler to use than GPT-3: you get a text box to type in, and that’s it. There are no advanced settings to tweak.
It does have one other mode: you can upload your own photo, crop it to a square and then erase portions of it and ask DALL-E to fill them in with a prompt. This feature is clearly still in the early stages—I’ve not had great results with it yet.
DALL-E always returns six resulting images, which I believe it has selected as the “best” from hundreds of potential results.
Tips on prompts
The placeholder text in DALL-E’s prompt box suggests that you “Start with a detailed description”. This is very good advice!
If you type just “Pelican”, you’ll get an image that is indistinguishable from what you might get from something like Google Image search. But the more detail you provide, the more interesting and fun the results become.
Fun with pelicans
Here’s “A ceramic pelican in a Mexican folk art style with a big cactus growing out of it”:
Some of the most fun results come from hinting at the medium or art style you would like. Here’s “A heavy metal album cover where the band members are all pelicans... made of lightning”:
This illustrates a few interesting points. Firstly, DALL-E is hilariously bad at anything involving text. It can produce shapes that look like letters and words, but it has no concept of actual writing.
My initial prompt was for “A death metal album cover...”—but DALL-E refused to generate that. It has a filter to prevent people from generating images that go outside its content policy, and the word “death” triggered it.
(I’m confident that the filter can be easily avoided, but I don’t want to have my access revoked so I haven’t spent any time pushing its limits.)
It’s also not a great result—those pelicans are not made of lightning! I tried a tweaked prompt:
“A heavy metal album cover where the band members are all pelicans that are made of lightning”:
Still not made of lightning. One more try:
“pelican made of lightning”:
Let’s try the universal DALL-E cheat code, adding “digital art” to the prompt.
“a pelican made of lightning, digital art”
OK, those look a lot better!
One last try—the earlier prompt but with “digital art” added.
“A heavy metal album cover where the band members are all pelicans that are made of lightning, digital art”:
OK, these are cool. The text is gone—maybe the “digital art” influence overrode the “album cover” instruction a tiny bit there.
This process is a good example of “prompt engineering”—feeding in altered prompts to try to iterate towards a better result. This is a very deep topic, and I’m confident I’ve only just scratched the surface of it.
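One way to keep this kind of iteration organized is to generate your prompt variants up front. Here’s a minimal sketch (my own hypothetical helper, not anything DALL-E provides—the site only gives you a single text box) that composes a base subject with a list of style modifiers, so you can paste each variant in one at a time:

```python
def build_prompts(subject, modifiers):
    """Return the bare subject plus one variant per style modifier.

    Each variant appends a modifier like "digital art" after a comma,
    mirroring the manual prompt tweaks described above.
    """
    prompts = [subject]
    for modifier in modifiers:
        prompts.append(f"{subject}, {modifier}")
    return prompts


variants = build_prompts(
    "a pelican made of lightning",
    ["digital art", "oil painting"],
)
for prompt in variants:
    print(prompt)
# a pelican made of lightning
# a pelican made of lightning, digital art
# a pelican made of lightning, oil painting
```

With a 50-image daily budget, deciding on a short list of variants before you start spending generations is more economical than improvising one tweak at a time.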
Breaking away from album art, here’s “A squadron of pelicans having a tea party in a forest with a raccoon, digital art”. Often when you specify “digital art” it picks some other additional medium: