Jennifer Fu

Summary

This article explores the use of OpenAI's DALL·E API in a Next.js project, demonstrating how to generate images from language descriptions, create image variations, and edit existing images.

Abstract

The article begins by introducing DALL·E, a deep learning model developed by OpenAI that generates realistic images and art from natural language descriptions. It then explains how to set up an OpenAI account and configure a Next.js project to work with the DALL·E API. The article covers three main features of DALL·E: generating images from language descriptions, creating image variations based on a given image, and editing existing images for inpainting and outpainting. For each feature, the article provides step-by-step instructions and examples, along with code snippets and screenshots. The article concludes by reflecting on the impact of AI-driven art on the field of art and daily life.

Bullet points

  • DALL·E is a deep learning model developed by OpenAI that generates images and art from natural language descriptions.
  • To use DALL·E in a Next.js project, you need to set up an OpenAI account and configure the project to work with the DALL·E API.
  • DALL·E can generate images from language descriptions, create image variations based on a given image, and edit existing images for inpainting and outpainting.
  • The article provides step-by-step instructions and examples for each of these features, along with code snippets and screenshots.
  • The impact of AI-driven art on the field of art and daily life is discussed.

Exploring OpenAI DALL·E APIs With Next.js

Edit images based on descriptions and more

Image by DALL·E

Introduction

DALL·E is an AI system that can create realistic images and art from natural language descriptions. Its name combines WALL·E, Pixar's animated robot character, and Salvador Dalí, the Spanish surrealist artist. This deep learning model is one of OpenAI's AI products.

A previous article described how to set up an OpenAI account, and we continue to use that account to explore DALL·E. We also export OPENAI_API_KEY as an environment variable for the Next.js working environment. Next.js is a React framework with a built-in client and server, where APIs are invoked on the server side.

We use the following command to set up a Next.js project named next-dalle2:

% yarn create next-app next-dalle2 --typescript
% cd next-dalle2

Run yarn dev, and the default Next.js UI appears at http://localhost:3000. It is the Get Started page of Next.js 13.

Image by author

In this article, we will explore DALL·E features and see how they can be used in Next.js. We’ll do the following:

  • Generate images based on language description
  • Create image variations based on a given image
  • Edit an existing image for inpainting and outpainting

Generate Images Based on Language Description

OpenAI provides an intuitive interface to generate images. You can type a detailed description in the input field and click the button, Generate.

For example, we type the prompt, A flying robot in space that is drawn by Vincent van Gogh, and it generates four images.

Image by author

The above images are of professional quality and are made in Vincent van Gogh's style. Note that each image has a DALL·E signature, or watermark, in the bottom right corner.

Click on the first image, and examine the enlarged picture below:

Image by author

We can download this image with the DALL·E signature or open the inspect window to get the URL for the same image without the DALL·E signature.

Image by author

The generated images’ private URLs are available for one hour. Save the images you want to keep before the URLs expire.

We build the same image generation inside the Next.js project, and it takes four steps to do it:

  1. Install openai in the project.
  2. Modify the get started page, pages/index.tsx.
  3. Update the page styles, styles/Home.module.css.
  4. Configure call handler in api/hello.ts.

1. Install openai in the project

Run the following command to install the openai package:

% yarn add openai

openai becomes part of dependencies in package.json:

"dependencies": {
  "openai": "^3.1.0",
}

2. Modify the get started page, pages/index.tsx

Files in the pages folder are React components. When a file is added to the pages folder, it is automatically available as a route. index.tsx is the home route. It is invoked when a user accesses /. The default content is the Get Started page, and we modify it to be a page with a prompt and images.
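
To make the file-to-route convention concrete, here is a small illustrative model of it. routeForPage is a hypothetical helper written for this article, not a Next.js function, and it deliberately ignores dynamic routes and other special cases.

```typescript
// Illustrative only: a simplified model of how files in the pages folder
// map to routes. routeForPage is a hypothetical helper, not part of Next.js.
function routeForPage(filePath: string): string {
  // Drop the leading "pages/" folder.
  const withoutRoot = filePath.replace(/^pages\//, "");
  // Drop the file extension.
  const withoutExt = withoutRoot.replace(/\.(tsx|ts|jsx|js)$/, "");
  // An index file stands for its folder.
  const withoutIndex = withoutExt.replace(/(^|\/)index$/, "");
  return "/" + withoutIndex;
}
```

With this model, pages/index.tsx maps to / and pages/api/hello.ts maps to /api/hello.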

Image by author

The UI has an input field to type a new prompt. After the user presses the enter key, the input text is cleared. The prompt is displayed on the page. The response images will be displayed in the Loading… area. Since we will generate ten images each time, there is a button, Click to view the next image >, to rotate images to be viewed.

The following screenshot shows what it looks like after the images are generated:

Image by author

Here is the modified pages/index.tsx:

In the above code, we set two const variables:

  • IMAGE_COUNT (line 4): The number of images to generate. It must be between 1 and 10, and the default value is 1. It is set to 10 to ensure the selection pool is big enough.
  • IMAGE_SIZE (line 5): The size of the generated images. It must be one of 256, 512, or 1024. It is set to 1024. The value is used to compose the image size as `${IMAGE_SIZE}x${IMAGE_SIZE}`. The default value is '1024x1024'.
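
As a rough sketch of how these two constants could feed the request body (this is not the article's listing), the helper below validates the documented limits and composes the size string. The names buildGenerationRequest, GenerationRequest, and ALLOWED_SIZES are assumptions made for this sketch.

```typescript
// Values described in the article.
const IMAGE_COUNT = 10;
const IMAGE_SIZE = 1024;

// Hypothetical names for this sketch.
const ALLOWED_SIZES: number[] = [256, 512, 1024];

interface GenerationRequest {
  prompt: string;
  n: number;
  size: string;
}

function buildGenerationRequest(
  prompt: string,
  n: number = IMAGE_COUNT,
  size: number = IMAGE_SIZE
): GenerationRequest {
  // n must be a whole number between 1 and 10.
  if (Math.floor(n) !== n || n < 1 || n > 10) {
    throw new RangeError("n must be an integer between 1 and 10");
  }
  // size must be one of 256, 512, or 1024.
  if (ALLOWED_SIZES.indexOf(size) === -1) {
    throw new RangeError("size must be 256, 512, or 1024");
  }
  // Compose the size string, for example "1024x1024".
  return { prompt, n, size: `${size}x${size}` };
}
```

Validating on the client is optional, but it catches an out-of-range value before a request is sent.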

There are four React states created:

  • value (line 8): It is the value in the input field applied at line 48. value is updated by handleInput (lines 13–16).
  • prompt (line 9): It is the user prompt to generate images, which is displayed by line 49. prompt is set by handleKeyDown (lines 18–39) when the input field has a keydown event with the key, 'Enter' (line 20). The API route call is handled at lines 24–34, where the endpoint is '/api/hello' (line 24), and the request body defines prompt (line 30), n (number of images, line 31), and size (line 32).
  • imageIndex (line 10): It is the index of the generated image currently displayed, out of the ten images.
  • images (line 11): It holds the ten generated images. One of the images is displayed in the iframe component (lines 52–56). At line 50, clicking the button calls handleNextImage (lines 41–43) to increase imageIndex. When no images are loaded, 'Loading...' is displayed (line 51).
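
The interplay of the four states can be sketched as plain state transitions, independent of React. This is a simplified model rather than the article's component. PageState and the function names are hypothetical, and two behaviors are assumptions: that imageIndex wraps back to 0 after the last image, and that pressing Enter clears the previously loaded images.

```typescript
// Hypothetical model of the four pieces of state described above.
interface PageState {
  value: string;      // text currently in the input field
  prompt: string;     // last submitted prompt, shown on the page
  imageIndex: number; // which generated image is displayed
  images: string[];   // URLs of the generated images
}

// Enter key: the input becomes the prompt and the input field is cleared.
// Assumption: old images are dropped so that 'Loading...' shows again.
function onEnter(state: PageState): PageState {
  return { value: "", prompt: state.value, imageIndex: 0, images: [] };
}

// The API route answered with a list of image URLs.
function onImagesLoaded(state: PageState, urls: string[]): PageState {
  return { ...state, images: urls, imageIndex: 0 };
}

// Button click: move to the next image.
// Assumption: the index wraps around after the last image.
function onNextImage(state: PageState): PageState {
  if (state.images.length === 0) {
    return state;
  }
  return { ...state, imageIndex: (state.imageIndex + 1) % state.images.length };
}

// What the page shows: the current image URL, or a loading message.
function currentView(state: PageState): string {
  const url = state.images[state.imageIndex];
  return url === undefined ? "Loading..." : url;
}
```

In a React component, each of these transitions would correspond to one or more setState calls inside the matching event handler.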

3. Update the page styles, styles/Home.module.css

To lay out pages/index.tsx nicely, we update styles/Home.module.css:

  • At lines 1–7, the main class is styled as a flex layout in the column direction, with some padding.
  • At lines 9–11, .main iframe is styled with no border.
  • At lines 13–15, .main div is styled with some padding.
  • At lines 17–19, .main input is set to 80% of the width.
  • At lines 21–23, .main button adds some margin at the bottom.

4. Configure call handler in api/hello.ts

API routes provide a solution to build APIs. Files inside the pages/api folder are mapped to /api/*, and each of them is treated as an API endpoint. Because these files are bundled only for the server side, OPENAI_API_KEY is never sent to the browser, so it is safe to make the OpenAI calls here.

Here is the modified api/hello.ts:

  • At lines 4–6, configuration is created with apiKey that is set to the environment variable, OPENAI_API_KEY.
  • At line 7, openai is instantiated with configuration.
  • At lines 9–11, the type Data is defined.
  • At lines 13–24, the API handler is defined, which takes a request object and builds a response object. The response object is in json format with the status code 200 (line 23).
  • The response data comes from result (line 18), which is the response from openai.createImage. The call creates n images (line 20) of the given size (line 21) from the specified prompt (line 19).
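
The handler described above can be sketched as follows. To keep the sketch self-contained, the OpenAI client is passed in through a small hypothetical interface, ImageClient, instead of being imported. In the real route it would be the instance created from the openai package with OPENAI_API_KEY. The ApiRequest and ApiResponse types are minimal stand-ins for the Next.js objects, and the shape of Data (a list of URLs) is an assumption, since the article only says that the type is defined.

```typescript
// Minimal stand-ins for the Next.js request and response objects (hypothetical).
interface ApiRequest {
  body: { prompt: string; n: number; size: string };
}
interface ApiResponse<T> {
  status(code: number): { json(data: T): void };
}

// The part of the OpenAI response used here: result.data.data[i].url
interface ImageResult {
  data: { data: Array<{ url?: string }> };
}

// Stand-in for the openai client, narrowed to the one call the article names.
interface ImageClient {
  createImage(request: { prompt: string; n: number; size: string }): Promise<ImageResult>;
}

// Assumed response shape for this sketch.
type Data = { urls: string[] };

function createHandler(client: ImageClient) {
  return async function handler(req: ApiRequest, res: ApiResponse<Data>): Promise<void> {
    const { prompt, n, size } = req.body;
    // Ask for n images of the given size from the prompt.
    const result = await client.createImage({ prompt, n, size });
    // Keep only the entries that carry a URL.
    const urls: string[] = [];
    for (const item of result.data.data) {
      if (item.url !== undefined) {
        urls.push(item.url);
      }
    }
    res.status(200).json({ urls });
  };
}
```

Passing the client in also makes the handler easy to exercise with a fake client, without calling the real API.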

Execute yarn dev. Type the prompt, A flying robot in space that is drawn by Salvador Dalí, and it generates ten images. Clicking the button, Click to view the next image >, we view each of the generated images.

Image by author

Create Image Variations Based on a Given Image

DALL·E can create variations based on a given image. For the following image of a robot in Van Gogh's style, the UI offers three options:

  • Edit image
  • Generate variations
  • Report an issue

Image by author

Choose Generate variations, and four new images are generated and shown along with the original image.

Image by author

Image variations can be created in the Next.js project as well. Instead of typing a prompt, the user provides the name of the original image file on the local file system.

Image by author

Here is the modified pages/index.tsx:

  • At line 9, we replace prompt with fileName.
  • At line 30, the request body takes fileName.
  • At line 49, fileName is displayed.

Here is the modified api/hello.ts:

  • The response data comes from result (line 19), which is the response from openai.createImageVariation that creates n images (line 21) of size (line 22), from a specific File that is read from a local file (line 20).
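
A sketch of the server-side variation call is shown below, again with the client and the file access passed in so that the block is self-contained. VariationClient, isSafePngName, and createVariations are hypothetical names. The file name check is our own addition, not something the article describes: since the name comes from the request body, it seems prudent to accept only a plain .png name before reading from the local file system.

```typescript
// The part of the OpenAI response used here: result.data.data[i].url
interface ImageResult {
  data: { data: Array<{ url?: string }> };
}

// Stand-in for the openai client; F is whatever file type the client accepts.
interface VariationClient<F> {
  createImageVariation(image: F, n: number, size: string): Promise<ImageResult>;
}

// Accept only a plain PNG file name: no folders, no "..".
// This guard is an addition made for this sketch.
function isSafePngName(fileName: string): boolean {
  return /^[\w.-]+\.png$/i.test(fileName) && fileName.indexOf("..") === -1;
}

async function createVariations<F>(
  client: VariationClient<F>,
  openFile: (fileName: string) => F, // for example, a function that opens a local file
  fileName: string,
  n: number,
  size: string
): Promise<string[]> {
  if (!isSafePngName(fileName)) {
    throw new Error("expected a plain .png file name");
  }
  const result = await client.createImageVariation(openFile(fileName), n, size);
  const urls: string[] = [];
  for (const item of result.data.data) {
    if (item.url !== undefined) {
      urls.push(item.url);
    }
  }
  return urls;
}
```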

The following is the image file, p1.png, which was generated from the prompt, A flying robot in space that is drawn by Pablo Picasso.

Image by DALL·E

Execute yarn dev. Type the file name, p1.png, and press enter. It generates ten images. Clicking the button, Click to view the next image >, we view each of the generated images.

Image by author

Edit an Existing Image for Inpainting and Outpainting

When DALL·E creates variations of a given image, a user has no control over the outcome. The editing feature gives a user some control, using a prompt and specifying the areas to fill. The two forms of editing are called inpainting and outpainting.

  • Inpainting: It tweaks the original image to create controlled variations, such as changing an outfit of a model.
  • Outpainting: It extends the original image to create large-scale images in any aspect ratio, such as creating a garden from a tree.

Both inpainting and outpainting take into account the image’s existing visual elements — including shadows, reflections, and textures — to maintain the context of the original image.

Here is the image editing screen, and we put two images in the editing area. This is inpainting, as the changes are within the image boundary.

Image by author

It generates four images based on the prompt, A futuras fish swims to a bowl of ice cream.

Image by author

What do you think of the output images?

Here is the author’s choice:

Image by DALL·E

The following is outpainting, as the image boundary is extended:

Image by author

It generates four images based on the prompt, a big fish in a small pond fantasy.

Image by author

Here is the author’s choice:

Image by DALL·E

Strictly speaking, the editing feature is inpainting, since it only supports editing a square image. However, outpainting can be achieved by enlarging the image boundary and cropping the final result to any size.
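
The arithmetic behind this enlarge-then-crop approach can be sketched as follows. The two helpers are hypothetical and only compute rectangles. Placing the pixels on the canvas and cropping the result would be done with an image tool of your choice. The empty area around the centered image is the part that the edit is asked to fill.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Where to place a width x height image so that it is centered on a
// square canvas with `side` pixels per edge.
function centerOnSquare(width: number, height: number, side: number): Rect {
  if (width > side || height > side) {
    throw new RangeError("image does not fit on the canvas");
  }
  return {
    x: Math.floor((side - width) / 2),
    y: Math.floor((side - height) / 2),
    width,
    height,
  };
}

// Largest centered crop of the square result with aspect ratio aspectW:aspectH.
function centeredCrop(side: number, aspectW: number, aspectH: number): Rect {
  const ratio = aspectW / aspectH;
  const width = ratio >= 1 ? side : Math.round(side * ratio);
  const height = ratio >= 1 ? Math.round(side / ratio) : side;
  return {
    x: Math.floor((side - width) / 2),
    y: Math.floor((side - height) / 2),
    width,
    height,
  };
}
```

For example, a 512x512 image centered on a 1024x1024 canvas starts at (256, 256), and a 16:9 crop of the 1024x1024 result is 1024x576 starting at (0, 224).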

Let’s try to edit an image in the Next.js project. It requires a local file name and a prompt.

Image by author

Here is the modified pages/index.tsx:

  • At line 6, MASK_FILE_PREFIX is created to build the mask file name. The mask file is an additional image whose fully transparent areas indicate where the image should be edited. It should have the same dimensions as the original image.
  • At line 9, value is used for the file name input.
  • At line 10, value2 is used for the prompt input.
  • At line 11, fileName is for the original file name and the associated mask file name.
  • At line 12, prompt is used for the prompt.
  • At lines 38–44, the request body takes fileName, maskFileName, prompt, n, and size.
  • At lines 59–64, the required fileName, maskFileName, and prompt are taken from the input fields and displayed on the screen.
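
The request body for an edit can be sketched as below. The value "mask-" for MASK_FILE_PREFIX is an assumption inferred from the file names lake.png and mask-lake.png used later in this article. EditRequest, maskFileNameFor, and buildEditRequest are hypothetical names for this sketch.

```typescript
const IMAGE_COUNT = 10;
const IMAGE_SIZE = 1024;
// Assumed value, inferred from lake.png and mask-lake.png.
const MASK_FILE_PREFIX = "mask-";

interface EditRequest {
  fileName: string;
  maskFileName: string;
  prompt: string;
  n: number;
  size: string;
}

// The mask file name is derived from the original file name.
function maskFileNameFor(fileName: string): string {
  return `${MASK_FILE_PREFIX}${fileName}`;
}

function buildEditRequest(fileName: string, prompt: string): EditRequest {
  return {
    fileName,
    maskFileName: maskFileNameFor(fileName),
    prompt,
    n: IMAGE_COUNT,
    size: `${IMAGE_SIZE}x${IMAGE_SIZE}`,
  };
}
```

Deriving the mask name from the file name means the user types one file name and one prompt, and the two files only need to sit side by side on the server.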

Here is the modified api/hello.ts:

  • The response data comes from result (line 19), which is the response from openai.createImageEdit that creates n images (line 23) of size (line 24), from the original local file (line 20) and the local mask file (line 21).
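
A sketch of the server-side edit call follows, with the client and file access passed in through hypothetical interfaces (EditClient and LocalFiles). The argument order of image, mask, prompt, n, and size follows the order of the line references above. It may differ in other versions of the openai package, so check the version you have installed. The dimension checks are our own addition, based on the requirements that the image is square and that the mask has the same dimensions.

```typescript
interface ImageResult {
  data: { data: Array<{ url?: string }> };
}

interface Size {
  width: number;
  height: number;
}

// Stand-in for the openai client; F is whatever file type the client accepts.
interface EditClient<F> {
  createImageEdit(image: F, mask: F, prompt: string, n: number, size: string): Promise<ImageResult>;
}

// Hypothetical access to local image files.
interface LocalFiles<F> {
  open(fileName: string): F;
  dimensions(fileName: string): Size;
}

interface EditRequest {
  fileName: string;
  maskFileName: string;
  prompt: string;
  n: number;
  size: string;
}

async function editImage<F>(
  client: EditClient<F>,
  files: LocalFiles<F>,
  request: EditRequest
): Promise<string[]> {
  const image = files.dimensions(request.fileName);
  const mask = files.dimensions(request.maskFileName);
  if (image.width !== image.height) {
    throw new Error("the image must be square");
  }
  if (image.width !== mask.width || image.height !== mask.height) {
    throw new Error("the mask must have the same dimensions as the image");
  }
  const result = await client.createImageEdit(
    files.open(request.fileName),
    files.open(request.maskFileName),
    request.prompt,
    request.n,
    request.size
  );
  const urls: string[] = [];
  for (const item of result.data.data) {
    if (item.url !== undefined) {
      urls.push(item.url);
    }
  }
  return urls;
}
```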

The following is the original file image, lake.png, which is a photo taken at Stevens Creek Reservoir.

Image by author

The following is the mask image file, mask-lake.png, built with the Preview app on macOS.

Image by author

Execute yarn dev. Type the fileName, lake.png. Type the prompt, A dragon rises from a lake, and press enter. It generates ten images. Clicking the button, Click to view the next image >, we view each of the generated images.

Image by author

Here are two of the author’s choices.

Choice 1 is a realistic dragon.

Image by DALL·E

Choice 2 is an imaginary dragon.

Image by DALL·E

Conclusion

We have shown DALL·E’s capability to generate images based on language description, to create image variations based on a given image, and to edit an existing image for inpainting and outpainting. These operations can be achieved using OpenAI’s online UI or programmed in a web application like Next.js.

Just as the invention of the camera changed art history, AI-driven art is reshaping the field of art and our daily life. Are you thrilled or disturbed?

Regardless, OpenAI DALL·E is a powerful tool, along with Stable Diffusion, GPT-3, ChatGPT, Point·E, and Whisper.

Thanks for reading.

Want to Connect?

If you are interested, check out my directory of web development articles.