Tristan Wolff

Summary

This article provides a guide to creating cinematic images using Midjourney, focusing on the use of prompts and the inclusion of various filmmaking artifacts.

Abstract

The article discusses the creation of cinematic images using Midjourney, an AI tool that generates images from prompts. It explains that to make images appear cinematic, they should contain artifacts of real-world filmmaking such as camera angles, shot types, lighting effects, and character placement. The guide breaks down the anatomy of a cinematic prompt into four components: prefix or trigger, scene description, style and shot descriptions, and parameters. It also provides examples of how to craft prompts to achieve specific cinematic effects, and discusses the importance of considering the overlap between scene and style descriptions.

Bullet points

  • The guide focuses on creating cinematic images using Midjourney, an AI tool.
  • Cinematic images should contain artifacts of real-world filmmaking.
  • The anatomy of a cinematic prompt includes four components: prefix or trigger, scene description, style and shot descriptions, and parameters.
  • Prefixes or triggers provide context for the medium in which the rendering should be appropriate.
  • Scene descriptions include character placement, action, composition, set design, and props.
  • Style and shot descriptions include camera and shot type, lighting, post-production, and film material/quality.
  • Parameters define aspect ratio, quality, and chaos.
  • The guide provides examples of how to craft prompts to achieve specific cinematic effects.
  • It is important to consider the overlap between scene and style descriptions.

Consistent Cinematography with Midjourney: Cinematic Prompt Cheatsheet

A Guide To Creating Cinematic Images With Midjourney

Image by the author & Midjourney

Cinematic Prompt Anatomy

There are many ways to create images that have a “cinematic” look. Generally, we perceive images as “cinematic” if they contain artifacts of real-world filmmaking, such as certain camera angles and shot types, cinematic scene composition, lighting effects or character placement, etc.

This may sound trivial at first, but this general observation has some interesting implications when it comes to developing “cinematic prompts” with Midjourney. Basically, in order to make our images “cinematic” we want to show as many artifacts of traditional filmmaking as possible.

Here is a schematic drawing showing how a single prompt can contain all the necessary cinematic artifacts and how they relate to the corresponding filmmaking crafts.

Image by the author
  • A prefix or trigger adds context to Midjourney’s holistic interpretation of the prompt.
  • A scene description includes (explicitly or implicitly) character placement, action, composition, mise-en-scène/set design, props, etc.
  • Style & shot descriptions include (explicitly or implicitly) camera & shot type, lighting, post-production, film material/quality, etc.
  • Parameters define aspect ratio, quality, and chaos.
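
The four components can be sketched as a small Python helper that assembles one prompt string. This is only an illustration: the function name “build_prompt”, the comma separator, and the order prefix, scene, style are assumptions, not the author's exact layout. The flags “--ar”, “--q” and “--chaos” are Midjourney's parameters for aspect ratio, quality, and chaos.

```python
def build_prompt(prefix, scene, style="", ar=None, quality=None, chaos=None):
    """Join prefix, scene description, style description and parameters."""
    # Text parts; empty parts are skipped. The comma separator is an assumption.
    parts = [p.strip() for p in (prefix, scene, style) if p and p.strip()]
    prompt = ", ".join(parts)

    # Parameters are appended at the end of the prompt.
    params = []
    if ar is not None:
        params.append(f"--ar {ar}")
    if quality is not None:
        params.append(f"--q {quality}")
    if chaos is not None:
        params.append(f"--chaos {chaos}")
    return " ".join([prompt] + params)


print(build_prompt(
    prefix="film still",
    scene="samurai sitting at a table in a kitchen",
    style="dramatic lighting, close-up",
    ar="16:9",
    chaos=10,
))
```

Keeping the parts separate makes it easy to change one component at a time and see which one causes a given effect.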

It’s important to remember that these categories can’t be sharply separated, since they overlap semantically. For example, in an otherwise identical prompt (same prefix, same scene description), using the expression

  • “night” in the scene description implies specific lighting effects (e.g. “Backlighting”),
  • replacing it with “having a conversation” implies specific shot types (e.g. “Two Shot”, “Over-the-shoulder shot”), and
  • changing it to “futuristic room” implies specific set designs and color grading (“sterile” or “clinical” indoor set design, resemblance to genre reference works, e.g. “Space Odyssey”):
same prompt, only one expression changed in scene description (“night”, “having a conversation”, “futuristic room”)

Something to keep in mind when crafting scenes with Midjourney:

  • scene and style descriptions always overlap (we have to deal with interference when designing prompts)
  • the shorter the prompt, the more control; long prompts tend to contain redundant expressions or unintended overlaps
  • the longer the prompt, the less control but the more explorative potential
  • redundancy is necessary in some cases, e.g. “enforcement” (see: style/shot descriptions)

Based on my personal experience as a screenwriter using visual storytelling to create scenes from a given narrative, here is a rule of thumb: the more accurately you can describe a scene, the shorter the prompt gets, and the shorter the prompt, the easier it is to control and change.
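
One way to keep track of redundant expressions is to check which comma-separated expressions occur in both the scene and the style description. The following Python sketch is a rough aid only: the function name is hypothetical, and it catches literal repeats, not semantic overlaps such as “night” implying backlighting.

```python
def repeated_expressions(scene, style):
    """Return expressions that appear literally in both descriptions."""
    def expressions(text):
        # Split on commas and normalise case and surrounding whitespace.
        return {part.strip().lower() for part in text.split(",") if part.strip()}

    return sorted(expressions(scene) & expressions(style))


scene = "samurai having a conversation, wide angle"
style = "wide angle, dramatic lighting"
print(repeated_expressions(scene, style))  # ['wide angle']
```

A repeat found this way is not necessarily a mistake; it may be deliberate enforcement.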

Let’s take a look at the individual elements of this cheat sheet for prompt design:

Prefixes or Triggers

These are pretty self-explanatory. We use them to tell Midjourney which medium the rendering should suit (in our case a cinema screen rather than, say, the canvas of a Renaissance painting). The following expressions work well as prefixes/triggers to tell Midjourney to render in a cinematic context:

film still
cinematic still
cinematic shot
movie shot
footage from movie
movie footage
movie scene
cinematic movie scene
footage from [INSERT genre description] movie

etc.

They are interchangeable most of the time, as we can see here (same prompt, iterating prefixes):

“movie footage” vs “movie still”
“film still” vs “cinematic still”
“cinematic shot” vs “film footage”
“scene from movie” vs “award-winning movie scene”
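
Iterating prefixes over one fixed prompt, as in the comparisons above, can be sketched in Python. The helper name and the comma that joins the prefix to the rest of the prompt are assumptions; the prefixes are taken from the list above.

```python
PREFIXES = [
    "film still",
    "cinematic still",
    "cinematic shot",
    "movie shot",
    "footage from movie",
    "movie footage",
    "movie scene",
    "cinematic movie scene",
]


def prefix_variants(rest_of_prompt, prefixes=PREFIXES):
    """Return one prompt per prefix; everything else stays identical."""
    return [f"{prefix}, {rest_of_prompt}" for prefix in prefixes]


for prompt in prefix_variants("samurai on a donkey"):
    print(prompt)
```

Because only the prefix changes, any visible difference between the renderings can be attributed to it.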

If we modify Prefixes/Triggers further with …

black and white movie scene
sci-fi movie footage
etc.
“black and white movie scene” vs “sci-fi movie footage”

… we can achieve explicit (“black and white”) or implicit (“sci-fi” changes the color grading, not the costumes or setting) prompt effects.

Scene Descriptions

Whether an expression is explicit or implicit matters a great deal in scene descriptions. For example, compare these two pictures:

“sitting at a table in a kitchen” vs “domestic scene”

In the images above, the same prompt is used and the only change is the description of the scene (one explicitly uses “samurai sitting at a table in a kitchen”, and the other produces similar content implicitly with “domestic scene of a samurai”). Note also that “samurai” has cultural and historic implications for the depiction of the set, characters, and props.

Here’s another example: with the same prompt (“film still” as prefix, “samurai” as scene description), adding a single word to the scene description causes overlap with shot type, mise-en-scène, and lighting:

So, there are two things to be aware of:

  1. The following elements are always included in a scene description (either explicitly or implicitly):
  • character description
  • description of action
  • set description/set design
  • scene composition
  • props

  2. The elements of a scene description generally overlap/interfere with the following shot/style elements:

  • lighting
  • location (interior/exterior)
  • color grading
  • shot type (camera angle, lens & shot size)


Style descriptions

In addition to scene descriptions, style descriptions are used to

  • enforce scene description (e.g. using “wide-angle” in addition to “mountains in the background” or using “volumetric lighting” in addition to a “nightly park scene” or using “over-the-shoulder shot” in addition to “having a conversation”, etc.),
  • alter scene description (e.g. using “style by sci-fi movie” to change lighting and color-grading, or using “atmosphere”, “vibe” or “symmetrical” to alter scene composition),
  • set color-grading, lighting, composition, shot type, camera angle, lens & shot size (I wrote about this here, here and here).

All of these can be prompted

  • directly, e.g. “8k, wide angle, black and white film”,
  • by reference, e.g. “style by 80s sci-fi movie”, “directed by stanley kubrick”,
  • or as a mixture of both, e.g. “style by 70s thriller movie directed by stanley kubrick, close-up, shallow depth of field”.

Examples

image by the author

Here are some examples of how to use the above cheatsheet to create scenes and keep track of interferences from explicit/implicit scene/style descriptions (again: if you are exploring, you would probably not even want to be that precise with a prompt, but it can be very useful when fine-tuning scenes or working on the accurate depiction of elements from a given story).

Example 1: Switching shot types via style description

This rendering uses “samurai on a donkey” as scene description and “depth of field, style by 60s adventure movies, technicolor” as style description.

Adding “close-up” to the style description changes the shot size as intended since we have no interference from the scene description.

Example 2: Style (in)consistency

Generally, results are better when style consistency is preserved. For example, adding the expression “8k” to the example above would not make sense, since it destroys the 60s lighting and color-grading interference.

Instead, you would use “8k” in those style descriptions that prompt for a different visual style, for example as an addition to “depth of field, cinematic color grading, atmosphere” which already triggers a more modern, mainstream cinematography (e.g. “teal and orange” color-grading, etc.). In that case, “8k” fits the overall style description very well.

same prompt, without and with “8k” in style description “depth of field, cinematic color grading, atmosphere”

Example 3: Switching shot size via scene description & positioning

In the following example, the same prompt is used for all three images:

  • prefix: “movie still”,
  • style description: “style by 2010 sci-fi, dramatic lighting”,
  • scene description: “samurai with a sword on a spaceship”.

We now want to imply shot size via scene description & positioning without any additional shot reference in the style description:

without & with “eyes” added at the end/beginning of the prompt
  • The first image has no additions, resulting in some wide-angle shots and a medium shot.
  • Image number two has “eyes” added at the end of the scene description, resulting in two medium shots and two close-ups.
  • Image number three has “eyes” added at the very beginning of the prompt (before the prefix), resulting in extreme close-ups.
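
The three variants can be sketched in Python to make the positioning explicit. The prefix, scene, and style strings come from the example above; the helper name, the comma separators, and the order prefix, scene, style are assumptions about how the parts are joined.

```python
def place_keyword(prefix, scene, style, keyword=None, position="scene_end"):
    """Build a prompt and optionally add one keyword at a chosen position."""
    if keyword and position == "scene_end":
        # Append the keyword to the scene description.
        scene = f"{scene}, {keyword}"
    parts = [prefix, scene, style]
    if keyword and position == "prompt_start":
        # Put the keyword before the prefix, at the very start of the prompt.
        parts.insert(0, keyword)
    return ", ".join(parts)


PREFIX = "movie still"
SCENE = "samurai with a sword on a spaceship"
STYLE = "style by 2010 sci-fi, dramatic lighting"

variants = [
    place_keyword(PREFIX, SCENE, STYLE),                          # no addition
    place_keyword(PREFIX, SCENE, STYLE, "eyes", "scene_end"),     # end of scene description
    place_keyword(PREFIX, SCENE, STYLE, "eyes", "prompt_start"),  # before the prefix
]
for prompt in variants:
    print(prompt)
```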

Example 4: Enforcement by combining style & scene description

Here we use “samurai having a conversation” as scene description, which gives us the expected “two shot” / “over the shoulder” tendency.

If we want to change it to a wide angle shot, adding “wide angle” in the style description doesn’t really help:

Also, enforcing “wide angle” with repetition in both scene and style description only causes minor changes in shot size, because the expression “having a conversation” still maintains the “two shot” interference.

But we can enforce “wide angle” by adding an explicit background description (“mountains in the background”) in the scene description:
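
The three attempts from this example can be written side by side in Python for comparison. The article does not state which prefix was used here, so “film still” is an assumption, as are the comma separators and the helper name.

```python
def join_parts(*parts):
    """Join the non-empty prompt parts with commas (separator is an assumption)."""
    return ", ".join(part for part in parts if part)


PREFIX = "film still"  # assumption: the prefix for this example is not given
BASE_SCENE = "samurai having a conversation"

attempts = {
    # "wide angle" only in the style description
    "style only": join_parts(PREFIX, BASE_SCENE, "wide angle"),
    # "wide angle" repeated in both scene and style description
    "repeated": join_parts(PREFIX, BASE_SCENE + ", wide angle", "wide angle"),
    # explicit background in the scene description plus "wide angle" as style
    "background": join_parts(
        PREFIX, BASE_SCENE + ", mountains in the background", "wide angle"
    ),
}

for label, prompt in attempts.items():
    print(f"{label}: {prompt}")
```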
