Summary

The website content provides a detailed guide on creating over-the-shoulder (OTS) cinematic shots using Midjourney's latest V5 model, emphasizing the importance of shot size, depth of field, and cinematic terminology.

Abstract

The article titled "Mastering Midjourney: Cinematic Prompts & Over-The-Shoulder Shots" delves into the technique of crafting over-the-shoulder (OTS) shots in cinematic images using the Midjourney V5 AI model. It explains the significance of OTS shots in visual storytelling and demonstrates how the V5 model's enhanced photorealism simplifies the process compared to its predecessor. The guide walks through setting up a scene with two women in a hotel lobby, using specific prompts and parameters to achieve the desired close-up shot with a shallow depth of field. It highlights the challenges faced when the AI interprets the context of the scene and provides solutions to refine the output. The author emphasizes the importance of placing cinematic terminology in the prefix and experimenting with shot descriptions and emotions to achieve the intended OTS aesthetic. The article concludes by encouraging readers to explore further with the Midjourney tool and offers links to previous parts of the series for those who wish to catch up on the topic.

Opinions

The author believes that the Midjourney V5 model significantly improves the ease of creating OTS shots due to its photorealistic tendencies, though it does not entirely replace the need for accurate cinematic terminology in prompts.
It is suggested that the placement of cinematic terminology in the prompt is crucial for achieving the desired shot in the V5 model, with a preference for including it in the prefix rather than the end.
The author indicates that V5's interpretation of natural language has evolved, but it still effectively responds to comma-separated prompts under the right circumstances.
The article implies that the AI's understanding of context can lead to variations in the output, necessitating precise language and additional descriptors like "shallow depth of field" and "extreme close-up" to guide the image generation process.
The author's approach to problem-solving involves iterative prompt refinement, experimentation with different shot sizes and descriptions, and an understanding of the AI's holistic processing of language.

Mastering Midjourney: Cinematic Prompts & Over-The-Shoulder Shots

Cinematography with Midjourney: How To Create Over-The-Shoulder (OTS) Shots

The over-the-shoulder (OTS) shot is one of the most effective and widely used visual storytelling techniques and a standard tool in any filmmaker’s arsenal.

With Midjourney’s latest V5 model, it’s now much easier to use this type of shot in your cinematic images. Here’s how.

What are OTS shots used for?

They are mainly (but not exclusively) used to depict conversations between two people: The camera is behind one of them and captures the scene from their point of view, with part of their shoulder and head in the out-of-focus foreground and their counterpart in the middle of the frame. The technique is often used sequentially, switching between the OTS perspectives of the two subjects, creating a dynamic interplay that effectively conveys dialogue and reactions.

Mastering Midjourney V5 (Part 3): Cinematography Revisited (I): Exploring shot size and lenses within a cinematic prompt context (bootcamp.uxdesign.cc)

How to create OTS shots with Midjourney

In the days of Midjourney V4, achieving an OTS shot meant overcoming many more obstacles, since the old model lacked V5’s strong tendency toward photorealism. So let’s get to it.

At the start, we set the scene and characters.

cinematic shot, two women in a hotel lobby scene --ar 16:9 --seed 4000

(If you are new to Midjourney and wonder about the “--ar 16:9” and “--seed 4000” thing, catch up here and here.)
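Since every prompt in this walkthrough ends with the same two parameters, it can help to assemble the strings in a small script and paste them into Midjourney by hand. The following is only a minimal sketch: the function name `build_prompt` is hypothetical, and it merely formats text (it does not talk to Midjourney in any way).

```python
from typing import Optional


def build_prompt(description: str,
                 aspect_ratio: str = "16:9",
                 seed: Optional[int] = None) -> str:
    """Append the --ar and --seed parameters to a text description."""
    parts = [description.strip(), f"--ar {aspect_ratio}"]
    if seed is not None:
        # A fixed seed keeps the grids comparable between prompt variations.
        parts.append(f"--seed {seed}")
    return " ".join(parts)


prompt = build_prompt("cinematic shot, two women in a hotel lobby scene", seed=4000)
print(prompt)
```

Keeping the seed and aspect ratio constant while changing only the description makes it easier to see which wording caused which change in the grid.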

With our scene & characters in place, let’s jump into our desired shot size.

For OTS shots we need to get to “close-up”.

If we add this to our prompt, we can immediately see the problem we will be dealing with:

cinematic close-up shot, two women in a hotel lobby scene --ar 16:9 --seed 4000

Images 1, 2, and 4 do not match the conversation scene we are aiming for. Image 3, on the other hand, is almost ready.

This kind of problem is very common when creating AI images with Midjourney: the algorithms work holistically in the sense that they take into account the connotations of words and not just the narrow meaning of a word alone.

In this case, the semantic field of “people in a hotel lobby” rendered the two women in conversation while we were not in close-up, because statistically that is what most people in lobbies look like: they talk to each other.

In the close-up, things changed: “talking to each other” is only one option for the model (image 3); others could be “standing side by side”, “watching something”, etc. We know from our explorations in V4 that the phrase “having a conversation” accomplishes two things at once:

keep the characters in the right spot for our OTS shot
facilitate the OTS shot when we go into close-up.

For the sake of completeness, we start over again:

cinematic shot, two women having a conversation, hotel lobby scene 
--ar 16:9 --seed 4000

Now we jump into close-up.

cinematic close-up shot, two women having a conversation, hotel lobby scene 
--ar 16:9 --seed 4000

Et voilà, all four images in the initial grid are already going in the right direction.

One important thing to add: in Midjourney V4, I placed the shot types at the end of the prompt. In V5, however, I try to always include them in the prefix, because putting shot types at the end of the prompt causes a lot of unnecessary problems.

Look at these examples, where I put “close-up” at the end:

cinematic shot, two women having a conversation, hotel lobby scene, 
close-up shot --ar 16:9 --seed 4000

No close-ups.

This could very well be an effect of Midjourney’s refined understanding of natural language. That refinement is often misunderstood, as if Midjourney now understood only natural language. It still understands comma-separated prompts, and they might even be more useful in some cases, but you have to know when to use which.

Also, V5’s tendency toward photorealism is very helpful, but it doesn’t replace the use of cinematic terminology.

For example, if you don’t use a cinematic prefix at all:

close-up shot, two women having a conversation, hotel lobby scene 
--ar 16:9 --seed 4000

… then the original grid differs even more and we are not even close to our destination.
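To compare the three placements side by side, the prompt strings can be generated from the same building blocks. This is a rough sketch for producing the text only; `compose` and its arguments are hypothetical names, and the results still have to be judged by eye in Midjourney.

```python
PARAMS = "--ar 16:9 --seed 4000"


def compose(shot_size: str, subject: str, scene: str,
            cinematic: bool = True, shot_in_prefix: bool = True) -> str:
    """Build a comma-separated prompt with the shot type in the prefix or at the end."""
    shot_term = f"{shot_size} shot"
    if shot_in_prefix:
        # V5 placement used in this article: shot type leads the prompt.
        prefix = f"cinematic {shot_term}" if cinematic else shot_term
        chunks = [prefix, subject, scene]
    else:
        # V4-style placement: shot type trails the description.
        prefix = "cinematic shot" if cinematic else None
        chunks = [c for c in (prefix, subject, scene, shot_term) if c]
    return ", ".join(chunks) + " " + PARAMS


subject, scene = "two women having a conversation", "hotel lobby scene"
print(compose("close-up", subject, scene))                        # shot type in prefix
print(compose("close-up", subject, scene, shot_in_prefix=False))  # shot type at the end
print(compose("close-up", subject, scene, cinematic=False))       # no cinematic prefix
```

Running the three variants with the same seed reproduces the comparison made above.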

So, back to the OTS shot.

We were here:

cinematic close-up shot, two women having a conversation, hotel lobby scene 
--ar 16:9 --seed 4000

All four images in the initial grid were already on the right track.

What is missing, though?

If you look at image 4, you might agree that this is closer to our OTS aesthetic than the other three.

Why?

Because it is closer to a “close-up” shot than the rest of the images and
it makes specific use of depth of field.

Let’s address these two points one by one. First, we add “shallow depth of field” to stress the effect we’re looking for.

cinematic close-up shot, two women having a conversation, hotel lobby scene, 
shallow depth of field --ar 16:9 --seed 4000

Next, we exaggerate the close-up shot to stress its importance as well. Interestingly, V5 always seems to be one shot size behind: “close-up” tends to become a medium shot, and “extreme close-up” tends to result in a close-up.

cinematic extreme close-up shot, two women having a conversation, 
hotel lobby scene --ar 16:9 --seed 4000
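The “one shot size behind” observation can be turned into a simple rule of thumb: ask for one step tighter than the framing you actually want. The sketch below encodes that heuristic. The three-step ladder and the name `compensate` are assumptions for illustration, based only on the two cases mentioned above, not a documented Midjourney behavior.

```python
# Shot sizes ordered from widest to tightest. Only these three steps appear
# in the article; treating them as a ladder is an assumption.
SHOT_LADDER = ["medium", "close-up", "extreme close-up"]


def compensate(desired: str) -> str:
    """Return the shot size to write in the prompt so that V5 tends to render `desired`."""
    index = SHOT_LADDER.index(desired)
    # Step one size tighter; stay at the tightest size if there is nothing beyond it.
    return SHOT_LADDER[min(index + 1, len(SHOT_LADDER) - 1)]


wanted = "close-up"
print(f"cinematic {compensate(wanted)} shot, two women having a conversation, "
      "hotel lobby scene --ar 16:9 --seed 4000")
```

For a wanted close-up, this yields the “extreme close-up” prompt used in this step.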

Finally, we can combine these two “enforcements” to get the OTS aesthetic in all images of the initial grid.

cinematic extreme close-up shot, two women having a conversation, 
hotel lobby scene, depth of field --ar 16:9 --seed 4000

From here you can start exploring, e.g. further fine-tune shot descriptions:

cinematic extreme close-up shot, two women having a conversation, 
hotel lobby scene, shallow depth of field --ar 16:9 --seed 4000

Or change the emotions:
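One way to explore both kinds of variation systematically is to generate every combination as a prompt string and try them one after another. This is a sketch under stated assumptions: the emotion words are hypothetical placeholders (they are not taken from the prompts used in this article), and placing them as an adjective in front of “women” is just one possible wording.

```python
from itertools import product

SHOT = "cinematic extreme close-up shot"
SCENE = "hotel lobby scene"
PARAMS = "--ar 16:9 --seed 4000"

# Hypothetical emotion words for illustration; "" means no emotion added.
emotions = ["", "angry", "amused"]
# Both focus terms appear in the prompts of this article.
focus_terms = ["depth of field", "shallow depth of field"]


def variations():
    """Yield one prompt per (emotion, focus term) combination."""
    for emotion, focus in product(emotions, focus_terms):
        words = ("two", emotion, "women having a conversation")
        subject = " ".join(w for w in words if w)
        yield f"{SHOT}, {subject}, {SCENE}, {focus} {PARAMS}"


for prompt in variations():
    print(prompt)
```

Because the seed stays fixed, differences between the resulting grids can be attributed to the changed wording rather than to chance.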

What else?

What about situations that are not conversations? Can we also make something like this work more reliably:

cinematic over-the-shoulder shot, a man stealing a bagel in a bakery, 
depth of field --ar 16:9 --seed 4000

(Image 4, a shot over a bagel’s shoulder? 😂 )

We’ll dive into this one in one of the next episodes of Mastering Midjourney V5. If you missed the last ones, no worries: you can catch up here, here and here.

I hope you enjoyed this post. Keep exploring! 🚀

➡️ For more information about AI & Creativity, follow me on Twitter or Medium (use my referral link to get full access to all my articles and those of thousands of other writers).

Join Medium with my referral link - Tristan Wolff


➡️ If you like my content, why not leave a “clap” at the end of this article, so more people can see it?