Midjourney Explorations: #9 — Perspectives and Composition
Master generative AI art with tips on perspective and photography. Achieve creative precision and adaptability in your art journey.
Spatial Perspectives and Composition
In a previous article, Midjourney Explorations: #6 — Art Media, Photos!, I delved into the world of cameras, film, and aperture. However, I didn’t go into great depth on how to frame the content of your image or why perspective matters. Given the significance and complexity of the topic, it warrants its own focused discussion. This article aims to fill that gap by providing a comprehensive look at how perspective terms can offer more consistent and accurate results in generative AI art, as compared to strictly photographic terms.
Photographic Terms
As I’ve found in my Midjourney explorations, sticking solely to photographic terms can yield hit-or-miss outcomes, which underscores the importance of a broader viewpoint. Photographic terminology like “Depth of Field,” “Fish-Eye Lens,” or “Bird’s-Eye View” can give variable results in Midjourney, especially when applied to non-photographic art. These terms are often specific to the medium of photography, and when applied to other visual mediums, they may lead to inconsistent or even perplexing outcomes. That said, if they are working for you, then by all means use them; just do not get stuck on them when other options are available.
Why Perspective Terms Work Better
Spatial perspectives serve as the backbone of visual art, enabling the representation of three-dimensional space on a two-dimensional medium. In Midjourney, these perspectives can drastically improve the resulting images by adding dimensions, depth, and a complex layering of elements in the visual field. They can also guide the AI in positioning objects and subjects to evoke specific emotional or thematic responses, enriching the viewer’s experience.
Basic Elements of Composition
Foreground: The area closest to the viewer.
Middleground: Between foreground and background.
Background: The area furthest from the viewer.
These depth terms can be used separately or together, and they do a fairly good job of placing things at the depth you have in mind. I had a bit more luck with multiple subjects when I positioned them using the three levels. It was not perfect, but it did seem better than a regular prompt in my testing. That said, trying to put a bunny in the foreground, a cat in the middleground, and a bear in the background got me three very weird hybrid critters! So, not perfect by any means.
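As a rough illustration of the three-level approach, here is a minimal Python sketch that assembles a prompt from foreground, middleground, and background descriptions. The helper name layered_prompt and the “X in the foreground” phrasing are assumptions made for this example, not an official Midjourney syntax; the output is just an ordinary text prompt.

```python
def layered_prompt(foreground=None, middleground=None, background=None, style=None):
    """Join per-depth descriptions into one comma-separated prompt string.

    Hypothetical helper: the "<subject> in the <depth>" template is an
    assumption for illustration, not a documented Midjourney format.
    """
    layers = [
        ("foreground", foreground),
        ("middleground", middleground),
        ("background", background),
    ]
    # Keep only the depths that were actually supplied, nearest first.
    parts = [f"{subject} in the {depth}" for depth, subject in layers if subject]
    if style:
        parts.append(style)
    return ", ".join(parts)


prompt = layered_prompt(
    foreground="a bunny",
    middleground="a cat",
    background="a bear",
)
print(prompt)
```

Any of the three levels can be left out, so the same helper covers the “separately or together” cases; as noted above, the results from such a prompt are still far from guaranteed.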





Directionality: Placing objects above, below, left, right, behind and in front.
In Midjourney, achieving the desired placement and orientation can be challenging. While you may have some luck with commonly understood arrangements like ‘picture above a sofa,’ more specific setups, like placing a ‘cat under a sofa,’ are less reliable. To gain better control over the depth of your composition, it’s advisable to use spatial terms like ‘foreground,’ ‘middleground,’ and ‘background,’ as they tend to be more reliable than ‘in front of’ or ‘behind.’ When it comes to directional details, such as placing a knight’s sword in his left hand, the results can be unpredictable. If exact orientation is crucial, consider using the Vary (Region) feature for manual adjustments, or be prepared to generate multiple iterations.
Negative Space: Empty or open space around an object.
Positive Space: Area occupied by the main subject.
Use them together or separately; using them together seems to be more powerful.





Focal Point: Point in the composition drawing the viewer’s eye.
Midjourney has a tendency to put the subject at the center of every image, so this may not be as useful as the others.



Traditional Artistic Perspectives
Linear Perspective: Illusion of depth on a flat surface.


Vanishing Point: Where parallel lines appear to converge.



Horizon Line: Level line where land or water and sky meet.

Aerial Perspective: Using color and clarity to show distance.


Orthogonal Lines: Lines that appear to recede toward a vanishing point.


Foreshortening: Object or distance appears shorter than it is.



Camera Angles and Views
Bird’s-Eye View: A view from high above, showing a large area. High-angle-shot also works nicely.



Worm’s-Eye View or low-angle-shot: Ground-level view looking up.


First-person Perspective: As seen from the viewer’s point of view.

Fish-Eye Lens: Strong visual distortion that produces a wide panoramic or hemispherical image.


Panoramic: Unbroken view of the entire surrounding area.
I should note here that panoramic is best done in a landscape aspect ratio like --ar 16:9 or --ar 3:2 to get a better effect. If you have an image where you did not use an aspect ratio, you can try the new Pan features under the upscaled image to pan the image in one of the four directions. This will allow you to increase the width or height. See my previous articles about this feature. Zoom Out can also be useful.
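To make the aspect-ratio tip concrete, here is a small Python sketch that appends Midjourney’s --ar parameter to a prompt string. The function name with_aspect_ratio and the sample prompt are hypothetical, chosen only for this example; the one piece taken from Midjourney itself is the --ar width:height parameter mentioned above.

```python
def with_aspect_ratio(prompt, width, height):
    """Append an --ar width:height parameter to the end of a prompt.

    Hypothetical helper for illustration; it only builds the text,
    it does not talk to Midjourney.
    """
    if width <= 0 or height <= 0:
        raise ValueError("aspect ratio parts must be positive")
    # Parameters go at the end of the prompt, after the description.
    return f"{prompt.strip()} --ar {width}:{height}"


print(with_aspect_ratio("panoramic view of a mountain valley at dawn", 16, 9))
print(with_aspect_ratio("panoramic view of a mountain valley at dawn", 3, 2))
```

Both calls keep the same description and change only the ratio, which makes it easy to compare how a wider frame affects the panoramic effect.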


Parallax: The effect whereby the position or direction of an object appears to differ when viewed from different positions.



Advanced Concepts
Isometric Perspective: Equal foreshortening along each axis.



Oblique Projection: A simple type of graphical projection used to draw three-dimensional objects as two-dimensional images.



Chiaroscuro: Strong contrasts between light and dark for volume illusion.
- I have touched on this great all-purpose word in a few of the articles now. It is excellent for adding shadow and depth to just about anything.




Z-Axis: The axis denoting depth in a 3D coordinate system.
- I got the impression that Midjourney understands that the Z-axis means vertical, but I cannot say that it does more than that without more testing.

X-Y Plane: This term is intended to specify a plane with width and height but no depth.
- However, I’ve found that it often misfires, generating images of airplanes about 80% of the time — even when adding “no airplane” to the prompt. I had some success with “x-axis” and “y-axis,” but not consistently enough to recommend these terms for image adjustment. As always, your experience may vary. Using “2D” as a substitute for “X-Y Plane” seems to yield more reliable outcomes. To refine your prompt further, pair “2D” with descriptors like “2D-drawing,” “2D-illustration,” or “2D-schematic.” If you prefer a flat, organized layout, consider adding the term “knolling” to your prompt.
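The “2D” substitution described above could be sketched in Python roughly as follows. The helper flat_prompt and its arguments are assumptions invented for this example; the descriptor strings (“2D-drawing,” “2D-illustration,” “2D-schematic,” and “knolling”) are the ones from my testing.

```python
FLAT_STYLES = ("2D-drawing", "2D-illustration", "2D-schematic")


def flat_prompt(subject, style="2D-illustration", knolling=False):
    """Build a flat-looking prompt using a "2D" descriptor instead of "X-Y Plane".

    Hypothetical helper for illustration; it only assembles the prompt text.
    """
    if style not in FLAT_STYLES:
        raise ValueError(f"style must be one of {FLAT_STYLES}")
    parts = [subject, style]
    if knolling:
        # "knolling" asks for a flat, organized layout of the objects.
        parts.append("knolling")
    return ", ".join(parts)


print(flat_prompt("vintage camera parts", style="2D-schematic", knolling=True))
```

Restricting the style to the three tested descriptors is a deliberate choice here: it keeps “X-Y Plane” and similar terms that misfired out of the prompt.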








Rule of Thirds
- In my testing I was completely unimpressed by anything I got with the use of the rule of thirds. As often as not I ended up with oddball lines on my images and it was useless for getting the subject out of the center of the image, sadly.

Proportion: Size relationship of parts to a whole.
This is an important item for artistic expression, but I was unable to find a way to demonstrate it because Midjourney has such issues with scale. It does not work to say that one thing is twice the size of something else. Such size relationships must be spelled out distinctly.
Overlapping: Placing one object in front of another for depth.

Implied Lines: Lines created by positioning objects in sequence.


Juxtaposition: Contrasting elements placed close together.
When requesting a “juxtaposition” from Midjourney, the platform often combines elements in ways that might not meet your expectations. Achieving a clear distinction between two unrelated subjects in your image generally requires careful and precise prompting. Unfortunately, there’s no quick fix for this — you’ll need to invest time in crafting your prompt to get the desired result.
Takeaways:
Perspective is King: Perspective-based terms offer a more reliable and versatile toolkit for guiding AI art algorithms. These terms allow for a systematic and predictable manipulation of spatial elements, giving artists greater control over the final output. Camera terms can work as well but may give more mixed results.
Specificity Matters: Being precise and descriptive in your prompts can be more rewarding. Given that these algorithms aren’t perfect, the more clearly you can describe what you want, the better the chances of receiving a satisfactory result. This holds true for complex effects like parallax, depth of field, or chiaroscuro.
Trial and Error: Like any other art form, it seems that mastery of generative AI art comes with experimentation. While certain terms like “Rule of Thirds” or “Scale” may not produce ideal results now, that doesn’t mean they won’t in future versions of Midjourney or with different phrasing.
AI’s Limitations: It’s important to note that despite their sophistication, these algorithms still have limitations when it comes to fully understanding human concepts, especially those that require a level of artistic intuition or a deeply ingrained understanding of visual language.
In Conclusion
Both photographic and perspective terminology have their places in the world of generative AI art. However, perspective terms tend to offer a more consistent and adaptable framework, especially valuable for those involved in iterative creative processes. Understanding the limitations and strengths of each approach can dramatically improve the quality of your generated art, making your artistic journey with AI more focused and rewarding.
If you found this article insightful, please don’t hesitate to give it a clap — or multiple claps if you really enjoyed it! Your feedback helps me to refine my content. Feel free to leave your thoughts in the comments section, and don’t forget to follow me for more in-depth articles like this one. Thank you for reading!
