What Exactly is Sora, the Catalyst for Market Surge?

On February 19th, stocks related to the Sora concept triggered a wave of limit-up trading. This AI video model, described as a ‘nuclear-grade’ product from OpenAI, can generate a 60-second video from a text prompt in a single continuous take, with explosive effect! Let’s explore together how institutions and industry insiders analyze Sora.

Sora refers to an AI video model recently released by OpenAI, which has garnered significant attention for its ability to generate a 60-second video in a single continuous take. In this article, I will highlight how both institutions and industry insiders analyze Sora’s impact on the market.

What is Sora?

Introduced on February 15, 2024, Sora represents OpenAI’s latest AI model. Sora has the capability to generate lifelike video content, spanning up to one minute in duration, based solely on textual instructions. According to reports, the generated footage closely follows the user’s directives regarding subject matter and stylistic preferences.

OpenAI asserts that Sora will eventually be able to produce even more intricate scenes, featuring multiple characters and specific types of movement, all while incorporating precise details of both the subjects and their surroundings. They claim to have trained the model not only to comprehend the user’s prompts but also to understand how those elements manifest in the real world.

The challenges in AI video generation lie in the fact that videos consist of continuous frames, which must be logically coherent rather than simple combinations of images. This requirement increases model complexity, computational difficulty, and cost. Furthermore, generating lifelike videos as Sora does requires a vast amount of paired “text-to-video” data, which currently lacks diversity and entails significant annotation effort. Hence, previously generated videos were mostly 5–15 seconds long, while Sora extends the video length to 1 minute, supporting multiple scenes while maintaining consistency in characters and visual style, and to some extent “understanding” the real world.

Sora boasts three notable highlights:

It can maintain high fluidity and stability between the video subject and background in a 60-second video.
It achieves multiple camera angles within a single video, with logical and exceptionally smooth scene transitions.
It demonstrates a remarkable ability to comprehend the real world, handling details such as light and shadow reflections, motion styles, and camera movements excellently, greatly enhancing realism.

How Does Sora Work?

Before delving into how Sora handles diverse visual data, let’s first imagine a scenario from everyday life:

You’re flipping through a photo album of world landmarks. This album contains pictures of landscapes from different countries and styles — some are vast seascapes, others are narrow alleyways, and some depict the vibrant cityscape at night. Despite the diverse content and styles of these photos, you can easily recognize the location and emotions represented in each picture because your brain can unify and understand these various visual cues.

Now, let’s compare this process with how Sora handles diverse visual data. The challenge facing Sora is akin to needing to process and understand millions of images and videos captured from various locations worldwide and different devices.

These visual data vary in resolution, aspect ratio, color depth, and other aspects. To let Sora understand and generate such rich visual content much as the human brain does, OpenAI has developed a set of methods to transform these diverse types of visual data into a unified representation.

Firstly, Sora utilizes a technique called “video compression networks” to compress the input images or videos into a lower-dimensional representation. This process is akin to “standardizing” photos of different sizes and resolutions to facilitate processing and storage. It doesn’t mean ignoring the uniqueness of the original data but rather transforming them into a format that is easier for Sora to understand and manipulate.
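To make the idea of a lower-dimensional representation concrete, here is a minimal sketch in plain Python. It is not Sora’s compression network, whose details are not public: a learned encoder is replaced by simple average pooling, and the names make_video and compress, the toy pixel values, and the pooling factors are all assumptions chosen for illustration.

```python
from statistics import fmean


def make_video(frames, height, width):
    """Build a toy grayscale video as nested lists indexed video[t][y][x]."""
    return [[[float(t * 100 + y * 10 + x) for x in range(width)]
             for y in range(height)]
            for t in range(frames)]


def compress(video, t_factor=2, s_factor=2):
    """Average-pool blocks of t_factor frames by s_factor x s_factor pixels.

    ASSUMPTION: fixed pooling stands in for a learned video encoder.
    It only shows how time and space can shrink together.
    """
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    if frames % t_factor or height % s_factor or width % s_factor:
        raise ValueError("video dimensions must be divisible by the pooling factors")
    latent = []
    for t0 in range(0, frames, t_factor):
        plane = []
        for y0 in range(0, height, s_factor):
            row = []
            for x0 in range(0, width, s_factor):
                block = [video[t][y][x]
                         for t in range(t0, t0 + t_factor)
                         for y in range(y0, y0 + s_factor)
                         for x in range(x0, x0 + s_factor)]
                row.append(fmean(block))
            plane.append(row)
        latent.append(plane)
    return latent


video = make_video(frames=4, height=4, width=4)
latent = compress(video)
shape = (len(latent), len(latent[0]), len(latent[0][0]))
print(shape)  # (2, 2, 2)
```

The 4 x 4 x 4 input holds 64 values and the pooled output holds 8, so the representation is eight times smaller while each remaining value still summarizes a small region of the original clip.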

Next, Sora further decomposes these compressed data into what are known as “spacetime patches”. These patches can be seen as the basic building blocks of visual content, much like how each photo in our previous album can be broken down into small fragments containing unique landscapes, colors, and textures. This way, regardless of the original video’s length, resolution, or style, Sora can process them into a consistent format.
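The patch idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not OpenAI’s implementation: the function names to_spacetime_patches and toy_video, the patch size of 2 x 2 x 2, and the toy clips are all hypothetical.

```python
def to_spacetime_patches(video, pt, ph, pw):
    """Cut video[t][y][x] into non-overlapping pt x ph x pw blocks.

    Each block is flattened into one list, so the whole clip becomes
    a sequence of equally sized vectors.
    """
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    if frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, frames, pt):
        for y0 in range(0, height, ph):
            for x0 in range(0, width, pw):
                patches.append([video[t][y][x]
                                for t in range(t0, t0 + pt)
                                for y in range(y0, y0 + ph)
                                for x in range(x0, x0 + pw)])
    return patches


def toy_video(frames, height, width):
    """Toy clip whose values encode their own position (hypothetical data)."""
    return [[[t * 100 + y * 10 + x for x in range(width)]
             for y in range(height)]
            for t in range(frames)]


# Two clips with different durations and aspect ratios.
short_clip = to_spacetime_patches(toy_video(2, 4, 4), pt=2, ph=2, pw=2)
long_clip = to_spacetime_patches(toy_video(4, 4, 6), pt=2, ph=2, pw=2)
print(len(short_clip), len(short_clip[0]))  # 4 8
print(len(long_clip), len(long_clip[0]))    # 12 8
```

The two clips yield different numbers of patches, but every patch has the same length. That is the sense in which videos of any length or shape end up in one consistent format: only the sequence length changes.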

Through this approach, Sora can unify visual data from different sources and styles into an operational internal representation while retaining the richness of the original visual information. It’s similar to how, when browsing through a photo album of world landmarks, despite the diverse photos, you can still understand and appreciate them in the same way.

What Sets Sora Apart From Other Products?

From a technical standpoint, Sora adopts the “Diffusion + Transformer” approach in video generation, marking a significant milestone in technological advancement. In contrast to the U-Net, the backbone network used in many earlier diffusion models, the Transformer architecture offers strong scalability in parameters.

This means that performance keeps improving as the number of parameters increases. The architecture also supports training data of arbitrary resolutions, aspect ratios, and durations, without first cropping or resizing that data in ways that would compromise its quality. Additionally, Sora trains autoencoders capable of compressing videos in both time and space, which significantly contributes to its ability to generate longer videos.
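For readers unfamiliar with the “Diffusion” half of this pairing, the sketch below shows the standard forward noising step that diffusion models are trained to reverse. It is a generic illustration, not Sora’s training code: the linear schedule from 1e-4 to 0.02 over 1000 steps follows the common DDPM defaults, and the function names, the four-value “patch”, and the fixed seed are assumptions. Sora’s actual schedule has not been disclosed.

```python
import math
import random


def alpha_bar_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    ASSUMPTION: DDPM-style defaults, used here only for illustration.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    alpha_bars, running = [], 1.0
    for i in range(steps):
        beta = beta_start + (beta_end - beta_start) * i / (steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(patch, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, 1)."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * value + noise * rng.gauss(0.0, 1.0) for value in patch]


rng = random.Random(0)  # fixed seed so repeated runs give the same samples
alpha_bars = alpha_bar_schedule(steps=1000)
clean_patch = [0.5, -0.25, 1.0, 0.0]  # hypothetical flattened patch values

slightly_noisy = add_noise(clean_patch, alpha_bars[9], rng)
almost_pure_noise = add_noise(clean_patch, alpha_bars[-1], rng)

print(round(alpha_bars[0], 4))  # 0.9999
print(slightly_noisy)
print(almost_pure_noise)
```

Early in the schedule the patch is barely changed, and by the final step almost none of the original signal remains. During training, a network (in the approach described above, a Transformer operating on patches) learns to predict and remove this noise, so that at generation time it can start from noise and work back to a clean video.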

Sora’s technological approach is entirely different. Previous methods for video and image generation relied on Diffusion alone, which generates content by gradually removing noise from a random starting point. This time, OpenAI leverages its large language model advantage by combining the Transformer architecture behind its LLMs (Large Language Models) with Diffusion for training.

This gives Sora the dual capabilities of understanding and simulating the real world. As a result, the generated videos are more realistic and capable of simulating the real physical world beyond the confines of 2D.

What is the Commercial Value of Sora?

The emergence of the Sora model heralds an impending industry transformation. In the realm of content creation and media, it will revolutionize traditional video production methods, empowering creators to produce high-quality videos more efficiently and cost-effectively, thus driving overall industry efficiency. For industries such as advertising, movie trailers, and short videos, the Sora model has the potential to bring disruptive innovations. Additionally, its capabilities in simulating the world open up vast potential space for industries like virtual reality and game development.

From a business perspective, considering the industry chain from “idea or IP → content production → content distribution”, the cost of video production is expected to fall significantly in the long term. Currently, AI-generated videos still need improvement and may be better suited to content with lower quality requirements, such as online advertising materials for game user acquisition. As Sora is largely closed-source and its technical details have not been disclosed, domestic video generation models still lag behind, and the gap is harder to close than in language models, where more open-source options are available. The expected drop in production costs could benefit animation film companies, IP companies with innovative ideas but slow execution, and the gaming industry seeking cost reduction and efficiency improvement.

With its robust foundation in large models built on human language understanding, human knowledge, and world models, Sora can be combined with various other technologies to create super tools across many fields. These include biomedicine, protein and gene research, as well as disciplines like physics, chemistry, and mathematics. Large models will play a role in all of these areas. Sora’s simulation of the physical world will also have a significant impact on embodied intelligence in robotics and autonomous driving.

How to View Sora Concept Stocks?

Currently, large-scale models have been used to model human images and videos with generative technologies such as diffusion and GANs, enabling generation, control, and editing of image and video content based on text and audio inputs. Sora-related products are still in the early stages, and further research is needed on future applications. If Sora’s future applications prove extensive, the volume of generated video data is expected to increase, which should give the industry a boost.

What Investment Opportunities Will Sora Bring to the AI Track?

The release of Sora is expected to ignite another wave of AI frenzy. Although Sora’s videos are currently limited to one minute, given OpenAI’s iteration speed, it won’t be long before it produces videos spanning several minutes. Over the next few years, it will revolutionize the entire film and short video industry. The current AI frenzy has entered its second phase, focusing on AI innovation at the product level and performance verification at the data level. The future market explosion will be driven by both application and infrastructure development, with a focus on B2B (to-business) applications with strong AI integration on the application side, and on GPUs, optical modules, liquid cooling, and data elements on the infrastructure side.

AI remains the mainstream of current technology investment. Overseas AI large models and applications maintain a fast iteration speed. With the release of models like Sora and Gemini 1.5 Pro, multimodal capabilities have significantly improved, further expanding application scenarios. This will drive investment in computing infrastructure. It is recommended to pay attention to sectors such as optical modules and ICT infrastructure.

For short video UGC platforms, the further growth of the creator economy driven by AI-upgraded editing tools needs to be considered, but attention should also be paid to potential changes in competitive dynamics. For in-depth content, IP value will rise, while competition will intensify for shallow, low-barrier content. For creative tools, the key competitive factors are business-customer stickiness and the accumulation of copyright libraries. Identifying AI-generated fake content will be a crucial aspect of content moderation, affecting news production and dissemination. Areas to watch include the optical communication industry chain, the computing equipment industry chain, multimodal algorithm development, in-depth content creation backed by rich IP reserves, and commercial AI application scenarios.

The AI track is likely to remain one of the main themes in the near future, and attention can still be focused on the allocation opportunities of computing power and applications within the AI industry chain.

Catalyzed by breakthroughs in AI applications, the AI sector has shown strong momentum. The industrial logic of the current AI sector is relatively sound, and the sustainability of future growth may improve. Earnings may also exceed expectations, so the outlook for the sector is relatively optimistic.

Conclusion

In conclusion, Sora represents a significant advancement in AI technology, particularly in the domain of video generation. Its ability to create realistic video footage from text instructions opens up new possibilities across various industries, including film production, advertising, and virtual reality. The innovative techniques used in Sora, such as diffusion and Transformer architectures, demonstrate the potential for AI to revolutionize content creation and delivery. As Sora continues to evolve and integrate with other technologies, it is poised to drive further innovation and efficiency in the media and entertainment landscape.