OpenAI has launched significant enhancements to its AI platforms, including GPT-4 Vision and DALL-E3 in the API, expanded the GPT-4 context window to 128,000 tokens, and introduced customizable GPTs, signaling a shift towards a comprehensive ecosystem for AI technology.
Abstract
OpenAI's recent DevDay Conference announcements mark a pivotal moment in the evolution of generative AI. The integration of GPT-4 Vision into the API allows for the development of applications that can interpret and describe images, democratizing access to visual AI capabilities. The addition of DALL-E3 to the API enables the creation of high-quality, realistic visuals directly within applications, opening up new possibilities for content creation. The expansion of GPT-4's context window to an unprecedented 128,000 tokens means that the AI can now process and generate content akin to a 300-page book, vastly improving its ability to handle extensive and complex tasks. Furthermore, the introduction of customizable GPTs empowers users to create AI models tailored to specific needs without coding expertise, potentially revolutionizing industries by automating specialized tasks. These advancements collectively indicate OpenAI's commitment to transitioning from a closed system to a robust ecosystem, embedding AI as a fundamental layer across various technologies to enhance user experience.
Opinions
The author believes that the new features, particularly the API enhancements, are as significant as the original ChatGPT release, highlighting their potential to change the world.
The author is impressed with the capabilities of GPT-4 Vision, noting its remarkable performance in real-world applications such as assisting visually impaired individuals and improving media content management.
The author emphasizes the transformative impact of DALL-E3's integration into the API, suggesting it will lead to widespread use in various content creation scenarios.
The author underscores the importance of the expanded context window, noting its implications for processing long-form content and its alignment with competitor offerings.
The author is enthusiastic about the potential of custom GPTs, foreseeing a marketplace for AI models that could be as profitable as Apple's App Store.
The author cautions about the potential risks of data leakage with custom GPTs, advising careful consideration when uploading proprietary or confidential information.
The author draws a parallel between the current state of generative AI and the early days of the Internet and smartphones, predicting a similar flourishing of the AI ecosystem with these new developments.
OpenAI Quietly Launched New Features. They’re As Big a Deal as the Original ChatGPT.
The changes signal the start of a real ecosystem for OpenAI’s tech
Illustration by the author via Midjourney
Last week, as part of its new DevDay Conference, OpenAI announced some major additions to its industry-leading generative AI platforms.
At first glance, they may not seem like much. Reading OpenAI’s announcements, it’s easy to get bogged down in the nerdy specificity of some of these new additions.
But make no mistake; as technical as they may seem, OpenAI’s new announcements have the potential to change the world just as much as ChatGPT did around this time last year.
Here’s the core of what OpenAI announced, and why it matters.
Your App Has Eyes
One of OpenAI’s most exciting announcements was the availability of the GPT-4 Vision platform through their API.
Previously, OpenAI had released the Vision platform only via their ChatGPT interface. Vision is super powerful, but was somewhat limited — in order to use it, you needed to log into ChatGPT and upload your images manually, a few at a time.
Adding Vision to OpenAI’s API is a game changer. It means that instead of having to use a manual process, developers can now build vision capabilities directly into their own applications.
A screen-reader company, for example, could build the system into their software, allowing visually impaired readers to have a description of any image on the internet instantly created and read aloud as they surf.
I tested this new capability myself and it’s pretty remarkable.
Media companies like my company, Gado Images, can use the Vision platform to understand the visual contents of our databases of tens of thousands, or even millions, of images. That capability makes it way easier to locate valuable parts of a collection.
These are just a few early use cases. Now that developers can build using GPT-4 Vision, they’re certain to bake it into all kinds of apps and software platforms.
In most cases, customers won’t even know that the GPT-4 platform is involved — all they’ll experience are major upgrades in how they work with visuals.
In short, the release of GPT-4 Vision in an API means computers can now see and understand the world.
DALL-E3 in the API
Another major addition to the API is OpenAI’s DALL-E3. OpenAI had released DALL-E2 through the API, but the images it created weren’t very good, especially in comparison to the outputs from modern systems like Midjourney.
The addition of DALL-E3 to the API means that just as computers can see the world, they can now create visuals that are compelling and realistic.
Companies could use this to build automatic illustrations into a blogging platform, for example, or to automatically create pictures for a children’s story.
The sky is the limit; DALL-E3 can create realistic visuals in nearly any style. I’ve been using it to create illustrations for blog posts, several of which have gone viral. I think the strength of the graphics is a big factor.
Infographic illustration created via DALL-E3 by the author
Ingest a Book
Another exciting announcement was the expansion of GPT-4’s context window to 128,000 tokens.
If you’re scratching your head and wondering what the heck that means; you’re probably not alone. Most people don’t know much about tokens, context windows, or how they relate to large language models.
The context window of an AI system refers to the total amount of text or other inputs and outputs it can process at a given time.
When you interact with GPT-4 or ChatGPT, the system continuously looks at all of the things you said to it during that conversation, and all of its responses. That gives it the important context it needs to refine its responses and to appear to “chat” with you in a coherent way.
If you’re sending long blog posts or other long-form messages, or having a very long conversation with the model, you could easily use up the context window, which was previously as low as 8,000 tokens (tokens are parts of words — an 8k token window works out to around 6,000 words.)
As you can imagine, the size of the context window makes a big difference in terms of what the model can process. A short context window prevents the model from processing long inputs, like a book or long-form article, a long transcript, or the like. It also prevents the model from yielding long results.
That’s why the introduction of the 128,000 token context window is such a big deal. 128,000 tokens is enough to ingest approximately a 300-page book. That means that GPT-3 can now take book-length inputs, and potentially yield book-length outputs.
Even if you’re not using the system to analyze a novel, the longer context window matters. In a business context, for example, you could feed GPT-$ your entire 50-page Standard Operating Procedure for some important process, ask it extensive questions, and get a 10,000-word document back.
As the name “context window” suggests, a longer context window allows the model to have much more context, and work with a lot more data in its back-and-forth with users. That lets it have longer conversations, and build in more knowledge in order to create its responses.
It also brings the model more in line with its major competitor Claude, which has a 100,000 token window.
Again, this one feels a bit pedantic, but here’s the takeaway from the context window expansion: GPT-4 can now read and write things as long as a book, not simply craft short-form posts and conversations as it did before.
Bespoke GPTs
This is, perhaps, the biggest change with OpenAI’s launch. Anyone can now create what is essentially a fine-tuned version of GPT-$, using a simple web interface.
These bespoke custom GPTs allow users to make their own AI system that others can interact with, without having to do any coding or hosting a program themselves.
Creating one is simple. You give the model some instructions and sample inputs and outputs and upload any documents that you want it to be able to access.
Suppose you’re a copy editor. You could upload your publication’s entire style guide. You could then instruct the custom GPT to look at articles handed to it and critique how well they fit the style guide, making suggestions for changes.
Next, you could share your custom GPT with your publication’s writers. They could run their articles through the custom GPT themselves, receiving detailed feedback that specifically relates to your own publication’s style guide.
You’d essentially have an automated version of you — or at least one that’s good enough to do a first pass, saving you valuable human time.
As you can imagine, the capabilities of these custom GPTs are limited only by people’s imaginations, and their willingness to share data with the system.
Even with that limitation, the ability to create custom GPTs is a huge deal. OpenAI has even promised to create a store where people can buy and sell these custom systems.
This aspect of the launch has been compared to the launch of Apple’s famously profitable App Store. Just as the Apple App Store lets any developer sell an app to iPhone users, OpenAI’s store will let people create custom GPTs for nearly any function, and then easily charge users for access.
It remains to be seen exactly how this will play out, but it has the potential to create some truly useful tools — and also to make those with the training skills (or tons of useful data with which to feed a custom model) a lot of money.
Why The Changes Matter
Again, most of these changes feel highly technical. But in reality, they have the potential to be just as impactful as the launch of ChatGPT itself.
Tech ecosystems generally start as “walled gardens.” Picture AOL in the early 1990s. Beyond some academic institutions, the Internet at the time consisted mostly of carefully controlled chatrooms and functions like email that were largely managed by a single company.
As ecosystems mature, though, they expand. As AOL and other ISPs facilitated access to the open web, it flourished. We ended up with resources like Wikipedia, as well as conveniences like online shopping, video conferencing, and even life-saving services like remote medical care.
Similarly, early smartphones provided access to a limited number of resources, mostly controlled by their creators. It wasn’t until the launch of the previously-mentioned App Store and the release of Android OS — and the explosion of third-party apps and content — that smartphones truly revealed their potential.
We’re at a similar moment with generative AI. Before, most people interacted with AI systems through highly controlled interfaces provided by individual companies. If you wanted to use a content-generating chatbot, you went to ChatGPT. If you needed a photo, you loaded up Discord and messaged the Midjourney bot.
OpenAI’s changes last week take the ecosystem for AI in a totally new direction. Yes, OpenAI has had an API (which allows outside developers to interface with its systems) for years.
But the decision to release powerful visual capabilities and a massive context window to developers — in addition to building in no-code tools for customizing GPTs — signals that OpenAI intends to move its technologies beyond its tightly controlled ChatGPT interface and build a true ecosystem around the tech.
That’s huge news for developers, who find themselves with a massive new toolkit for building incredible software. But it’s also huge news for users. As companies use OpenAI’s API to build more generative AI tools into their existing software, we’re going to see apps get much smarter, more visual, and more efficient.
Generative AI will become less of a tool that you log into a website and use, and more of a fundamental layer, making every piece of tech you interact with subtly better.
I’ve tested thousands of ChatGPT prompts over the last year. As a full-time creator, there are a handful I come back to every day. I compiled them into a free guide, 7 Enormously Useful ChatGPT Prompts For Creators. Grab a copy today!