avatarSalvatore Raieli

Summarize

| AI| RECAP| ARTIFICIAL INTELLIGENCE|

The 2023 AI year in brief

A recap of an incredible AI year

Photo by Crawford Jolly on Unsplash

This article is a brief recap of the most interesting trends and events that have most defined this 2023.

Scaling the context length

Photo by Diana Polekhina on Unsplash

The parameter race seems to be over. GPT-4 has probably not reached the trillion mark as had been anticipated. Considering that it was no longer practical for anyone to build larger models, the new domain of the struggle became context length.

One of the drawbacks of Transformer-based LMs is the limited context length due to the quadratic computational costs in both time and memory. Meanwhile, there is an increasing demand for long context windows in applications such as PDF processing and story writing. (source)

As can be seen in the pre-2023 models, the maximum context length is 2048:

image source: here

GPT-4 has a stated context length of 32 K and Claude seems even 100 K tokens. At the same time, LLaMa 2 had an initial context length of 4 K but was quickly increased by the open-source community.

Too bad that at the moment the models still cannot use this huge context length:

It is interesting to see that a scaling-down trend is also observed. The models have finally gone into production, and the costs associated with infrastructure and resources are astronomical. Therefore, all companies are trying to develop smaller models that can do the same job as larger ones at least for particular tasks.

In any case, this year and probably next year as well, there will be increasing focus both on productivity and on whether these models can generate economic return.

On the same idea, another interesting trend is the design of AI agents, where the model can access tool (“to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction.”)

Ensembles are back in the game

Photo by Irina Iriser on Unsplash

GPT-4 was trained with both text with pictures and showed incredible capabilities from the beginning. For example, the researchers were surprised that it passed the Bar Exam.

OpenAI released GPT-4, which it claims beats 90% of humans who take the bar to become a lawyer, and 99% of students who compete in the Biology Olympiad (source)

image source: here

The monstrous success of GPT-4 is further evidence of how much Reinforcement Learning from Human Feedback (RLHF) has changed our approach to large language models (LLMs). The publication of chatGPT paved the way for many other similar models and also made the general public understand the capabilities of LLMs.

Today, RHLFs are used by almost all state-of-the-art LLMs and are central to chat applications. Then again, it requires human annotators, especially when performance is a focus.

GPT-4 was a turning point not only for its incredible performance but also for several other reasons:

  • Lack of transparency, for the first time no information about the training, dataset, and architecture was declared.
  • GPT-4 became the standard of comparison for a whole range of tasks.
  • The model was also used as a judge to evaluate other models.
  • GPT-4 was used to generate prompts or as a teacher model.

But is it only RHLF that is the secret of GPT-4?

It seems that GPT-4 is not a single model but rather a combination of 8 smaller models, each consisting of 220 billion parameters. It also seems that it can be considered an ensemble, being a mixture of experts.

This opens up interesting twists and turns, a prediction for 2024?

We will see more mixture of experts, perhaps with an ensemble of smaller expert models.

Multimodal world

We have seen how quickly there has been an attempt to extend multimodality models. Not only at the experimental level but also at the production level. The most obvious example is GPT-4 with vision (GPT-4V).

GPT-4V enables users to instruct GPT-4 to analyze image inputs provided by the user, and is the latest capability we are making broadly available. Incorporating additional modalities (such as image inputs) into large language models (LLMs) is viewed by some as a key frontier in artificial intelligence research and development. (source)

image source: here

In any case, it is not only OpenAI that is working on vision-language models. Other notable examples include Google Flamingo (closed-source), Google PaLM E, and BLIP-2 (open-source).

An interesting development, LLaMA-Adapter V2 is designed to be able to transform LLaMA into a multimodal model

image source: here

Open-source

image source: here

LLaMA is the most influential model after GPT-4. Why not Bard?

LLaMA (and LLaMA-2) are two open-source models. This has resulted in their rapid adoption by the community. LLaMa for example immediately led to a number of offspring.

The open-source community has proven to be a formidable force, leading in recent months to be the third competitor between OpenAI and Bard. As soon as LLaMA was released, several groups experimented and expanded the context length, gave it new capabilities, and trained it for several tasks.

Recent studies show that smaller models can even rival larger models if they are trained in the right way. This will clearly favor the open-source community, which despite having the same resources as big companies still has many ideas and less bureaucracy.

We are finally more critics of LLMs

After the initial euphoria, the debate about the limitations of LLMs began to grow louder and louder. For example, perhaps the much-vaunted emergent properties do not exist. This is one of several reasons why the race to the parameter was divested.

Or after much screeching that Vision Transformers would retire convolutional networks, there was a realization that convnets in comparison on par can compete with ViTs. In general, we could say that perhaps the transformer is beginning to show its limitations after so many years of being the undisputed champion.

The medical models

AlphaFold-2 was revolutionary for research on protein structures. Research did not stop this year, however. Not only have new models been developed that can predict protein structures with similar accuracy to AlphaFold-2 and greater speed. In addition, diffusion models have been used to design proteins from scratch.

image source: here

Other models have been able to predict how gene expression changes as a result of stimulating or repressing combinations of genes or to predict whether a mutation is pathogenic (AlphaMissense)

Other models such as Google MedPalm and PMC LLaMA have shown vastly superior capabilities to previous models in answering medical-themed questions. In some cases, Google Med-PaLM answers were preferred by clinicians. Subsequently, Google extended its model by transforming it into a multimodal model.

In any case, although there are several models devoted to medicine or biology, there is research toward other disciplines as well.

image source: here

Safety debate

Generative models have become increasingly capable. Whether it is text or images, the results are increasingly similar to the originals. So there has been different interest in finding a way on how to be able to identify text or images identified by AI:

As these systems become more pervasive, there is increasing risk that they may be used for malicious purposes. These include social engineering and election manipulation campaigns that exploit automated bots on social media platforms, creation of fake news and web content, and use of AI systems for cheating on academic writing and coding assignments. (source)

For this, watermark templates have been proposed for both images and text generated by LLMs:

image source: here

Same, Google DeepMind launched SynthID to watermark the image generated by their models.

Another interesting initiative is the Foundation Model Transparency Index. Stanford researchers have defined an index to measure the transparency of various models. The results show that there is still much work to be done even for open-source models:

This is a pretty clear indication of how these companies compare to their competitors, and we hope will motivate them to improve their transparency,” source

In any case, clear lines are still missing. The AI act is a first step but has left everyone unhappy. Still awaiting the arrival of the EU AI act (which is likely to be as disruptive for better or worse as the GDPR)

image source: here

Other patterns

  • There is increased interest in synthetic data but still, the results are not encouraging
  • Models capable of writing code have become better and better, and there are also open-source alternatives
  • Prompting techniques have become increasingly refined and complex. There is also active research regarding techniques such as Auto-CoT or prompts that are no longer drawn manually.
  • Text-to-video models become more accurate and have higher resolution
  • Music generation is the new frontier.
  • More accurate weather predictions, with longer range, and less computation
  • New diffusion models to design proteins from scratch.
  • The explosion of demand for GPUs continues, and NVIDIA is king in this market. NVIDIA is projected toward 1T capitalization. Computer clusters are growing around the world:
image source: here
  • ChatGPT is destroying business (Chegg the first victim). The recent OpenAI announcements killed companies that analyze PDF. Moreover, ChatGPT and Copilot are taking out traffic from StackOverflow:
image source: here
  • Copyright lawsuits are still ongoing and there is not yet a clear guideline.

Conclusions

image source: here

AI has arrived and exploded this year, especially in some applications. Since the arrival of ChatGPT, the general public has begun to understand the meaning of terms like LLMs.

From the perspective of a data scientist, it is exciting what lies ahead in the coming year.

If you have found this interesting:

You can look for my other articles, and you can also connect or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects and you can reach me on LinkedIn.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

or you may be interested in one of my recent articles:

Artificial Intelligence
Machine Learning
Data Science
Technology
Deep Learning
Recommended from ReadMedium