Simone Tedeschi

Summary

The web content provides a comprehensive survey of generative AI, detailing its evolution, current challenges, new trends, real-world applications, ethical considerations, and future prospects, with a focus on OpenAI's Q* project.

Abstract

The article "Generative AI: A Fresh Survey" delves into the historical progression of generative artificial intelligence, from early statistical methods to advanced neural network architectures like Large Language Models (LLMs). It discusses the present challenges faced by LLMs, including fine-tuning, hallucinations, and alignment with human values. The text highlights emerging trends such as Mixture of Experts (MoE) and multimodal models, which promise increased efficiency and the ability to handle diverse data types. Real-world applications of generative AI across various sectors like healthcare, finance, education, and creative industries are examined, along with the ethical implications of these technologies. The future of AI is contemplated through the lens of OpenAI's Q* project, which aims to create an ethical, general-purpose AI system with a broad spectrum of capabilities.

Opinions

  • The author views the rapid development of generative AI positively, noting its significant milestones and transformative journey.
  • There is an acknowledgment of the ongoing challenges in AI, particularly in fine-tuning models for specific tasks and reducing hallucinations.
  • The article suggests that the integration of MoE and multimodal approaches is crucial for the advancement of AI.
  • The author emphasizes the importance of aligning AI with human ethics and values, a complex task that requires continued interdisciplinary research.
  • There is optimism about the potential of AI in healthcare, finance, education, and creative fields, while also recognizing the concerns and challenges these applications bring.
  • The Q* project by OpenAI is seen as a significant initiative that could lead to breakthroughs in AI, potentially reshaping the research landscape.
  • The text implies that AI development should be balanced with considerations for data privacy, misuse of information, and equitable access to technology.

Generative AI: A Fresh Survey

The past, the present and the future of Language Models

OpenAI 3D Logo. Image by thefactsite.com

In the dynamic landscape of Generative Artificial Intelligence (GenAI), keeping pace with the latest developments can be a daunting task. But don’t worry, I’ve recently found a great paper on arXiv that explores recent trends and future directions, and I’ll break it down in this story.

Outline:

  1. The History of Generative AI: Let's Start from the Beginning
  2. Current Challenges: Fine-Tuning, Hallucinations and Alignment
  3. New Trends: Mixture of Experts (MoE) and Multimodal Models
  4. Real-world Applications of Generative AI and Ethical Considerations
  5. The Future of AI: The OpenAI's Q* Project

1. The History of Generative AI: Let’s Start from the Beginning

The rise of Generative AI has been marked by significant milestones, with each new model paving the way for the next evolutionary leap. Models, indeed, have undergone a transformative journey, evolving from rudimentary statistical methods to the complex neural network architectures that underpin today’s Large Language Models (LLMs).

Figure 1: Timeline of Large Language Models — Design by Armin Norouzi

The inception of language modeling (Fig. 1) can be traced back to the statistical approaches of the late 1980s, a period marked by a transition from rule-based to machine learning algorithms in Natural Language Processing (NLP). Early models, primarily n-gram based, calculated the probability of word sequences in a corpus, thus providing a rudimentary understanding of language structure. These models, though simplistic, laid the groundwork for future advances in language understanding.
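
To make the n-gram idea concrete, here is a minimal sketch of a bigram model (n = 2) that estimates the probability of a word given the previous one by simple counting. The toy corpus and the function names are my own illustrative assumptions, and real systems of that era also relied on smoothing, which is left out here.

```python
from collections import Counter

def train_bigram(tokens):
    """Count how often each word appears as a context and each word pair occurs."""
    contexts = Counter(tokens[:-1])             # every token that has a successor
    bigrams = Counter(zip(tokens, tokens[1:]))  # adjacent word pairs
    return contexts, bigrams

def bigram_prob(prev, word, contexts, bigrams):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev was never seen."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]

# Toy corpus (illustrative only)
corpus = "the cat sat on the mat the cat ran".split()
contexts, bigrams = train_bigram(corpus)
p_cat = bigram_prob("the", "cat", contexts, bigrams)
print(f"P(cat | the) = {p_cat:.3f}")
```

In this corpus "the" is followed by "cat" twice and by "mat" once, so the estimate is 2/3. Unseen pairs get probability zero, which is exactly the weakness that smoothing techniques were designed to address.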

The rise in computational power in the late 1980s sparked a revolution in NLP, shifting the focus towards statistical models capable of making ‘soft’ probabilistic decisions, as opposed to the rigid, ‘handwritten’ rule-based systems that dominated early NLP systems.

In the following decade, the popularity and applicability of these statistical models skyrocketed, proving invaluable in managing the flourishing flow of digital text. The 1990s saw the firm establishment of statistical methods in NLP research, with n-grams playing a crucial role in numerically capturing linguistic patterns.

A significant milestone was reached in 1997 with the introduction of Long Short-Term Memory (LSTM) networks and their application to voice and text processing, leading to the current era where neural network models represent the cutting edge of NLP research and development.

The emergence of deep learning has revolutionized the field, leading to the creation of language models such as GPT, BERT, RoBERTa, BART and DeBERTa and, later, to conversational LLMs such as OpenAI’s ChatGPT (November 2022). Recent models like GPT-4, LLaMA, Google Bard and Anthropic Claude have further pushed the boundaries of AI by showcasing unprecedented levels of language understanding and generation.

2. Current Challenges: Fine-Tuning, Hallucinations and Alignment

The rapid proliferation of LLMs, and their extensive utilization in the last few months, has emphasized the significance of fine-tuning, hallucination reduction, and alignment. These aspects play a crucial role in enhancing the functionality and reliability of LLMs.

Fine-tuning, i.e. the process of adapting pre-trained models to specific tasks, has made notable strides. Techniques such as prompt-based and few-shot learning, coupled with supervised fine-tuning on specialized datasets, have enhanced the adaptability of LLMs across various contexts. Despite this progress, challenges persist, particularly in addressing biases and ensuring the generalization of models across diverse tasks.
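
As a small illustration of few-shot learning through prompting, the sketch below assembles a prompt that shows the model a handful of labeled examples before the actual query. The task (sentiment classification), the field names and the helper function are hypothetical choices of mine, not something prescribed by the paper.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Format an instruction, a few labeled examples, and the unlabeled query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The query is left without a label so that the model completes it.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

examples = [
    ("Great battery life.", "positive"),
    ("Stopped working after a week.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each review.",
    examples,
    "The screen is gorgeous.",
)
print(prompt)
```

The resulting string would then be sent to whichever LLM you are using. No model weights change here, which is what distinguishes this from supervised fine-tuning.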

Hallucinations, i.e. the generation of confidently asserted yet factually incorrect information (Fig. 2), remain a persistent challenge in LLMs. This issue has been partially mitigated by the introduction of Retrieval-Augmented Generation (RAG) models, i.e. models capable of retrieving relevant information before the actual text generation step.
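
The retrieve-then-generate idea can be sketched in a few lines. The example below is a deliberately simplified assumption on my part: it ranks documents with bag-of-words cosine similarity (production systems typically use learned embeddings) and only builds the prompt, leaving the call to an actual LLM out.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=1):
    """Put the retrieved passages in front of the question."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"

documents = [
    "LSTM networks were introduced in 1997.",
    "Mixtral-8x7B is a Mixture of Experts model from MistralAI.",
    "ChatGPT was released by OpenAI in November 2022.",
]
query = "When was ChatGPT released?"
top = retrieve(query, documents)
print(build_prompt(query, documents))
```

Because the answer is grounded in a retrieved passage rather than in the model's parameters alone, the model has less room to invent facts, although it can still misread the context.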

Figure 2: An example of model hallucination. Picture by Karen Weise and Cade Metz (The New York Times)

Finally, concerning alignment, innovative approaches have been proposed to ensure that LLM outputs align with human values and ethics. Solutions range from constrained optimization to reward modeling techniques, all aiming to embed human preferences within AI systems, either during training or fine-tuning.
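
To give a feel for reward modeling, here is a minimal sketch of one common formulation: a pairwise preference loss that is small when the reward model scores the human-preferred answer above the rejected one. The scalar rewards are made-up numbers, and whether a given system uses exactly this loss is an assumption on my part.

```python
import math

def pairwise_preference_loss(reward_chosen, reward_rejected):
    """Compute -log(sigmoid(r_chosen - r_rejected)), written as log(1 + exp(-margin))."""
    margin = reward_chosen - reward_rejected
    return math.log1p(math.exp(-margin))

# Hypothetical reward scores for two answers to the same prompt
good_ranking = pairwise_preference_loss(reward_chosen=2.0, reward_rejected=0.0)
bad_ranking = pairwise_preference_loss(reward_chosen=0.0, reward_rejected=2.0)
print(f"loss when the preferred answer scores higher: {good_ranking:.3f}")
print(f"loss when the preferred answer scores lower:  {bad_ranking:.3f}")
```

Minimizing this loss over many human comparisons pushes the reward model to agree with human preferences, and the trained reward model can then guide fine-tuning of the LLM.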

However, the complexity of aligning AI with the diverse spectrum of human ethics and the persistence of hallucinations, particularly on culturally-sensitive topics, highlight the need for continued interdisciplinary research in the development and application of LLMs.

3. New Trends: Mixture of Experts (MoE) and Multimodal Models

Mixture of Experts. The recently adopted Mixture of Experts (MoE) setup is a big deal in the AI/LLM world (Fig. 3). This cool method, shown off by top-notch models like Google’s Switch Transformer and MistralAI’s Mixtral-8x7B, uses a set of transformer-based expert modules and dynamically routes each token to only a few of them, making modeling more efficient and scalable.

Figure 3: The general MoE architecture. Image by Jongwon Yoon

One of the major benefits of MoE is how it can handle huge parameter scales while keeping computational costs down: since only a few experts are active for each token, the compute spent per token is much lower than in a dense model of the same total size. Combined with model parallelism across specialized experts, this enables the training of models with trillions of parameters. Its specialization in dealing with diverse data distributions boosts its proficiency in tasks like few-shot learning.
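
The dynamic token routing described above can be sketched as follows. This is a toy, dependency-free illustration under my own assumptions: the "experts" are simple scaling functions instead of feed-forward networks, the router is a plain linear scorer with hand-picked weights, and the gate is a softmax over the top-k router scores.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(token, router_weights, experts, top_k=2):
    """Route one token vector to its top_k experts and mix their outputs."""
    # Router: one score per expert (dot product of the token with that expert's row).
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Keep only the top_k experts; the others are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in chosen])
    output = [0.0] * len(token)
    for gate, idx in zip(gates, chosen):
        expert_out = experts[idx](token)
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, chosen

# Four toy experts, each scaling its input by a different factor (illustrative only)
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
output, chosen = moe_layer([2.0, 1.0], router_weights, experts, top_k=2)
print("experts used:", chosen)
print("mixed output:", output)
```

Only two of the four experts run for this token, which is where the compute savings come from: the total parameter count grows with the number of experts, while the work per token depends only on top_k.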

Now, let’s consider its potential in healthcare. An MoE-based system could be used for personalized medicine, where different ‘expert’ modules specialize in various aspects of patient data analysis, including genomics, medical imaging, and electronic health records. This could significantly improve diagnostic accuracy and treatment personalization. Similarly, an MoE-based system could be used to create personalized gaming experiences, with distinct ‘experts’ focusing on player performance, play style, and in-game choices, respectively. Finally, in the field of marketing, MoE models could be used for consumer behavior analysis, with experts looking at different consumer indicators, market trends, and regulatory compliance factors.

However, to fully unlock the potential of MoE, issues such as expert imbalance, dynamic routing complexity and probability dilution have to be addressed.

Multimodal Models. Along the same lines, the rise of multimodal AI is changing the way machines understand and interact with all sorts of human sensory inputs and contextual data (Fig. 4). These models facilitate accurate and data-efficient analysis by employing multi-view pipelines and cross-attention blocks. This integration of diverse inputs allows for a more nuanced and detailed interpretation of data, enhancing the model’s ability to accurately analyze and understand various types of information. Among these kinds of models, Google Gemini stands out as the latest multimodal conversational system: it is able to process text, documents, images, and code, as well as audio and video.
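
For readers curious about what a cross-attention block does, here is a bare-bones sketch in which text tokens (the queries) attend over image patches (the keys and values). The two-dimensional vectors are invented for illustration, and I have left out the learned projection matrices and multiple heads that real models use.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from another modality."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        mixed = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(mixed)
    return outputs

text_tokens = [[1.0, 0.0], [0.0, 1.0]]                 # hypothetical text embeddings
image_patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical patch embeddings
fused = cross_attention(text_tokens, image_patches, image_patches)
print(fused)
```

Each text token ends up with a representation built from the image patches most similar to it, which is one way information from two modalities gets combined.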

Figure 4: Graphical comparison of unimodal and multimodal models. Image by Shehmir Javaid

However, the development of multimodal AI systems faces several technical hurdles, including creating robust and diverse datasets, managing scalability, and enhancing user trust and system interpretability. Challenges like data skew and bias are prevalent due to data acquisition and annotation issues, and they call for effective dataset management through strategies such as data augmentation, active learning, and transfer learning. Another significant challenge is the computational demand of processing various data streams simultaneously, which requires powerful hardware and optimized model architectures for multiple encoders.

4. Real-world Applications of Generative AI and Ethical Considerations

The use of generative AI models in real-world situations is showing us both the amazing possibilities and the tough challenges in different sectors.

  1. Healthcare: In this sector, GenAI is making big strides in areas like diagnostic imaging and personalized medicine. For instance, it’s helping doctors spot diseases earlier and tailor treatments to individual patients. But it’s not all good news. There are serious worries about data privacy and the potential misuse of sensitive health information. We need to make sure that as we push forward with AI in healthcare, we’re also protecting patients’ personal information.
  2. Finance: AI is proving to be a powerful tool in finance as well, especially when it comes to spotting fraud and making algorithmic trades. It’s fast, it’s accurate, and it’s efficient. But there are ethical issues we need to take into account. Automated decision-making processes can lack transparency and accountability, which raises questions about fairness and oversight.
  3. Education: LLMs are opening up new possibilities in education, like creating personalized learning experiences. This could make education more accessible and instruction more tailored to individual students. But, again, there are hurdles to overcome. Not everyone has equal access to technology, and there’s the risk of biases in the AI-generated content. Additionally, if AI takes over some teaching tasks, what does that mean for human teachers?
  4. Creative AI: This is a rising field that is pushing AI’s creative limits across different forms like images, audio, and video. It’s all about generating artistic content, from telling stories to writing poetry, news or posts, but also composing music or creating visual arts. It’s even led to commercial hits like Midjourney and DALL-E. But it’s not without its challenges. We need to figure out the best ways to represent data, the right algorithms to use, and how to measure creativity effectively. Specifically, with the rise of Creative AI, copyright issues have become a significant concern. As AI starts to create content that could be very similar to human-created content, it raises questions about who owns the rights to that content. It’s a complex issue that’s still being worked out, and it’s something that anyone working with Creative AI needs to be aware of.

An image generated with Midjourney v6. Image by mid-journey.ai

5. The Future of AI: The OpenAI’s Q* Project

First of all: “What is Q*?” The Q* project is another major OpenAI initiative aimed at advancing AI technology. While OpenAI hasn’t published specific details about Q*, the project is reportedly focused on developing an ethical, general-purpose AI system that is beneficial to society. Furthermore, the goal of Q* is to demonstrate proficiency across a broad spectrum of challenges, including mathematical reasoning, which is particularly challenging for today’s LLMs.

But “How do they plan to achieve this?” Rumors say that the Q* project is all about mixing Reinforcement Learning (RL) and AI search algorithms with the creativity of LLMs. While Gemini has made big strides in multimodal AI, combining different types of data inputs like text, images, audio, and video, Q* is expected to take us far beyond what we’ve achieved so far by bringing together creative reasoning and structured problem-solving. This could be achieved by combining the precision and efficiency of algorithms like A* with the adaptable Q-learning strategy, and the complex understanding of human language and context that LLMs offer.
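
Since the internals of Q* are unpublished, the snippet below is not a reconstruction of it. It is only a minimal sketch of textbook tabular Q-learning, one of the ingredients mentioned above, on a toy corridor environment that I made up: the agent starts at the left end and receives a reward of 1 when it reaches the right end.

```python
import random

def q_learning(n_states=5, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a 1-D corridor with a reward at the rightmost state."""
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 = left, action 1 = right
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            # Epsilon-greedy action selection: explore sometimes, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = max((0, 1), key=lambda a: q[state][a])
            next_state = min(max(state + moves[action], 0), n_states - 1)
            reward = 1.0 if next_state == n_states - 1 else 0.0
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning()
greedy_policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(greedy_policy)
```

After training, the greedy policy heads right from every state, because the learned values decay with distance from the reward. The rumored idea behind Q* is to pair this kind of learned value estimate with a search procedure such as A* and with an LLM, but how that is done in practice is not public.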

This kind of integration could allow AI systems to not just process and analyze complex multimodal data, but also to navigate through structured tasks while coming up with creative solutions and generating knowledge. This mirrors the many-sided nature of human thinking, and the potential implications of this advancement would be huge.

References

  1. [2312.10868] From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape (arxiv.org)
  2. The Evolution of AI: From Rule-Based Systems to Machine Learning | by AIspire | Medium
  3. LSTMs Explained: A Complete, Technically Accurate, Conceptual Guide with Keras | by Ryan T. J. J. | Analytics Vidhya | Medium
  4. Deep learning series 1: Intro to deep learning | by Dhanoop Karunakaran | Intro to Artificial Intelligence | Medium
  5. GPT models explained. Open AI’s GPT-1,GPT-2,GPT-3 | Walmart Global Tech Blog (medium.com)
  6. BERT Explained: A Complete Guide with Theory and Tutorial | by Samia Khalid | Medium
  7. Evolving with BERT: Introduction to RoBERTa | by Aastha Singh | Analytics Vidhya | Medium
  8. Revealing BART : A denoising objective for pretraining | by RISHABH TRIPATHI | Analytics Vidhya | Medium
  9. Papers Explained 08: DeBERTa. DeBERTa (Decoding-enhanced BERT with… | by Ritvik Rastogi | DAIR.AI | Medium
  10. A Beginner’s Guide to ChatGPT: Understanding What it Is, Why it Matters, and When/Where to Use It | by Colin Baird | Medium
  11. What’s new in GPT-4: Architecture and Capabilities | Medium
  12. LLaMA: Everything you want to know about Meta’s new AI model | by E2Analyst | Predict | Medium
  13. I got to see Bard in action and it’s amazing! | by E2Analyst | Predict | Medium
  14. Analysis of Claude: An AI Assistant by Anthropic | by Vaishnavi R | Version 1 | Medium
  15. Fine-Tuning Approaches: Determining the Best Fit for Your Project | by Helder Silva | Medium
  16. A Brief Overview of Hallucination in LLM | by Nut Chukamphaeng | SCB DataX | Oct, 2023 | Medium
  17. Understanding Retrieval-Augmented Generation: A Simple Guide | by Amod’s Notes | Medium
  18. AI Alignment, Explained in 5 Points | Medium
  19. Attempting to solve the AI Alignment Problem | Medium
  20. Everything About MISTRAL’S Mixtral-8x7B: The Best Open LLM | by Maya Akim | Dec, 2023 | Medium
  21. [Paper Summary] Overview of Google’s First Multimodal Model: Gemini | by Thomas Chong | Dec, 2023 | Medium
  22. Data Augmentation in Deep Learning | by Valentina Alto | Analytics Vidhya | Medium
  23. Active Learning (ultimate guide). [This blog is a compilation of… | by Farnaz Ghassemi | Medium
  24. Transfer Learning Explained. Our monthly analysis on machine… | by integrate.ai | the integrate.ai blog | Medium
  25. AI in Healthcare: Exploring Its Uses and Impact | by MediaLab | Dec, 2023 | Medium
  26. AI in Finance: The Good, the Bad, and the Ugly | by Its All About AI | Nov, 2023 | Medium
  27. AI in Education: The Future of Learning and Teaching | by Alex Northwood | Medium
  28. How Generative AI Is Changing Creative Work (hbr.org)
  29. Advanced Midjourney Guide. Use the zoom out function to create… | by William | ILLUMINATION’S MIRROR | Medium
  30. Dall E: This AI Can Illustrate Your Imagination | by Sudharshan Ravichandran | Geek Culture | Medium
  31. Q-learning: a value-based reinforcement learning algorithm | by Dhanoop Karunakaran | Intro to Artificial Intelligence | Medium
  32. Learn A* (A-star) Algorithm in Python — Code An AI to Play a Game | by Josiah Coad | Medium
  33. An introduction to Reinforcement Learning | by Thomas Simonini | We’ve moved to freeCodeCamp.org/news | Medium
  34. AI Search Algorithms With Examples | by Pawara Siriwardhane, UG | Nerd For Tech | Medium