
Summary

Sakana AI researchers in Tokyo have developed an evolutionary optimization technique for merging large language models (LLMs) to create more powerful and efficient AI models with cross-domain capabilities.

Abstract

Researchers at Sakana AI in Tokyo, Japan, are pioneering a novel platform for combining open-source AI models into potent foundation models. Utilizing evolutionary algorithms, they have optimized the process of model merging, enabling the creation of AI with superior performance without the need for extensive additional training data or compute resources. This approach has led to the development of two state-of-the-art models: EvoLLM-JP, which excels in Japanese language tasks and math reasoning, and EvoVLM-JP, a Japanese Vision-Language Model with advanced cross-domain capabilities. The method involves optimizing both the parameter space (PS) and data flow space (DFS) to discover novel model architectures that surpass the limitations of human intuition in model design.

Opinions

  • The authors believe that the current model merging techniques are limited by human intuition and domain knowledge, which has hindered the full potential of AI model combinations.
  • They posit that evolutionary optimization is a key solution to automate the discovery of optimal model merging configurations, potentially surpassing the capabilities of any single source model.
  • The researchers are confident in the versatility and effectiveness of their evolutionary approach, as demonstrated by the impressive results of their EvoLLM-JP and EvoVLM-JP models on various benchmarks.
  • The text suggests that the Sakana AI team's methodology could democratize model development and foster innovation within the AI community by enabling the creation of powerful models with diverse capabilities.
  • The article implies that the evolutionary merging approach is architecture-agnostic and could be applied to other model architectures beyond transformer-based ones.
  • The authors advocate for the cost-effectiveness of their approach, emphasizing its ability to create powerful models without the need for extensive compute resources or training data.

Model Merging: Next-Gen LLMs by Combining Other LLMs

Evolutionary algorithms optimize model combinations for breakthrough AI performance


Introduction

Imagine a world where the vast ocean of open-source AI models could be seamlessly combined, their collective intelligence harnessed to create powerful new foundation models with capabilities far beyond what any individual model could achieve. This is the exciting new platform that researchers at Sakana AI in Tokyo, Japan, are pioneering with their groundbreaking work on evolutionary optimization of model merging recipes.

The Emergence of Model Merging

In recent years, model merging has emerged as a promising new paradigm in the development of large language models (LLMs). By strategically combining multiple pre-trained LLMs into a unified architecture, model merging enables the creation of powerful new models without the need for extensive additional training data or compute resources. This highly cost-effective approach has fueled a surge of interest and experimentation in the AI community.


However, up to this point, model merging has been somewhat of a “black box,” relying heavily on the intuition and domain knowledge of the model developer to discover effective combinations and merging recipes. With the vast and growing diversity of open-source models available, the full potential of model merging has been limited by the bounds of human intuition. That’s where evolutionary optimization comes in.

Researchers at Sakana AI in Tokyo, Japan, have introduced a novel evolutionary approach for optimizing model merging recipes to automatically create powerful foundation models, without the need for extensive additional training data or compute resources.

The Power of Model Merging

Model merging has emerged as a promising and cost-effective approach for developing large language models (LLMs). By strategically combining multiple pre-trained models into a unified architecture, model merging allows developers to harness the strengths of individual models without the computational burden of training from scratch.

However, current model merging techniques heavily rely on human intuition and domain knowledge, limiting their potential. The Sakana AI team recognized this limitation and set out to develop an automated, systematic approach for discovering optimal model merging configurations.


Evolutionary Optimization: The Key to Automated Model Composition

The core of Sakana AI’s approach lies in the application of evolutionary algorithms to navigate the vast search space of potential model combinations. By optimizing merging configurations in both the parameter space (the weights of the individual models) and the data flow space (the inference path through the network), their method efficiently discovers powerful merged models that exceed the capabilities of any single source model.

This evolutionary optimization operates by iteratively proposing, evaluating, and refining candidate model merging solutions. The fitness of each candidate is assessed based on its performance on user-specified tasks, allowing the algorithm to automatically identify configurations that yield the best results. Through this process, the method can uncover novel and unintuitive model combinations that may be difficult for human experts to identify.
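To make the propose-evaluate-refine loop concrete, here is a minimal Python sketch of such a search. The evaluate_merged_model fitness function is a toy stand-in, and the simple mutate-and-select strategy is an illustrative assumption; the actual work uses a more sophisticated evolutionary optimizer over real merge configurations.

import numpy as np

# Minimal sketch of the propose -> evaluate -> refine loop described above.
# evaluate_merged_model is a toy stand-in: in practice it would merge the
# source models with the candidate recipe and score the result on the
# user-specified task. Everything here is illustrative, not the paper's code.

def evaluate_merged_model(recipe):
    # Toy fitness for demonstration only: reward recipes close to a fixed target.
    target = np.linspace(0.2, 0.8, num=recipe.shape[0])
    return -np.abs(recipe - target).mean()

def evolve_merge_recipe(num_params, generations=50, pop_size=16, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    best = rng.uniform(0.0, 1.0, size=num_params)      # initial candidate recipe
    best_score = evaluate_merged_model(best)
    for _ in range(generations):
        # Propose: perturb the current best recipe to form a population.
        population = np.clip(best + sigma * rng.standard_normal((pop_size, num_params)), 0.0, 1.0)
        # Evaluate: score each candidate on the target task.
        scores = np.array([evaluate_merged_model(c) for c in population])
        # Refine: keep the best configuration found so far.
        if scores.max() > best_score:
            best, best_score = population[scores.argmax()], scores.max()
    return best, best_score

best_recipe, best_score = evolve_merge_recipe(num_params=32)
print(best_score)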


Methodology:

The core idea is to use evolutionary algorithms to automate the process of discovering optimal ways to combine diverse pre-trained models into a unified architecture. This approach operates in two key dimensions:

  1. Parameter Space (PS): This involves evolving the optimal weights for mixing the parameters of the source models at each layer. The researchers used an enhanced version of the TIES-Merging technique, which addresses issues like redundant parameter values and conflicting parameter signs. They applied evolution to find the best configuration of sparsification and weight mixing parameters for each layer.
  2. Data Flow Space (DFS): Here, evolution is used to optimize the inference path that data takes through the layers of the merged model. Instead of combining parameters, this approach keeps the original source model parameters intact and evolves a sequence of layer indices that defines the route data should follow. This allows for architectures that go beyond simple sequential stacking of layers.

The researchers also developed a combined approach that first applies PS merging to create an intermediate model and then includes this model in the pool for DFS merging. This hybrid approach enables the creation of models optimized for multiple objectives.
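As a rough illustration of the two search spaces, the sketch below contrasts a simplified per-parameter mix (PS) with an evolved layer routing (DFS). The helper names, the two-model setup, and the plain linear blend are assumptions for illustration; the actual PS merging builds on an enhanced TIES-Merging with evolved sparsification and mixing parameters.

import torch

# PS sketch: a simplified per-parameter weighted mix of two source models.
# The real method evolves sparsification and per-layer mixing coefficients
# on top of TIES-Merging; this linear blend only illustrates the idea.
def merge_parameter_space(state_dict_a, state_dict_b, mix_weights, default=0.5):
    merged = {}
    for name, tensor_a in state_dict_a.items():
        w = mix_weights.get(name, default)          # evolved coefficient in [0, 1]
        merged[name] = w * tensor_a + (1.0 - w) * state_dict_b[name]
    return merged

# DFS sketch: route activations through an evolved sequence of layers drawn
# from the untouched source models, instead of a fixed sequential stack.
def run_data_flow_space(hidden, layer_pool, layer_path):
    # layer_path is an evolved list of (source_model_index, layer_index) pairs,
    # e.g. [(0, 0), (1, 0), (0, 1)].
    for model_idx, layer_idx in layer_path:
        hidden = layer_pool[model_idx][layer_idx](hidden)
    return hidden

# Tiny usage example with toy "models" made of linear layers.
pool = [torch.nn.ModuleList([torch.nn.Linear(8, 8) for _ in range(2)]) for _ in range(2)]
merged_sd = merge_parameter_space(pool[0].state_dict(), pool[1].state_dict(), mix_weights={})
x = torch.randn(1, 8)
y = run_data_flow_space(x, pool, layer_path=[(0, 0), (1, 0), (0, 1)])
print(y.shape)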


Architecture:

The architecture of the merged models is determined by the evolutionary process. In the PS approach, the merged model has the same architecture as the source models, but with each layer consisting of a weighted mix of the corresponding layers from the source models.


In the DFS approach, the architecture is essentially evolved, with the inference path determining the sequence of layers from different source models that data flows through. This can result in non-sequential architectures that would be difficult to conceive of manually.

For their experiments, the researchers used a transformer-based architecture, as the source models (a Japanese LLM, English math models, and an English vision-language model) were all based on the transformer architecture. However, the evolutionary merging approach is agnostic to the specific architecture and could potentially be applied to other architectures as well.

Impressive Results: State-of-the-Art Performance and Cross-Domain Capabilities

To showcase the effectiveness of their approach, the researchers at Sakana AI applied evolutionary optimization to create two impressive foundation models: a Japanese LLM with math reasoning capabilities (EvoLLM-JP) and a Japanese Vision-Language Model (EvoVLM-JP).

Remarkably, both of these models achieved state-of-the-art performance on various benchmarks, despite not being explicitly optimized for those tasks. EvoLLM-JP surpassed the performance of some 70B parameter Japanese LLMs on the JP-LMEH benchmark suite, highlighting its efficiency and generalization ability. Similarly, EvoVLM-JP demonstrated superior performance in handling culturally-specific content, outperforming previous Japanese VLMs.

These results underscore the power of evolutionary optimization in discovering synergistic combinations of models from different domains. By effectively merging a Japanese LLM with an English math model and an English vision-language model, respectively, the method generated models with novel cross-domain capabilities that would be challenging to achieve through manual design.

The ability to merge models from different domains opens up exciting possibilities for creating foundation models with novel capabilities. By combining models trained on diverse tasks and modalities, this approach can give rise to models that exhibit cross-domain understanding and reasoning abilities.


Models

  • EvoLLM-JP-v1-7B
  • EvoLLM-JP-v1-10B
  • EvoLLM-JP-A-v1-7B
  • EvoVLM-JP-v1-7B

Comparing EvoLLM-JP with the Source LLMs

Reproducing the Evaluation

1. Clone the Repo

git clone https://github.com/SakanaAI/evolutionary-model-merge.git
cd evolutionary-model-merge

2. Download the fastText Model

The authors use fastText to detect language during evaluation. Download lid.176.ftz (available from the fastText language identification page) and place it in your current directory. If you place the file in a different directory, point the LID176FTZ_PATH environment variable at it.
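As a quick sanity check (not part of the repo's evaluation scripts), the snippet below loads the downloaded model and classifies a sample Japanese sentence, using LID176FTZ_PATH when it is set and the current directory otherwise.

import os
import fasttext

# Sanity check for the downloaded language-identification model (illustrative only).
# lid.176.ftz predicts labels such as "__label__ja" for Japanese text.
lid_path = os.environ.get("LID176FTZ_PATH", "lid.176.ftz")
lid_model = fasttext.load_model(lid_path)
labels, probs = lid_model.predict("この写真について説明してください")  # a Japanese sample sentence
print(labels[0], probs[0])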

3. Install Libraries

pip install -e .

The evaluation was run with Python 3.10.12 and CUDA 12.3.

4. Run

To launch the evaluation, run the following script with a config of your choice. All configs used in the paper are in the configs directory.

python evaluate.py --config_path {path-to-config}

Or, try it in your Jupyter notebook:

import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "SakanaAI/EvoVLM-JP-v1-7B"
max_length = 128

if device == "cpu":
    # No GPU available: skip loading the model so the script can still run in test mode.
    model = None
else:
    model = AutoModelForVision2Seq.from_pretrained(
        model_id, torch_dtype=torch.float16, low_cpu_mem_usage=True,
    )
    model = model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

def inference_fn(image_path, prompt):
    text = f"<image>\n{prompt}"
    if model is None:
        return f"THIS IS FOR TEST!!\n{text}"

    messages = [
        {"role": "system", "content": "You are a helpful, unbiased, uncensored assistant. Answer the questions using the given image below."},
        {"role": "user", "content": text},
    ]

    # Load the image from disk and prepare the multimodal inputs.
    image = Image.open(image_path).convert("RGB")
    inputs = processor.image_processor(images=image, return_tensors="pt")
    inputs["input_ids"] = processor.tokenizer.apply_chat_template(
        messages, return_tensors="pt"
    )

    output_ids = model.generate(
        **inputs.to(device),
        do_sample=False,
        num_beams=5,
        max_new_tokens=max_length,
        repetition_penalty=1.5,
    )
    # Strip the prompt tokens so only the newly generated answer is decoded.
    output_ids = output_ids[:, inputs["input_ids"].shape[1]:]
    generated_text = processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()

    return generated_text

# Example usage
image_path = "path/to/your/image.jpg"  # Replace with the actual path to your image
prompt = "Explain this photo to me"  # Japanese equivalent: この写真について説明してください

output = inference_fn(image_path, prompt)
print(output)

Conclusion

The groundbreaking research by Sakana AI on evolutionary optimization of model merging recipes represents a significant leap forward in the field of foundation model development. By leveraging evolutionary algorithms to automatically discover optimal combinations of diverse open-source models, their approach enables the creation of powerful AI models with state-of-the-art performance and cross-domain capabilities.

The efficiency and generalizability demonstrated by the evolved models underscore the immense potential of this method in democratizing model development and fostering innovation. As the AI community builds upon this work and further explores the possibilities of evolutionary optimization, we can expect to see even more impressive advancements in the creation of foundation models that push the boundaries of artificial intelligence.
