Meet M6 — 10 Trillion Parameters at 1% GPT-3’s Energy Cost

Smaller players can now enter the game of large AI models

I can confidently say artificial intelligence is advancing fast when a neural network 50 times larger than another can be trained at one-hundredth of the energy cost, with just one year in between!

On June 25, Alibaba DAMO Academy (the R&D branch of Alibaba) announced they had built M6, a large multimodal, multitasking language model with 1 trillion parameters. That is already 5x the size of GPT-3, which serves as the standard for measuring the rate of progress of large AI models. Its multimodal, multitask design goes a step further than previous models towards general intelligence.

In terms of abilities, M6 resembles GPT-3 and other similar models like Wu Dao 2.0 or MT-NLG 530B (about which we have very little information). InfoQ, a popular Chinese tech magazine, summarizes M6’s main skills: “[It] has cognition and creativity beyond traditional AI, is good at drawing, writing, question and answer, and has broad application prospects in many fields such as e-commerce, manufacturing, literature and art.”

However, the critical aspect Alibaba researchers highlighted was the significant improvement in efficiency and energy cost. They reduced the model’s energy consumption by 80% and increased its efficiency 11x compared to 100-million language models.

This is extremely important news, in line with green AI principles and objectives.

Green AI to demonopolize large language models

But they didn’t stop there and now, 5 months later, they’ve just achieved not one, but two new striking milestones: They’ve improved M6 to make it the first 10-trillion-parameter large language model — 50x GPT-3’s size. And they’ve bettered their previous marks on efficiency, reducing the energy consumption to 1% of what GPT-3 needed to train.
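As a quick sanity check on the size comparisons, here is a minimal sketch of the ratios. It assumes GPT-3's published count of 175 billion parameters, which this article does not state; the constant and function names are purely illustrative.

```python
# Assumption: GPT-3 has 175 billion parameters (its published size, not given in this article).
GPT3_PARAMS = 175e9
M6_JUNE_PARAMS = 1e12     # the 1-trillion-parameter M6 announced in June
M6_LATEST_PARAMS = 10e12  # the 10-trillion-parameter M6


def size_ratio(params, baseline=GPT3_PARAMS):
    """Return how many times larger `params` is than `baseline`."""
    return params / baseline


print(f"June M6 vs GPT-3:   {size_ratio(M6_JUNE_PARAMS):.1f}x")
print(f"Latest M6 vs GPT-3: {size_ratio(M6_LATEST_PARAMS):.1f}x")
```

Under that assumption the ratios come out at about 5.7x and 57.1x, so the "5x" and "50x" figures are conservative roundings.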

They used a mere 512 GPUs to train the model in 10 days!

These achievements will have far-reaching positive consequences for the AI community and the world.

On the one hand, it’s a big leap towards finding common ground between the needs of large AI models and the requirements of clean energy movements that aim to reduce the carbon footprint. One of the main criticisms of large language models is that they can’t compensate for the huge amounts of pollution they generate. It’s been estimated that training a large AI model (pre-GPT-3) emits five times as much carbon as a car does over its entire lifetime, and the usefulness of these models isn’t so obvious. Amazon and Microsoft, among other tech companies, have already presented plans to reduce carbon emissions in the coming years, but both aim to tackle the problem by cooling the data centers, whereas Alibaba has achieved a better solution: reducing the resources needed to train the models.

This has another important advantage. If Alibaba publishes the techniques and methods it has used to achieve these results, smaller players could compete against the big tech corporations that currently monopolize the super-profitable field of large AI models.

The cost of researching, training, and inference creates such a toll that even giants like Google have had problems funding the technology. DeepMind, a Google subsidiary, decided not to investigate different possibilities for a key component when creating AlphaStar to avoid surpassing the budget.

OpenAI, which had access to a supercomputer with 10,000 Nvidia V100 GPUs provided by Microsoft (although the exact number of GPUs they used hasn’t been disclosed), decided not to retrain GPT-3 after researchers found a mistake because it would have been infeasible. Some rough calculations estimate a training cost of at least $4.6 million, which is out of reach for most companies. That’s without including research and development costs, which would raise the figure to $10–30M.

How could smaller companies compete against that?

In contrast, the latest version of M6 has been trained on 512 GPUs for 10 days. (GPT-3 was trained on V100, but researchers calculated that using A100s, it would have taken 1,024 GPUs to train the model in 34 days.)

Doing some rough calculations, we can compare the training cost for both models. I’ll assume Alibaba used Nvidia A100s and paid an hourly GPU instance price similar to AWS’s, where an instance with 8 Nvidia A100s costs ~$20/hour. Given they used 512 GPUs, that makes 64 8-A100 instances. Doing the math, the total cost = 64 instances · $20/hour · 24 hours/day · 10 days = $307,200.
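The same back-of-the-envelope estimate can be written as a short script. This is only a sketch under the assumptions stated above (8 A100s per instance at roughly $20/hour); the function name and parameters are illustrative. Pricing the hypothetical 1,024-A100 GPT-3 run at the same rate is an extra assumption added here for comparison, not a figure for what OpenAI actually spent.

```python
def training_cost_usd(num_gpus, days, gpus_per_instance=8, usd_per_instance_hour=20.0):
    """Rough cloud cost of a training run billed per multi-GPU instance.

    The defaults follow the article's assumption: an 8x A100 instance at ~$20/hour.
    """
    # Ceiling division: a partially filled instance is still billed in full.
    instances = -(-num_gpus // gpus_per_instance)
    hours = days * 24
    return instances * usd_per_instance_hour * hours


# M6: 512 GPUs for 10 days (figures from the article).
m6_cost = training_cost_usd(num_gpus=512, days=10)

# Hypothetical GPT-3 run on A100s: 1,024 GPUs for 34 days,
# priced at the same assumed instance rate.
gpt3_a100_cost = training_cost_usd(num_gpus=1024, days=34)

print(f"M6 (512 GPUs, 10 days):               ${m6_cost:,.0f}")
print(f"GPT-3 on A100s (1,024 GPUs, 34 days): ${gpt3_a100_cost:,.0f}")
print(f"Ratio: {gpt3_a100_cost / m6_cost:.1f}x")
```

With these inputs the script reproduces the $307,200 figure for M6 and gives about $2,088,960 for the hypothetical A100 run of GPT-3, a ratio of 6.8x. Changing `usd_per_instance_hour` shifts both totals proportionally, so the ratio does not depend on the exact price.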

Still somewhat costly, but nowhere near what OpenAI spent to train GPT-3.

A silver lining for the future

In the past, I’ve been very critical of large language models for reasons ranging from discrimination and biases, to capacity for misinformation, to lack of understanding, and even the question of why we need more large language models at all. I’ve also criticized the high environmental and financial costs that creating these systems entails.

But today I applaud the results Alibaba DAMO Academy has published.

It seems they’re committed to addressing at least some of the problems this new AI trend carries. There’s still a lot of work to do, and some of the issues are so intrinsic to these models that we can only hope to mitigate them, but seeing big tech companies aiming to improve the current landscape is a silver lining for the near-term future of artificial intelligence.
