
Nvidia Making Large Language Models Available For Enterprises

  • liblabim
  • Nov 22, 2021
  • 4 min read

Do you know about Nvidia and its role in making large language models available for enterprises? If not, no worries; in this blog, we will walk through what Nvidia is doing and how. We have already covered the M3 model in the previous blog, and if you keep reading the series, you will understand how Nvidia is bringing large language models to big companies. Nvidia has announced that it is making Megatron 530B, considered the world's largest language model, available for enterprises to train for new domains and different languages.


Megatron 530B, also known as the Megatron-Turing Natural Language Generation model (MT-NLG), contains 530 billion parameters and has already achieved high accuracy across a broad set of natural language tasks, including reading comprehension, commonsense reasoning, and natural language inference.


Moreover, at a press briefing, Nvidia's Paresh Kharya said:


"Our researchers [worked] together with Microsoft [to train] the Megatron 530B model in six weeks" "Today, we provide recipes for customers to build, train, and customize large language models, including Megatron 530B. This includes scripts, code, and 530B untrained model. Customers can start from smaller models and scale up to larger models as they see fit."


Clear so far? Let's move on and take a closer look at Megatron 530B and why it is positioned as one of the most capable large language models around.


Megatron 530B - A Look Inside The Semantics


In machine learning, parameters are a central part of the model: they are the values learned from historical training data. In the language domain, the correlation between the number of parameters and model sophistication has held up remarkably well. Language models with more parameters, more data, and more training time have been shown to acquire a richer and more nuanced understanding of language.
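
To make the idea of "parameters" concrete, here is a minimal sketch in PyTorch (our own illustration, not Nvidia's code): a tiny language-model-shaped network whose learned weights can be counted the same way Megatron 530B's 530 billion parameters are counted.

```python
# A minimal sketch (our illustration, not Nvidia's code): "parameters" are the
# learned weights of a network. Counting them for a tiny model shows what the
# 530-billion figure in Megatron 530B refers to, just at a much smaller scale.
import torch.nn as nn

tiny_lm = nn.Sequential(
    nn.Embedding(num_embeddings=50_000, embedding_dim=256),  # vocabulary lookup table
    nn.Linear(256, 1024),                                     # hidden layer
    nn.ReLU(),
    nn.Linear(1024, 50_000),                                  # project back onto the vocabulary
)

n_params = sum(p.numel() for p in tiny_lm.parameters())
print(f"{n_params:,} parameters")  # tens of millions here vs. Megatron's 530 billion
```

Every one of those values starts out random and is adjusted during training, which is why more parameters generally mean both more capacity and more compute.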


For instance, they can summarize books and complete programming code. To train Megatron 530B, Nvidia and Microsoft created a training dataset of 270 billion tokens taken from various English-language websites. Tokens are a way of separating pieces of text into smaller units of natural language, such as words, characters, or phrases. Megatron 530B trains on the dataset by ingesting these examples and learning the patterns among the data points, such as basic syntax and grammatical rules.
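
As a small illustration of tokenization, the sketch below uses the off-the-shelf GPT-2 tokenizer from Hugging Face as a stand-in; Megatron 530B's actual tokenizer and vocabulary differ, but the idea of splitting text into sub-word units and mapping them to integer IDs is the same.

```python
# Illustration only: GPT-2's tokenizer used as a stand-in, not Megatron's own.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Megatron 530B was trained on 270 billion tokens."
tokens = tokenizer.tokenize(text)   # sub-word pieces (exact split depends on the vocabulary)
ids = tokenizer.encode(text)        # the integer IDs the model actually consumes

print(tokens)
print(ids)
```

Counting those integer IDs over the whole corpus is what gives the "270 billion tokens" figure quoted above.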


The Megatron Dataset Characteristics


Megatron's training data comes mainly from The Pile, an 835GB collection of 22 smaller datasets created by the open-source AI research effort EleutherAI. The Pile spans academic sources, online communities, code repositories, and more. Microsoft and Nvidia curated it and combined it with filtered snapshots of Common Crawl, an extensive collection of web pages that includes news stories and social media posts.


Nvidia said Megatron 530B can infer basic mathematical operations even when the symbols are "badly obfuscated."


"Customers are eager to invest in large language models for their capabilities on generalized AI with few-shot learning and the ability to excel in many tasks at the same time," Kari said. When it comes to conversational AI, this general approach is inspiring for use cases like open domain chatbots, document summarization, text generation, and so on … Megatron 530B is being used internally by Nvidia."


The Training And Usage Challenges Faced By Megatron 530B


Training any large language model is not an easy task, and for Megatron 530B, the training and deployment process is certainly not a piece of cake! Even big enterprises with massive resources run into issues and ultimately have to consider alternatives. The Megatron 530B model was initially trained across 560 Nvidia DGX A100 servers, each hosting 8 Nvidia A100 GPUs. Microsoft and Nvidia said they observed between 113 and 126 teraflops per second per GPU. Training a model at this scale costs millions of dollars, so you can imagine the massive investment required to get the work done!
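
As a back-of-the-envelope check on those numbers, the snippet below multiplies out the hardware figures quoted in the paragraph (560 servers, 8 GPUs each, roughly 113 to 126 teraflops per GPU); the aggregate throughput is our own arithmetic, not an official Nvidia figure.

```python
# Back-of-the-envelope arithmetic from the figures quoted above.
servers = 560
gpus_per_server = 8
tflops_low, tflops_high = 113, 126

total_gpus = servers * gpus_per_server            # 4,480 GPUs in total
aggregate_low = total_gpus * tflops_low / 1_000   # convert teraflops to petaflops
aggregate_high = total_gpus * tflops_high / 1_000

print(f"{total_gpus:,} GPUs")
print(f"~{aggregate_low:.0f} to {aggregate_high:.0f} petaflops of sustained training throughput")
```

That works out to roughly 4,480 GPUs delivering on the order of 500 petaflops of sustained throughput, which gives a sense of why the bill runs into the millions.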


Nvidia is pitching its DGX SuperPOD as the best solution for training. A SuperPOD is a preconfigured cluster built from DGX A100 systems with A100 GPUs and Nvidia Mellanox InfiniBand networking for compute and storage. A single SuperPOD can cost anywhere from 7 million to 60 million dollars, which is enormous! Megatron 530B itself was trained on the Nvidia Selene supercomputer, and any company that wants to use a model of this size has to bear a similarly large cost. Even tech giants such as Google's parent company Alphabet have run up against these budgets and found them extremely expensive! Perhaps that's why a few companies have recouped the training costs of large language models by making them available to customers via API.


Biases Of Megatron 530B - The Issues Reported

After the previous two sections, we should be pretty clear on what Megatron is, what it costs, and the budget one has to bear for training and deployment. However, even a well-established model like Megatron can amplify the biases in the data set on which it was trained, and indeed Microsoft and Nvidia both acknowledge that the model "picks up stereotypes and biases" from its training data. Both companies say they are committed to addressing the problem and continuing their research into reducing the model's biases. Even language models of this scale face issues like bias and toxicity that cannot go unnoticed, and Microsoft and Nvidia are still researching how to solve them.


Wrapping Up - Megatron 530B


We hope you liked reading this technical blog covering the main aspects of Megatron 530B, its training cost, and its known biases. It is striking that even tech giants like Microsoft and Nvidia are still working on the toxicity issues people have reported. We hope the problem will be solved and that the large language model will eventually be available free of toxicity and bias. If you have any questions after reading this short blog, feel free to post them in the comments section.
