Yogesh Haribhau Kulkarni (PhD)

Transformation by Hugging Face

(Image source: Pixabay)

Are you lost in the storm of all these BERTs, i.e. ALBERT, DistilBERT, RoBERTa, etc.? And these GPTs (1–2–3)? Don’t understand how they work? What they do? How to use them? Worry not. Try Hugging (a) Face and Smile(y)!!

Software Engineers know (or are supposed to know) GitHub; similarly, Data Scientists are now supposed to know Hugging Face.

It’s an open-source repository of Machine Learning models, with an easy interface to deploy and use. All that, for free!!

Here are a few quick examples, ‘Hello World’ for learning the ‘Hugging Face’ way:

Say, for the following text,

text = """Hey Jack, I just wanted to flag something with you. Last week when you
said that you didn't want to watch the move Twilight with me, even in jest, it kind of got under my skin. I mainly feel like it's hypocritical when you make me watch basketball games with you and our main activity together is watching sports on TV. I just wanted to get it off my chest. From Sophie""" 

let’s do sentiment analysis:

from transformers import pipeline

# sentiment analysis with a DistilBERT model fine-tuned on SST-2
classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")

outputs = classifier(text)

print(outputs)

The prediction comes out as ‘Negative (0.912)’!! Great!!

For the same text, NER (Named-entity recognition):

# NER with a BERT-large model fine-tuned on CoNLL-03; "simple" merges sub-word tokens into whole entities
ner_tagger = pipeline("ner", aggregation_strategy="simple",
                      model="dbmdz/bert-large-cased-finetuned-conll03-english")

outputs = ner_tagger(text)

print(outputs) 

The output is:

'Jack': PER: 0.98

'Twilight': MISC: 0.996

'Sophie': PER: 0.6149

And on the same text, here is Question Answering:

# extractive question answering with a DistilBERT model fine-tuned on SQuAD
reader = pipeline("question-answering",
                  model="distilbert-base-cased-distilled-squad")

question = "What movie did Jack not watch?"

outputs = reader(question=question, context=text)

print(outputs)

The answer given is ‘Twilight (0.9831)’.

Isn’t it amazing?

Today, the Hugging Face platform offers more than 100,000 pre-trained models and 10,000 datasets for NLP, computer vision, speech, time series, biology, reinforcement learning, chemistry and more.
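If you want a feel for what is on offer, the Hub itself can be browsed from code. Here is a minimal sketch, assuming the companion huggingface_hub library is installed (pip install huggingface_hub); attribute names like modelId may vary slightly across library versions:

from huggingface_hub import HfApi

api = HfApi()

# list a few popular text-classification models, sorted by downloads
for m in api.list_models(filter="text-classification", sort="downloads", limit=5):
    print(m.modelId)

# datasets can be listed the same way
for d in api.list_datasets(limit=5):
    print(d.id)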

The hall-of-fame feature of the platform is its ‘transformers’ library, a collection of models based on the Transformer architecture introduced by Google (remember ‘Attention Is All You Need’?) and furthered by many, like OpenAI. Transformers have taken NLP to the next level. But the problem is that training these humongous models is not possible for mere mortals, only for the giants who can run deep networks and have deep pockets. You know who!! What would a humble Machine Learning engineer do then? Here comes Hugging Face.
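For the curious, the pipeline used earlier is just a thin wrapper over this library. A minimal sketch of what it does under the hood, assuming PyTorch is installed and reusing the same SST-2 checkpoint and the text variable from above:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)   # downloaded from the Hub and cached locally
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])   # e.g. NEGATIVE

No GPU cluster needed: the expensive pre-training has already been done; you just download the result and run inference (or fine-tune on your own data).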

(Image Source: Syncedreview)

A user who does not know programming can use the platform in inference mode, via a Web App/API call. If you are a Data Science programmer, you can call these ready-made models from code and can also build custom models on top using your own data. If you are a Data Science researcher, you can create new networks/models and upload them for the rest of the world to use. Hugging Face is ‘ML for All’. While it provides free/open access to ML, it is also striving to make it ‘Good’ (i.e. Ethical), trying to avoid models based on biased data.
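For that no-code/low-code route, the hosted Inference API lets you call a model over plain HTTP without downloading anything. A minimal sketch, assuming the api-inference.huggingface.co endpoint and a free personal access token from your Hugging Face account settings; the token below is a placeholder:

import requests

API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english"
headers = {"Authorization": "Bearer <YOUR_HF_TOKEN>"}   # placeholder token

resp = requests.post(API_URL, headers=headers, json={"inputs": "I loved this movie!"})
print(resp.json())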

A new initiative by Hugging Face is BigScience, a community of thousands of researchers from different disciplines, building the world’s largest open-source multilingual language model.

Give it a try and let me know your comments…


(Also published at LinkedIn)
