The goal is to use BERT via code in Python.
BERT (Bidirectional Encoder Representations from Transformers): It’s a specific model based on the Transformer architecture, but it uses only the Encoder part of the Transformer. It was designed by the Google team in 2018 to understand the context of words in a text bidirectionally (i.e., looking at both what comes before and after a word). It is pre-trained on large amounts of text and then fine-tuned for specific tasks like question answering or sentiment analysis.
In summary: BERT is a specialized model based on the Transformer, focused on understanding the meaning of text.
- Hugging Face: https://huggingface.co/
- Python language: https://www.python.org/
- Page: https://pt.wikipedia.org/wiki/Microsoft_Office
- Python libraries:
- transformers, beautifulsoup4, requests
- openai
- In my case, I used Google Colab as the platform to write the script, but you can use whichever one you prefer.
1 - First, install the libraries: transformers, beautifulsoup4, requests, and openai (in Colab: `!pip install transformers beautifulsoup4 requests openai`).
2 - Next, import the libraries:
- A library to request documents from the web
- A library to operate with BERT models
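Under the setup above, the two import groups can be sketched as follows (a minimal sketch, assuming the libraries from step 1 are installed):

```python
# Libraries to request documents from the web
import requests
from bs4 import BeautifulSoup

# Library to operate with BERT models
from transformers import pipeline
```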
3 - Function to prepare our dataset:
You only need to pass a URL; the function joins the page's visible text, separated by spaces.
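A minimal sketch of such a dataset function, assuming requests and beautifulsoup4 are installed (the function names here are illustrative, not the tutorial's originals):

```python
import requests
from bs4 import BeautifulSoup

def extract_visible_text(html: str) -> str:
    """Join a page's visible text, separated by spaces."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):  # drop non-visible content
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)

def build_dataset(url: str) -> str:
    """Download a page and return its visible text as one string."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return extract_visible_text(response.text)

if __name__ == "__main__":
    dataset = build_dataset("https://pt.wikipedia.org/wiki/Microsoft_Office")
    print(dataset[:200])
```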
This tool is ideal for situations where you need to resolve a question, as in a chatbot.
In this case, while building the code, include the "BERT" model:
In the case above, the "bert-large-uncased-whole-word-masking-finetuned-squad" model was used. In case you want to use another specific model, where can you pick up models:
Remember that the dataset is about Windows history, so the question needs to be about that:
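The question-answering step can be sketched as below with the model named above. This is an assumption-laden sketch: the function name is mine, and the example context is illustrative rather than the scraped dataset.

```python
def answer_question(question: str, context: str) -> str:
    """Return the answer span that BERT extracts from the context."""
    # transformers is imported lazily so the heavy model loads only when called
    from transformers import pipeline
    qa = pipeline(
        "question-answering",
        model="bert-large-uncased-whole-word-masking-finetuned-squad",
    )
    return qa(question=question, context=context)["answer"]

if __name__ == "__main__":
    # Illustrative context; in the tutorial the context is the scraped dataset
    context = "Windows was announced by Bill Gates in 1983 and released in 1985."
    print(answer_question("When was Windows released?", context))
```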
This tool is ideal for situations where you need to summarize a long text into a short one.
To reinforce: the summary created here is based on the dataset built from this data: https://pt.wikipedia.org/wiki/Microsoft_Office
In the code above, notice that it has variables. So, when do I need to use them? On the site https://huggingface.co/, under "Models", select a model and on the left side there is an explanation of how to use it, with the necessary code.
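A hedged sketch of the summarization step: the original does not show which checkpoint was used, so this relies on the pipeline's default summarization model, and the function name is mine. The `max_length` and `min_length` parameters are examples of the variables mentioned above; each model card on https://huggingface.co/ documents which ones apply.

```python
def summarize(text: str, max_length: int = 130, min_length: int = 30) -> str:
    """Condense a long text into a short summary."""
    # transformers is imported lazily so the heavy model loads only when called
    from transformers import pipeline
    summarizer = pipeline("summarization")  # default checkpoint; an assumption
    result = summarizer(text, max_length=max_length,
                        min_length=min_length, do_sample=False)
    return result[0]["summary_text"]
```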
This tool is ideal for opinions and review situations, because it rates text as "Positive", "Neutral", or "Negative".
In the example below, we evaluate the sentence directly: "Microsoft is a fantastic company, which contains wonderful products that make our daily lives easier."
- Model used:
- The "nlptown/bert-base-multilingual-uncased-sentiment" model is a BERT variant trained for multilingual sentiment analysis. It returns the probability of different sentiment levels (from 1 to 5 stars).
- 1-2 stars: “Negative”.
- 3 stars: “Neutral”.
- 4-5 stars: “Positive”.
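The classification above can be sketched as follows. The star-to-label mapping mirrors the ranges listed in this section, while the helper names are mine:

```python
def stars_to_label(stars: int) -> str:
    """Map a 1-5 star rating to the labels used in this section."""
    if stars <= 2:
        return "Negative"
    if stars == 3:
        return "Neutral"
    return "Positive"

def rate_sentiment(sentence: str) -> str:
    """Classify a sentence as "Positive", "Neutral", or "Negative"."""
    # transformers is imported lazily so the heavy model loads only when called
    from transformers import pipeline
    classifier = pipeline(
        "sentiment-analysis",
        model="nlptown/bert-base-multilingual-uncased-sentiment",
    )
    label = classifier(sentence)[0]["label"]  # e.g. "5 stars"
    return stars_to_label(int(label.split()[0]))

if __name__ == "__main__":
    print(rate_sentiment(
        "Microsoft is a fantastic company, which contains wonderful "
        "products that make our daily lives easier."
    ))
```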