How to Integrate Rasa + BERT in Spanish

Santiagopinzon
9 min read · Mar 31, 2021

Since the release of Rasa Open Source 1.8.0, we can use embeddings from pretrained language models such as BERT. But why is this a big deal? In this blog, you will learn how to integrate Rasa + BERT in Spanish, with the key concepts clearly defined so you don't get lost in the attempt.

All this knowledge was acquired in a project done at Holberton School, with mentoring from the company Vozy.

Let's start, as we should, at the beginning.

What is Rasa?

Rasa is an open-source machine learning framework for creating artificial intelligence assistants and chatbots.

Here is how Rasa defines itself on its platform:

With Rasa, all teams can create personalized, automated customer interactions at scale. Rasa provides the infrastructure and tools needed to create best-in-class assistants that significantly transform the way customers communicate with companies.

So why choose Rasa and not another, better-known platform?

Why Rasa?

Chatbots can generally only handle basic questions or FAQs. Contextual assistants are the next level: they allow developers to automate conversations end to end, and this can already be done in Rasa. Later on, we will look at the key components that capture that crucial element, context.

When we started out in this very big world of machine learning and AI, we found that research is constantly being done on how to improve NLU (natural language understanding) and NLG (natural language generation). What are they?

What is NLU?

Natural language understanding (NLU) is a branch of artificial intelligence (AI) that uses computer software to understand input made in the form of sentences in text or speech format.

NLU directly enables human-computer interaction (HCI).

What is NLG?

Natural language generation (NLG) is the use of artificial intelligence (AI) programming to produce written or spoken narratives from a data set. NLG is related to computational linguistics, natural language processing (NLP), and natural language understanding (NLU), the areas of AI concerned with human-to-machine and machine-to-human interaction.

All of this is encompassed by NLP (natural language processing), the broader field of teaching machines to work with human language; NLU and NLG are the two sides of that analysis, understanding on the way in and generation on the way out.

Why should these concepts be clear? When we work with Rasa, we find a file called config.yml that contains two sections. The first, the pipeline, corresponds to Rasa NLU; the second, the policies, corresponds to dialogue management, the NLG side. As we explained a moment ago, with Rasa NLU we focus on understanding and comprehending the message we receive, and with the policies we focus on the narrative or response the bot will give based on the analysis done in Rasa NLU.

What is a pipeline?

When we run our first Rasa command, rasa init, it creates a small bot whose domain is a person's mood. If we look at the generated config.yml file, it looks something like this:
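(The original screenshot is not reproduced here. This is the default config.yml that rasa init generates in Rasa 2.x; the exact components vary slightly between versions.)

```yaml
language: en

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: RulePolicy
```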

In the first part, we see each component of the pipeline, such as WhitespaceTokenizer, RegexFeaturizer, LexicalSyntacticFeaturizer, and others. In this blog I will not go into detail on each component; if you want to know more about them, the official Rasa documentation covers each one.

When we talk about Rasa NLU, we said it is about understanding the message. But for a machine to understand text, we have to translate it into a language the machine understands; each component of the pipeline performs a step to refine the text so that, at the end, we have a structured representation.

Before continuing: in Rasa we work with two important concepts, intents and entities. Intents are used in the NLU to capture the intention of the sentence we receive, and entities are elements in a sentence that can vary. We will see this in more detail in the following example.

Suppose we receive a sentence from the user exactly as it was typed. When it passes through the first component, WhitespaceTokenizer, it is split into individual tokens on whitespace.

That was the first component. The next one is RegexFeaturizer, which is in charge of extracting elements of the sentence using regular expressions. Then comes CountVectorsFeaturizer, which turns words into numbers so that the analysis works on vectors rather than strings. The component I am most interested in explaining is the DIETClassifier: it is where we capture context and where transformer networks are integrated, another way of analyzing the text.
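(The original screenshots are not reproduced here, so as a hypothetical illustration, this is the kind of thing the tokenizer does; the Spanish sentence is a made-up example, not one from the original post.)

```python
# Simplified sketch of what WhitespaceTokenizer does: split the raw
# sentence into individual tokens on whitespace.
sentence = "quiero pedir una pizza grande"
tokens = sentence.split()
print(tokens)  # ['quiero', 'pedir', 'una', 'pizza', 'grande']
```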

The DIET architecture looks complex at first, but it is not. Take the sentence 'Play ping pong'. When this sentence reaches the component, it is processed in several parts. The Dual Intent and Entity Transformer (DIET), unlike other components, is bidirectional, meaning it analyzes both the word before and the word after the one it is looking at. Each word has its own process: from 'Play', sparse features are extracted and fed into a feed-forward network, and the same happens with 'ping' and 'pong'.

This component can also perform a process called masking, which consists of taking a random word, masking it, and trying to predict it from the training data we have. This process is very beneficial and can help your Rasa model give better results.
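(As far as I can tell from the Rasa 2.x documentation, masking is switched on with a DIETClassifier attribute; the flag below is that attribute, but verify it for your version.)

```yaml
pipeline:
  - name: DIETClassifier
    epochs: 100
    # train an extra masked-language-model objective on your own NLU data
    use_masked_language_model: true
```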

Each word's feed-forward network can also receive word embeddings from a pretrained model such as BERT. At the end, the values from the feed-forward network are concatenated with the embeddings of the pretrained model, and the result passes into the transformer: two layers through which these concatenated embeddings flow. The transformer analyzes the previous word and the following word to give a confidence score for the intent it considers most likely, and the same happens with the entities (entities are defined in Rasa's training files, but we will not go deeply into that here). The DIET output looks like this:
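(The original screenshot is missing. A typical NLU result is shown below; the sentence, intent name, and values are a hypothetical example, but the fields match what Rasa's /model/parse returns.)

```json
{
  "text": "quiero pedir una pizza grande",
  "intent": {
    "name": "order_pizza",
    "confidence": 0.97
  },
  "entities": [
    {
      "entity": "size",
      "value": "grande",
      "start": 23,
      "end": 29
    }
  ]
}
```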

It returns the name of the intent it detected, the confidence it obtained, and the entities it captured with their respective values.

This component is relatively recent in Rasa and is widely used precisely because it extracts both the intents and the entities in a single model.

That was the main part of Rasa NLU, where the text went through a series of steps and was analyzed. Now the result is handed to the policies, the part that decides what to answer depending on what Rasa NLU produced. Keep in mind that each component passes its output to the next one, and so on.

The policies include other components, but these focus on the NLG side, that is, dialogue management in Rasa. A component like MemoizationPolicy checks whether the intent of the text I received appears in one of the stories in my stories.yml file; if it does not, control passes to the next component, TEDPolicy, which uses machine learning to pick the best answer. It is not always perfectly accurate, but let's say it found an answer: that answer is what gets returned to the user.
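(A sketch of the policies section of config.yml; the values are illustrative defaults, not taken from the original post.)

```yaml
policies:
  # answers from memory when the conversation exactly matches a story
  - name: MemoizationPolicy
    max_history: 5
  # machine-learned policy that generalizes to conversations
  # not covered literally by the stories
  - name: TEDPolicy
    max_history: 5
    epochs: 100
```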

Good: we now understand the normal flow of a Rasa model. So how do I introduce BERT into Rasa?

What is BERT?

BERT is a pretrained model developed by Google. There are two versions, BERT Base and BERT Large; the difference between them is the number of encoder layers each one has. Both were trained on BooksCorpus (800M words) and English Wikipedia (2,500M words).

BERT also uses a transformer to obtain the context of a sentence. With these models you can do fine-tuning, which means adapting the model to work on specific tasks. I recommend reading more about BERT if you want to go deeper.

How can BERT help us?

BERT has multiple derived models, such as BETO, which is the one I used in my project and the one I am going to focus on, since BETO is a BERT model trained for Spanish. It brings all those word embeddings and joins them with our training data, helping our bot understand much better the sentences it receives.

There is a component that takes care of loading these pretrained models.

The HFTransformersNLP component is in charge of loading these models, which are hosted on Hugging Face, a platform where pretrained models are published. You can load a model directly from Hugging Face by name or, if you prefer, download it and point the configuration at a local path.
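(The original configuration screenshots are not reproduced here. This is a sketch of both options; the BETO weights name is the one published on Hugging Face by the Universidad de Chile group, so verify it for your Rasa version.)

```yaml
pipeline:
  # load BETO (Spanish BERT) directly from Hugging Face by name
  - name: HFTransformersNLP
    model_name: "bert"
    model_weights: "dccuchile/bert-base-spanish-wwm-uncased"

  # or, if you downloaded the model beforehand, point to a local folder
  # (the path below is just an example):
  # - name: HFTransformersNLP
  #   model_name: "bert"
  #   model_weights: "./models/beto"
```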

Now that we know how to add the BERT model to our pipeline, what would the complete pipeline look like?
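(The original screenshot is missing; this is a hedged sketch of a complete Spanish pipeline in the style of Rasa 2.x. The DIETClassifier attributes are illustrative values, not the ones from the original project; tune them for your data.)

```yaml
language: es

pipeline:
  # pretrained Spanish BERT (BETO) embeddings
  - name: HFTransformersNLP
    model_name: "bert"
    model_weights: "dccuchile/bert-base-spanish-wwm-uncased"
  - name: LanguageModelTokenizer
  - name: LanguageModelFeaturizer
  # sparse features extracted from the training data itself
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  # DIET combines the sparse features with the BETO embeddings
  - name: DIETClassifier
    epochs: 200
    number_of_transformer_layers: 2
  - name: EntitySynonymMapper
```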

In the complete pipeline, the DIETClassifier takes on many more attributes so that it can take full advantage of the pretrained embeddings. This helps a lot when your training data is scarce and you need to improve your results.

Conclusions

At this point, you can integrate a pretrained model into Rasa and get results. If you compare a model without BERT against a model with BERT, you will notice the difference. I advise you to understand each component of the pipeline very well and to adapt each attribute as closely as possible to your use case.

Next, I will go over some aspects of how I obtained the result of this project, which was to integrate Rasa + BERT and expose it through an API.

Architecture flow

A text was received by an API built in Django, which in turn consumed the Rasa API through two endpoints: /model/parse, which covers only the Rasa NLU part, and /webhooks/rest/webhook, which handles multiple calls and is used for the conversational part. (After all, can you imagine how else to hold a conversation through an API?)
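(An illustration of the two endpoints: the paths come from Rasa's REST API, while the payload contents below are made up. The sender id is what lets Rasa tie multiple calls into a single conversation.)

```python
import json

# Body for POST /model/parse: runs only the Rasa NLU pipeline on one text.
parse_payload = {"text": "quiero pedir una pizza grande"}

# Body for POST /webhooks/rest/webhook: the "sender" id is what lets Rasa
# keep the conversation context across multiple calls.
webhook_payload = {"sender": "user-123", "message": "hola"}

print(json.dumps(parse_payload))
print(json.dumps(webhook_payload))
```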

Here is a video explaining how to use the API with Postman

The Biggest Challenge

I think the biggest challenge was getting up to speed quickly on machine learning concepts: understanding what NLP, NLU, and NLG were, analyzing metrics with libraries like NumPy and pandas, and understanding what a confusion matrix was, among other key concepts for this project. Although the Rasa platform sells the idea that you can build a bot without understanding much machine learning, that is not entirely true: if you want to improve your results, you have to open that black box and start reading and documenting yourself in order to work at a professional level.

About me

I am a very disciplined software developer, passionate about learning new things. One of the skills I stand out for is that I am always willing to help others. For that very reason, if you want to contact me, here are my social networks:

Twitter: santiagopinzonD

Github: santiagoPinzonD

Linkedin: santiagopinzond

Here you will find the repository where this project is located https://gitlab.com/holbiezencode/rasbert-bot

Members of this project:
