REALM — 📒Pre-Training a Language Model for Knowledge Extraction📒

REALM is a method for pre-training language models. A model pre-trained with REALM can retrieve knowledge about the world directly from raw documents, so it does not need to store every fact in its weights. The essence of the approach is that the language representation model is trained jointly with a knowledge retrieval model. The researchers have released the project code for replicating their experiments.

➡️Description of the problem

Recent advances in natural language processing have relied heavily on unsupervised pre-training, in which a language model is trained on a large corpus of general-topic text and no labeled data is required. Pre-trained models such as BERT or RoBERTa absorb a substantial body of knowledge about the world from their training data. Encoding this knowledge is especially important for tasks such as question answering, information extraction, and text generation. However, existing models store knowledge implicitly in their weights, which makes it hard to verify what a model actually knows and to locate where that knowledge is stored.
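
To see what "implicit knowledge in the weights" means in practice, one can probe a masked language model with a factual prompt. The snippet below is an illustration added here, not code from the paper; it assumes the Hugging Face transformers package is installed and uses bert-base-uncased purely as an example checkpoint.

```python
# Probe the knowledge stored implicitly in a masked language model's weights.
# Assumes the Hugging Face `transformers` package; bert-base-uncased is only
# an example checkpoint, not the model used in the paper.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The model can often complete factual prompts, but the fact itself lives
# somewhere in the weights and cannot be inspected or updated directly.
for prediction in unmasker("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```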

The researchers propose a pre-training method that makes the model's knowledge directly accessible without increasing the model's size or complexity. To do this, they rely on an external reference corpus of texts.

⚙How REALM works

The standard approach to pre-training is masked language modeling: the model learns to fill in missing words. However, the knowledge about the world that the model picks up this way is stored in an abstract form and cannot be accessed directly.
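
As a quick illustration (my own sketch, not code from the paper), a fill-in-the-missing-words training example can be constructed roughly like this:

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Replace a random subset of tokens with [MASK]; the model is then
    trained to predict the original token at each masked position."""
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            masked.append(mask_token)
            targets[i] = tok  # what the model must reconstruct
        else:
            masked.append(tok)
    return masked, targets

sentence = "The pyramidion on top of the pyramid is made of granite".split()
print(mask_tokens(sentence))
```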

The alternative that the researchers suggest is to additionally train a knowledge retriever. This model first fetches text from an external corpus to give the language model extra context for filling in the gaps. If the retrieved passage does not help the language model fill in the gaps, the retriever is penalized; in other words, the retriever is trained with the same fill-in-the-missing-words objective. In their experiments, the researchers used the English Wikipedia as the external corpus.
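
Conceptually, REALM treats the retrieved document as a latent variable and marginalizes over it: p(y | x) = Σ_z p(y | x, z) p(z | x), where the retrieval distribution p(z | x) comes from inner products of dense embeddings. The sketch below is a toy NumPy illustration of that idea; the embeddings, corpus size, and the p(y | x, z) function are placeholders I introduce for illustration, whereas the real system uses BERT-style encoders and maximum inner product search over all of Wikipedia.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Placeholder dense embeddings: in REALM both the query and the documents are
# embedded with learned encoders; here they are just random vectors.
rng = np.random.default_rng(0)
query_emb = rng.normal(size=128)          # embedding of the masked sentence x
doc_embs = rng.normal(size=(1000, 128))   # embeddings of the corpus documents z

# Retrieval distribution p(z | x): softmax over inner products.
retrieval_scores = doc_embs @ query_emb
p_z_given_x = softmax(retrieval_scores)

# Keep only the top-k documents as an approximation to the full sum over the corpus.
top_k = np.argsort(-p_z_given_x)[:5]

def p_y_given_xz(doc_id):
    """Placeholder for p(y | x, z): the masked-word distribution the language
    model produces when it reads the masked sentence together with document z."""
    return softmax(rng.normal(size=30000))  # toy vocabulary of 30k words

# Marginalize over retrieved documents: p(y | x) = sum_z p(y | x, z) * p(z | x).
p_y_given_x = sum(p_z_given_x[z] * p_y_given_xz(z) for z in top_k)
```

Because the whole expression is differentiable, a retrieved document that lowers the probability of the correct word ends up with a lower retrieval score after the gradient update, which is exactly the penalty described above.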


📊Performance evaluation

The researchers compared a T5 model pre-trained in the standard way against REALM on an open-domain question answering (Open-QA) task. The experiments show that the REALM model with 300 million parameters outperforms the 11-billion-parameter T5 by 4 points.


