Language models in natural language processing
What is a Language Model?
In its simplest form, a language model is a probability distribution over words and word sequences. By estimating how likely different expressions are in a given context, it helps systems make sense of text data. This is useful in many natural language processing applications: language models are used in speech-to-text, machine translation, part-of-speech tagging, optical character recognition, handwriting recognition, text classification, and many others.
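As a minimal illustration of "a probability distribution over word sequences", a language model can score a whole sentence by chaining conditional word probabilities. The tiny probability table below is made up purely for demonstration:

```python
# Toy sketch: score a sentence by multiplying conditional word probabilities.
# All numbers here are invented for illustration only.
cond_prob = {
    ("<s>",): {"the": 0.5, "a": 0.3},
    ("<s>", "the"): {"cat": 0.2, "dog": 0.1},
    ("<s>", "the", "cat"): {"sat": 0.4, "ran": 0.2},
}

def sentence_probability(words):
    prob = 1.0
    context = ("<s>",)
    for w in words:
        # Probability of this word given everything seen so far
        prob *= cond_prob.get(context, {}).get(w, 1e-6)  # tiny fallback for unseen cases
        context = context + (w,)
    return prob

print(sentence_probability(["the", "cat", "sat"]))  # 0.5 * 0.2 * 0.4 = 0.04
```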
Language Model Types
- Simple Probabilistic Language Models
These language models are built simply by counting n-gram probabilities (an n-gram is a sequence of n words). The probability of an n-gram is the conditional probability of its last word given the preceding (n-1)-gram. This approach has a clear disadvantage: only the previous n-1 words influence the probability distribution of the next word. In complex texts, however, words far outside that window can also determine what comes next, so the next word may not be predictable from a fixed window of previous words even when n is 20 or 50.
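A minimal sketch of how such probabilities could be estimated from counts, here for bigrams (n = 2); the tiny "corpus" and function names are illustrative, not from any real dataset:

```python
from collections import Counter

# Estimate P(next_word | previous_word) from raw bigram counts.
corpus = "the cat sat on the mat the cat ran".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev_word, word):
    # Conditional probability: count(prev, word) / count(prev)
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]

print(bigram_prob("the", "cat"))  # 2 / 3 in this toy corpus
```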
- Neural Network Based Language Models
In neural network-based methods, a word embedding is learned for each word or phrase: every word or phrase is mapped to a vector of a fixed size. Because these vectors are points in a continuous numerical space, similar words end up close to each other, and they can contribute more accurate information to the probability distribution of the next word.
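A minimal sketch of an embedding lookup, assuming PyTorch; the vocabulary, the indices, and the embedding size (8) are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn

# Each word id is mapped to a learnable dense vector.
vocab = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

token_ids = torch.tensor([vocab["the"], vocab["cat"], vocab["sat"]])
vectors = embedding(token_ids)   # shape (3, 8): one continuous vector per word
print(vectors.shape)
```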
- Recurrent Neural Networks (RNNs)
RNNs are designed for processing sequential data. Each word of a sentence is fed to the RNN in turn, and a vector is produced as output; that vector and the embedding of the next word are then fed back into the RNN, and this repeats until the end of the sentence. However, RNNs still do not contextualize strongly enough, because they only look at the words that came before. They also make poor use of the hardware, since training cannot be parallelized: each step depends on the previous one. Transformers emerged to solve these problems.
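A minimal sketch of this recurrent loop, assuming PyTorch; the embedding size (8), hidden size (16), and random "word vectors" are arbitrary stand-ins:

```python
import torch
import torch.nn as nn

rnn_cell = nn.RNNCell(input_size=8, hidden_size=16)

sentence = torch.randn(5, 8)   # 5 word vectors, stand-ins for real embeddings
hidden = torch.zeros(1, 16)    # initial hidden state

for word_vector in sentence:
    # Each step combines the current word with the summary of all previous words;
    # steps cannot run in parallel because each needs the previous hidden state.
    hidden = rnn_cell(word_vector.unsqueeze(0), hidden)

print(hidden.shape)  # (1, 16): context vector for the whole sentence
```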
- Transformers
The most popular and successful language models today are built on transformers; Google's BERT and OpenAI's GPT-3 are well-known examples. These models use a mechanism called "attention" to work out which inputs are related to each other and which inputs are important or unimportant. Because a transformer takes in all the inputs at the same time and does not depend on a previous hidden vector the way an RNN does, training can be parallelized and the hardware used much more efficiently.
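A minimal sketch of the scaled dot-product attention at the core of this mechanism, assuming PyTorch; the sequence length (4) and dimension (8) are arbitrary toy values, not taken from any real model:

```python
import math
import torch

def attention(query, key, value):
    d_k = query.size(-1)
    scores = query @ key.transpose(-2, -1) / math.sqrt(d_k)  # how related each pair of positions is
    weights = torch.softmax(scores, dim=-1)                   # importance of each input for each position
    return weights @ value                                    # weighted mix over all positions at once

x = torch.randn(4, 8)          # 4 positions processed simultaneously, no recurrence
output = attention(x, x, x)    # self-attention: queries, keys, values from the same input
print(output.shape)            # (4, 8)
```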