What is Natural Language Processing?

Have you ever wondered how your phone can possibly understand what you are saying? Has this brainless pile of metal and plastic acquired the ability to talk with humans? If you have already spent time playing with Siri, OK Google, or Cortana, trying to fool them with convoluted questions, you have an idea of what we call natural language processing (NLP).

For now, no one has been able to create an artificial intelligence that communicates as we do. We will go through some of the state of the art in NLP and what we can expect from the future.

Let’s dive into these innovations.

A Definition of Natural Language Processing

This field of computer science covers many subject areas:

  • text classification,
  • text understanding,
  • speech recognition,
  • text-to-speech,
  • question answering,
  • summarization,
  • optical character recognition (OCR),

and so on.

As you have certainly noticed, NLP is borderless!

As humans, we have been speaking since childhood. While you are reading these lines, you are not wondering how your brain makes sense of all these characters.

Step back a bit.

Let’s try to put ourselves in the computer’s shoes. We are only able to process numbers. How can we even recognize a word? Or, harder still, understand a sentence?

This gives you a glimpse of the task’s difficulty.

Alan Turing, one of the fathers of artificial intelligence, even suggested measuring a machine’s intelligence by its ability to pass for a human.

This is the so-called Turing test.

Deep Learning and Natural Language Processing: a Great Combo

The deep learning breakthrough brought new developments of its own. Where we were once confined to statistics and rule-based techniques, deep learning makes it possible to learn more abstract representations. Remember, words are abstract for computers! Thanks to new discoveries, we are now able to convert words into vectors. In other words, we translate our language into the machine’s. Before that, for decades, we relied on tricks to replace words with numerical representations.
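As a rough illustration of the older "tricks" for turning words into numbers, here is a minimal bag-of-words sketch in Python. The tiny corpus and the function names (`build_vocabulary`, `bag_of_words`) are invented for this example; it only shows the idea of replacing words with counts and does not reproduce any particular library.

```python
from collections import Counter

def build_vocabulary(sentences):
    """Map each distinct lowercase word to a column index (sorted for stability)."""
    words = sorted({word for sentence in sentences for word in sentence.lower().split()})
    return {word: index for index, word in enumerate(words)}

def bag_of_words(sentence, vocabulary):
    """Count how often each vocabulary word appears in the sentence."""
    counts = Counter(sentence.lower().split())
    vector = [0] * len(vocabulary)
    for word, count in counts.items():
        if word in vocabulary:  # words outside the vocabulary are ignored
            vector[vocabulary[word]] = count
    return vector

# Hypothetical two-sentence corpus, used only for illustration.
corpus = ["the king rules", "the queen rules the land"]
vocabulary = build_vocabulary(corpus)
vectors = [bag_of_words(sentence, vocabulary) for sentence in corpus]

print(vocabulary)  # {'king': 0, 'land': 1, 'queen': 2, 'rules': 3, 'the': 4}
print(vectors)     # [[1, 0, 0, 1, 1], [0, 1, 1, 1, 2]]
```

Vectors like these record which words occur, but say nothing about how words relate to each other. That is the gap learned word vectors try to fill.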

One of the Most Significant Advances: Word2Vec

Thanks to neural networks, we are able to transform the words of a text corpus into vectors. A vector is just a list of numbers… and computers love numbers! Not only do they love numbers, but the way those vectors are created also embeds the relations between words in the corpus. Let’s say you have these words: queen, king, woman, and man. We can figure out the relationships between them. Even better, with Word2Vec, we can see them. Since words are vectors, we can do with them whatever is mathematically possible with vectors. That is to say, you can add and subtract them. And what happens when you compute king – man + woman? Bingo! The word vector closest to the result is queen. This is impressive, isn’t it? Word embeddings are widely used as a pre-processing step in many solutions. For example, if you want to classify texts, you may use Word2Vec to improve performance.
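The king – man + woman arithmetic can be sketched in a few lines of Python. Note the assumptions: the vectors below are hand-made toy values, not the output of a trained Word2Vec model, and the helper names (`cosine_similarity`, `analogy`) are made up for this example. A real model learns vectors with hundreds of dimensions from a large corpus, and the match is approximate rather than exact.

```python
import math

# Hand-made toy vectors (NOT real Word2Vec output). The three dimensions
# loosely stand for "royalty", "gender", and "food", for illustration only.
embeddings = {
    "king":  [1.0,  1.0, 0.0],
    "queen": [1.0, -1.0, 0.0],
    "man":   [0.0,  1.0, 0.0],
    "woman": [0.0, -1.0, 0.0],
    "apple": [0.0,  0.0, 1.0],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(positive, negative, embeddings):
    """Add the positive vectors, subtract the negative ones,
    and return the closest word that was not part of the query."""
    size = len(next(iter(embeddings.values())))
    target = [0.0] * size
    for word in positive:
        target = [t + v for t, v in zip(target, embeddings[word])]
    for word in negative:
        target = [t - v for t, v in zip(target, embeddings[word])]
    excluded = set(positive) | set(negative)
    candidates = {w: v for w, v in embeddings.items() if w not in excluded}
    return max(candidates, key=lambda w: cosine_similarity(target, candidates[w]))

result = analogy(positive=["king", "woman"], negative=["man"], embeddings=embeddings)
print(result)  # queen
```

The query words themselves are left out of the candidates, because otherwise "king" would often be its own nearest neighbour.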

The Other Solutions

Word2Vec is not the only one. Since then, more solutions have emerged: GloVe, ELMo, BERT, … Each of these pushes machines’ processing capabilities further. The good thing is, if you want to have fun with them, a lot of open-source libraries already implement them. So you can quickly set up a model for your own purposes. Most of the problems you may solve in natural language processing will start by transforming a text into these machine-digestible vectors.
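One simple way to turn a whole text into a vector, once each word has one, is to average the word vectors. The sketch below assumes a toy dictionary of two-dimensional vectors invented for this example (`toy_vectors`) and a hypothetical helper `text_to_vector`. It is one common baseline among many, not the method used by GloVe, ELMo, or BERT.

```python
# Toy 2-dimensional word vectors, invented for illustration only.
toy_vectors = {
    "great": [0.9, 0.1],
    "good":  [0.8, 0.2],
    "bad":   [0.1, 0.9],
    "awful": [0.0, 1.0],
}

def text_to_vector(text, word_vectors):
    """Average the vectors of the known words; unknown words are skipped."""
    known = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not known:
        return None  # no known word, so no vector can be built
    size = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(size)]

# "product," and "price" are not in the toy dictionary and are ignored.
review_vector = text_to_vector("Good product, great price", toy_vectors)
print(review_vector)  # approximately [0.85, 0.15]
```

A vector like `review_vector` could then be fed to a classifier, for example to separate positive reviews from negative ones.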

Natural Language Processing: a Limitless Future…

Today, NLP covers many tasks, such as text understanding, topic classification, machine translation, and many others. Chatbots and smart speakers are just the tip of the iceberg.

Natural language processing can help you extract information from a massive amount of text data that you could not process yourself. It may help you spot a problem quickly by reading a product’s comments. The options are limitless, and the future is full of even more significant improvements! For now, text generation has been one of the most challenging problems in the field. Recent advances from OpenAI have put a foot in the door of free-speaking AI. You can try it here. Fun, right?

Despite the amazing progress, there are some limitations. For now, it requires a large amount of data. Most of the time, that data comes from the internet, but the quality may not be there, and the way people write on the internet is not the way they speak in everyday life. This may bias the AI in some of its decisions. Training this kind of artificial intelligence is also very expensive. You will need large server farms to reach good results.

Fortunately, things are evolving fast. Whatever your need in natural language processing, you may find a solution that fits. Speaking like you is still tough for your small smartphone or computer. Getting an artificial intelligence to speak and read like you and me will take time. It is being built brick by brick. I give the floor to this AI for the conclusion because I couldn’t have said it better: “[NLP] is a field that will continue to grow and adapt, and will continue to drive the field of machine learning to new frontiers.”