Concretely, deep learning is a machine learning technique that allows a computer program to recognize the content of an image or understand spoken language: complex challenges with which the artificial intelligence research community has struggled. “Deep learning technology is learning to represent the world. That is to say, how the machine will represent speech or images,” says Yann LeCun, regarded by his peers as one of the most influential researchers in the field. “Before, you had to do it by hand, explaining to the tool how to transform an image in order to classify it. With deep learning, the machine learns to do it itself. And it does that a lot better than engineers; it’s almost humiliating!”
How Does Deep Learning AI Work?
To understand deep learning, we must go back to supervised learning, a common AI technique that allows machines to learn. For a program to learn to recognize a car, for example, it is “fed” tens of thousands of images of cars, labeled as such. This “training” phase can take hours or even days. Once trained, the program can recognize cars in new images.
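The train-then-recognize cycle can be sketched on a toy scale. The example below is an illustration only, not how an image recognizer is actually built: it assumes a perceptron (one of the simplest supervised learners) and two made-up numeric features per example instead of real pixels. The names `train_perceptron` and `predict` are hypothetical.

```python
# Toy supervised learning: a perceptron trained on labeled examples.
# The two numeric "features" per example are invented for illustration;
# a real system would learn from the pixels of labeled images.

def predict(weights, bias, x):
    """Return 1 ("car") if the weighted sum is positive, else 0 ("not a car")."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

def train_perceptron(examples, labels, epochs=20, lr=0.1):
    """Learn weights and a bias from (features, label) pairs."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            error = y - predict(weights, bias, x)  # 0 if correct, +1 or -1 if wrong
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Training phase: labeled examples (1 = car, 0 = not a car).
train_x = [[0.9, 0.8], [0.8, 0.9], [1.0, 0.7], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]
train_y = [1, 1, 1, 0, 0, 0]
weights, bias = train_perceptron(train_x, train_y)

# After training: classify examples the program has never seen.
new_x = [[0.85, 0.95], [0.15, 0.05]]
new_predictions = [predict(weights, bias, x) for x in new_x]
print(new_predictions)
```

The labels are what make this “supervised”: the program only adjusts its weights when its guess disagrees with the label it was given.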
Deep learning also uses supervised learning, but the internal architecture of the machine is different: it is a “neural network”, a virtual machine composed of thousands of units (the neurons), each of which performs small, simple calculations. “The peculiarity is that the results of the first layer of neurons will be used as input to the calculations of the others,” says Yann Ollivier, a researcher at the CNRS and a specialist in the subject. This “layered” operation is what makes this type of learning “deep”. Yann Ollivier gives a telling example.
“How do you recognize an image of a cat? The salient features are the eyes and ears. How do you recognize a cat’s ear? The angle is about 45°. To recognize the presence of a line, the first layer of neurons will compare the difference between the pixels above and below. This will give a level 1 characteristic. The second layer will work on these characteristics and combine them. If there are two lines that meet at 45°, the algorithm will begin to recognize the triangle of the cat’s ear. And so on.”
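The first step in this example, comparing the pixels above and below a position to detect a line, can be sketched as follows. This is a hand-written illustration under simple assumptions (a tiny grayscale grid, a fixed subtraction rule); in a real network the comparison is learned rather than coded. The function name `horizontal_edge_response` is hypothetical.

```python
# A tiny grayscale "image": 0 = dark, 1 = bright.
# The top three rows are dark and the bottom three are bright,
# so a horizontal boundary runs between rows 2 and 3.
image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]

def horizontal_edge_response(img):
    """For each interior row, subtract the pixel above from the pixel below."""
    height, width = len(img), len(img[0])
    return [
        [img[row + 1][col] - img[row - 1][col] for col in range(width)]
        for row in range(1, height - 1)
    ]

# A nonzero response marks rows where brightness changes from top to bottom.
edges = horizontal_edge_response(image)
for row in edges:
    print(row)
```

The response is zero in the uniform regions and nonzero only around the boundary, which is the kind of “level 1 characteristic” a second layer could then combine.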
At each stage (there can be up to twenty layers), the neural network deepens its understanding of the image with increasingly precise concepts. To recognize a person, for example, the machine decomposes the image: first the face, the hair, and the mouth, then increasingly fine properties, such as a mole. “With traditional methods, the machine just compares the pixels. Deep learning allows learning on features more abstract than pixel values, which it will build itself,” says Yann Ollivier.
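The layered operation, in which the results of one layer become the input of the next, can be sketched in a few lines. The weights below are hand-picked for illustration rather than learned, and the ReLU activation (replacing negative values with zero) is an assumption not mentioned in the text. The names `layer`, `forward`, and `network` are hypothetical.

```python
def relu(values):
    """Replace negative values with zero (an assumed, commonly used activation)."""
    return [max(0.0, v) for v in values]

def layer(inputs, weights, biases):
    """One layer: each neuron computes a weighted sum of its inputs plus a bias."""
    return relu([
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ])

# Hand-picked weights for a two-layer network (not learned).
network = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),  # layer 1: 2 inputs -> 2 neurons
    ([[1.0, 1.0]], [0.0]),                     # layer 2: 2 inputs -> 1 neuron
]

def forward(inputs, layers):
    """Feed the output of each layer into the next one."""
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations

output = forward([3.0, 1.0], network)
print(output)
```

With these particular weights, the first layer computes the two one-sided differences between the inputs and the second layer adds them, so the network as a whole returns the absolute difference. Neither layer does this alone; the result emerges from stacking them.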