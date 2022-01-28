When we hear a sound like a train whistle, the sound waves reach our right and left ears at slightly different times and intensities, depending on the direction the sound is coming from. Parts of the midbrain are specialized to compare these slight differences to help estimate which direction the sound is coming from, a task also known as localization.
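The timing comparison described above can be illustrated with a short sketch. It is not the mechanism the midbrain uses, nor the MIT model. It estimates the interaural time difference (ITD) by cross-correlating the two ear signals, then converts it to an azimuth with a simple sine-law head model. The ear spacing of 0.18 m, the 1 ms search window, and the function names are illustrative assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C
EAR_SPACING = 0.18      # m between the ears; illustrative assumption

def estimate_itd_samples(left: np.ndarray, right: np.ndarray, max_lag: int) -> int:
    """Lag (in samples) at which `right` best matches `left`.

    A positive result means the right ear hears the sound later,
    i.e. the source is toward the listener's left."""
    corr = np.correlate(right, left, mode="full")
    lags = np.arange(-(len(left) - 1), len(right))
    plausible = np.abs(lags) <= max_lag  # ignore delays a head cannot produce
    return int(lags[plausible][np.argmax(corr[plausible])])

def itd_to_azimuth_deg(itd_s: float) -> float:
    """Invert the sine-law model itd = (EAR_SPACING / c) * sin(azimuth); > 0 is toward the left."""
    s = np.clip(itd_s * SPEED_OF_SOUND / EAR_SPACING, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))

# Usage: broadband noise that reaches the right ear 20 samples after the left ear
fs = 48_000
delay = 20
rng = np.random.default_rng(0)
src = rng.standard_normal(4_800 + delay)
left, right = src[delay:], src[:4_800]  # right[n] == left[n - delay]
lag = estimate_itd_samples(left, right, max_lag=int(1e-3 * fs))
azimuth = itd_to_azimuth_deg(lag / fs)
print(f"ITD = {lag} samples, estimated azimuth = {azimuth:.1f} degrees to the left")
```

Restricting the search to about 1 ms reflects the fact that a sound cannot reach one ear much earlier than the other; a real system would also use the level differences mentioned above.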

This complex process can now be carried out by an AI: MIT neuroscientists have developed a computer model that performs the same task. The model, which consists of several convolutional neural networks, not only performs the task as well as humans do but also struggles in the same ways that humans do.

Where does that sound come from? An AI can already answer it

“Now we have a model that can actually localize sounds in the real world,” says Josh McDermott, an associate professor of brain and cognitive sciences and a fellow at MIT’s McGovern Institute for Brain Research. “And when we treat the model as a human experimental participant and simulate this large set of experiments where people had tested humans in the past, what we find over and over again is that the model recapitulates the results that you see in the humans,” he added in comments to MIT.

The new study’s findings also suggest that humans’ ability to perceive location is tailored to the specific challenges of our environment, says McDermott, who is also a member of MIT’s Center for Brains, Minds, and Machines.

To develop a more sophisticated model of sound localization, the MIT team turned to convolutional neural networks. This type of computer model has been used extensively to model the human visual system, and more recently McDermott and other scientists have begun to apply it to hearing as well.

Convolutional neural networks can be designed with many different architectures. To find the ones best suited to localization, the MIT team used a supercomputer to train and test around 1,500 different models. That search identified the 10 that seemed best suited for localization, which the researchers trained and used for all of their subsequent studies.
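A search like the one described above can be sketched as follows: sample candidate architectures, train and score each one, and keep the best 10. The search space, the `ArchConfig` fields, and the `train_and_evaluate` scoring formula are all hypothetical. The toy score stands in for real training only so that the sketch runs, and nothing here reflects the architectures the MIT team actually compared.

```python
import heapq
import itertools
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class ArchConfig:
    """One point in a hypothetical CNN search space (all fields illustrative)."""
    n_conv_layers: int
    kernel_size: int
    channels: int
    pool_stride: int
    fc_units: int

# Hypothetical search space: 8 * 4 * 4 * 3 * 4 = 1,536 possible architectures
SEARCH_SPACE = [
    ArchConfig(*values)
    for values in itertools.product(
        range(2, 10), (3, 5, 7, 9), (32, 64, 128, 256), (1, 2, 4), (128, 256, 512, 1024)
    )
]

def train_and_evaluate(cfg: ArchConfig) -> float:
    """Placeholder for training a network and returning its validation accuracy.

    This toy formula exists only so the sketch runs; a real search trains every network."""
    depth_term = 1.0 / (1 + abs(cfg.n_conv_layers - 6))
    return (depth_term + cfg.channels / 1024 + cfg.fc_units / 4096
            - cfg.kernel_size / 100 - cfg.pool_stride / 50)

def architecture_search(n_candidates: int = 1500, keep: int = 10, seed: int = 0):
    """Score n_candidates distinct architectures and return the `keep` best (score, config) pairs."""
    rng = random.Random(seed)
    candidates = rng.sample(SEARCH_SPACE, n_candidates)  # sampling without replacement
    scored = [(train_and_evaluate(cfg), cfg) for cfg in candidates]
    return heapq.nlargest(keep, scored, key=lambda pair: pair[0])

top10 = architecture_search()
for score, cfg in top10:
    print(f"{score:.3f}  {cfg}")
```

Because each candidate is trained independently, a search of this kind parallelizes naturally, which is one reason a supercomputer helps.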

To train the models, the researchers created a virtual world in which they could control the size of each room and the reflective properties of its walls. All sounds fed to the models originated somewhere in one of these virtual rooms. The set of more than 400 training sounds included human voices, animal sounds, machine sounds such as car engines, and natural sounds such as thunder.
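One common way to simulate sound in a virtual room with a chosen size and wall reflectivity is the image-source method. The sketch below is a first-order version for a rectangular room with a single, frequency-independent reflection coefficient. It is an assumption for illustration, not necessarily how the researchers built their virtual world; fuller simulators also model higher-order reflections and frequency-dependent absorption.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def first_order_images(src, room):
    """Mirror the source across each of the six walls of the box [0, Lx] x [0, Ly] x [0, Lz]."""
    src = np.asarray(src, dtype=float)
    images = []
    for axis in range(3):
        image_low = src.copy()
        image_low[axis] = -src[axis]                    # wall at coordinate 0
        image_high = src.copy()
        image_high[axis] = 2 * room[axis] - src[axis]   # wall at coordinate room[axis]
        images += [image_low, image_high]
    return images

def room_impulse_response(src, mic, room, reflection_coeff, fs, length_s=0.1):
    """Direct path plus first-order wall reflections, each scaled by 1/distance."""
    mic = np.asarray(mic, dtype=float)
    h = np.zeros(int(length_s * fs))
    paths = [(np.asarray(src, dtype=float), 1.0)]
    paths += [(img, reflection_coeff) for img in first_order_images(src, room)]
    for position, wall_gain in paths:
        distance = np.linalg.norm(position - mic)
        arrival = int(round(distance / SPEED_OF_SOUND * fs))  # delay in samples
        if arrival < len(h):
            h[arrival] += wall_gain / distance
    return h

# Usage: a 5 m x 4 m x 3 m room with fairly reflective walls
fs = 16_000
h = room_impulse_response(src=(1.0, 1.0, 1.5), mic=(4.0, 3.0, 1.5),
                          room=(5.0, 4.0, 3.0), reflection_coeff=0.7, fs=fs)
dry = np.random.default_rng(0).standard_normal(fs)  # 1 s stand-in for a training sound
wet = np.convolve(dry, h)[: len(dry)]                # the sound as heard at the microphone
```

Changing `room` and `reflection_coeff` varies exactly the two properties the paragraph mentions: room size and how strongly the walls reflect sound.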

The researchers also made sure that the model started with the same information provided by human ears. The outer ear, or pinna, has many folds that reflect sound, altering the frequencies that enter the ear, and these reflections vary depending on where the sound is coming from. The researchers simulated this effect by running each sound through a specialized mathematical function before it entered the computer model.
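Running each sound through a direction-dependent function for each ear is commonly done by convolving it with a head-related impulse response (HRIR), the time-domain form of a head-related transfer function. The sketch below assumes that approach. The `toy_hrirs` function is hypothetical: it fakes an interaural delay, a level difference, and a single pinna-like echo so the example runs without measured data, whereas real HRIRs are recorded for each direction and are far more detailed.

```python
import numpy as np

def toy_hrirs(azimuth_deg, fs, ear_spacing=0.18, c=343.0, n_taps=64):
    """Hypothetical stand-in for a measured pair of head-related impulse responses.

    Models only an interaural delay, up to about 6 dB of head shadow, and one
    direction-dependent pinna-like echo. Positive azimuth = source to the left."""
    az = np.radians(azimuth_deg)
    itd = int(round(ear_spacing / c * np.sin(az) * fs))   # samples
    far_gain = 10 ** (-6.0 * abs(np.sin(az)) / 20)        # toy head shadow
    echo_delay = 3 + int(round(4 * (1 + np.cos(az))))     # toy pinna reflection

    def impulse_response(delay, gain):
        h = np.zeros(n_taps)
        h[delay] = gain
        h[delay + echo_delay] += 0.4 * gain
        return h

    if itd >= 0:  # source on the left: the right ear is later and quieter
        return impulse_response(0, 1.0), impulse_response(itd, far_gain)
    return impulse_response(-itd, far_gain), impulse_response(0, 1.0)

def render_binaural(mono, hrir_left, hrir_right):
    """Filter a mono sound through each ear's impulse response; returns shape (2, len(mono))."""
    n = len(mono)
    return np.stack([np.convolve(mono, hrir_left)[:n], np.convolve(mono, hrir_right)[:n]])

# Usage: place a noise burst 60 degrees to the listener's left
fs = 48_000
mono = np.random.default_rng(1).standard_normal(fs // 10)
hrir_l, hrir_r = toy_hrirs(60, fs)
binaural = render_binaural(mono, hrir_l, hrir_r)  # a two-channel signal like this would feed a model
```

In a full pipeline the room simulation and the ear filtering would be combined, so that each reflection is filtered according to the direction it arrives from.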

“This allows us to give the model the same kind of information that a person would have,” says Andrew Francl, the study’s lead author.

After training the models, the researchers tested them in a real-world environment. They placed a mannequin with microphones in its ears in a real room and played sounds from different directions, then fed those recordings into the models. The models performed much like humans when asked to locate these sounds.
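Comparing model and human responses requires some measure of localization error. A minimal sketch, assuming azimuths in degrees, is the absolute angular difference wrapped around the circle, so that 350° and 10° count as 20° apart. The study's actual metrics may differ, and the angles below are made up.

```python
import numpy as np

def azimuth_error_deg(predicted, true):
    """Absolute angular error in degrees, wrapped into [0, 180]."""
    diff = (np.asarray(predicted, dtype=float) - np.asarray(true, dtype=float) + 180.0) % 360.0 - 180.0
    return np.abs(diff)

# Usage with made-up angles (not data from the study)
errors = azimuth_error_deg([350.0, 95.0, 0.0], [10.0, 90.0, 180.0])
print(errors, errors.mean())
```

Without the wrap-around, a response of 350° to a source at 10° would be scored as a 340° miss, overstating the error.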

The study was published in Nature Human Behaviour.