This AI Can Generate Music From Seeing Videos Of Silent Piano Performances


Silence is the absence of audible sound. In other words, there is nothing to be heard.

To imagine the sound of something, visual cues must be present. Humans can recreate sounds as long as they have heard them before. Here, researchers have created an AI that mimics that ability, but without the experience of hearing the sounds.

The AI, called 'Audeo', can generate music simply by watching the hands of performers in silent videos of piano performances.

To make this possible, the researchers trained and tested the AI on footage of pianist Paul Barton playing pieces by famous composers.

The team then evaluated the accuracy of the AI's output by playing it to music-recognition apps such as Shazam and SoundHound.

The apps identified the tune 86% of the time.

That is just 7% lower than the rate at which they recognized the audio from the source videos.

According to the study author Eli Shlizerman, an assistant professor at the University of Washington:

"To create music that sounds like it could be played in a musical performance was previously believed to be impossible. An algorithm needs to figure out the cues, or ‘features’ in the video frames that are related to generating music, and it needs to ‘imagine’ the sound that’s happening in between the video frames. It requires a system that is both precise and imaginative. "

First, the researchers had to translate the video frames of the keyboard and the musician's hand movements into a raw, mechanical, symbolic musical representation called a Piano-Roll (Roll). Produced for each video frame, the Roll represents the keys pressed at each time step.
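To make the idea of a Roll concrete, here is a minimal Python sketch of a piano roll as a grid with one row per video frame and one column per key. The function name `frames_to_roll`, the 88-key keyboard size, and the sample detections are assumptions made for illustration; the sketch does not reproduce how Audeo detects pressed keys from video.

```python
# Hypothetical sketch: a piano roll as a frames x keys grid of 0/1 values.
N_KEYS = 88  # assumed keyboard size (a standard piano)


def frames_to_roll(pressed_per_frame, n_keys=N_KEYS):
    """Turn per-frame sets of pressed key indices into a binary roll."""
    roll = []
    for pressed in pressed_per_frame:
        row = [0] * n_keys
        for key in pressed:
            if not 0 <= key < n_keys:
                raise ValueError(f"key index {key} out of range")
            row[key] = 1  # 1 means "this key is down in this frame"
        roll.append(row)
    return roll


# Made-up detections for four frames: key 39 is held, then key 43 joins in.
detections = [{39}, {39}, {39, 43}, {43}]
roll = frames_to_roll(detections)
print([sum(row) for row in roll])  # keys pressed per frame: [1, 1, 2, 1]
```

Each row answers the question "which keys are down in this frame?", which is the information the quoted description says the Roll carries at each time step.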

Then, they needed to adapt the Roll to make it amenable to audio synthesis by including temporal correlations.
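One way to picture why temporal correlations matter: a roll built frame by frame can flicker, so a held note briefly drops out and would be played as two separate notes. The Python sketch below uses a simple gap-filling heuristic to bridge such dropouts. This is an illustrative stand-in, not the researchers' method, and the function name `fill_short_gaps` and the `max_gap` parameter are hypothetical.

```python
def fill_short_gaps(roll, max_gap=1):
    """Bridge runs of at most max_gap silent frames between two presses of a key.

    Illustrative heuristic only; it is not the technique used in Audeo.
    """
    smoothed = [row[:] for row in roll]  # copy so the input is left untouched
    n_keys = len(roll[0]) if roll else 0
    for key in range(n_keys):
        last_on = None  # most recent frame in which this key was pressed
        for t in range(len(roll)):
            if roll[t][key]:
                gap = t - last_on - 1 if last_on is not None else 0
                if 0 < gap <= max_gap:
                    for g in range(last_on + 1, t):
                        smoothed[g][key] = 1
                last_on = t
    return smoothed


# One key over six frames: a one-frame dropout, then a two-frame rest.
flickering = [[1], [0], [1], [0], [0], [1]]
print(fill_short_gaps(flickering))  # [[1], [1], [1], [0], [0], [1]]
```

The one-frame dropout is filled because it is probably a detection glitch, while the longer rest is kept because it is more likely a real release of the key.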

"This step turns out to be critical for meaningful audio generation," the researchers noted.

Lastly, they used MIDI synthesizers to generate realistic music.
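As a rough illustration of what a synthesizer needs from a roll, the Python sketch below converts a binary roll into note events (pitch, start time, duration) and maps MIDI note numbers to frequencies. The helper names, the 25 frames-per-second default, and the assumption that column 0 is the lowest key of an 88-key piano (A0, MIDI note 21) are illustrative choices, not details from the study.

```python
LOWEST_MIDI_NOTE = 21  # MIDI number of A0, the lowest key on an 88-key piano


def roll_to_notes(roll, fps=25.0):
    """Return (midi_note, start_seconds, duration_seconds) for each held key.

    fps is an assumed video frame rate used to convert frames to seconds.
    """
    notes = []
    n_frames = len(roll)
    n_keys = len(roll[0]) if roll else 0
    for key in range(n_keys):
        start = None  # frame where the current note began, if any
        for t in range(n_frames + 1):  # extra step closes notes held to the end
            on = t < n_frames and roll[t][key] == 1
            if on and start is None:
                start = t
            elif not on and start is not None:
                notes.append((key + LOWEST_MIDI_NOTE, start / fps, (t - start) / fps))
                start = None
    return sorted(notes, key=lambda n: (n[1], n[0]))


def midi_to_hz(note):
    """Equal-temperament frequency with A4 (MIDI note 69) tuned to 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


# Two keys over three frames, at one frame per second for easy reading.
tiny_roll = [[0, 1], [1, 1], [1, 0]]
print(roll_to_notes(tiny_roll, fps=1.0))  # [(22, 0.0, 2.0), (21, 1.0, 2.0)]
print(midi_to_hz(69))  # 440.0
```

Note events of this kind are what a MIDI file stores, and a synthesizer renders each one as sound at the corresponding pitch.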

Audeo can then convert video to audio "smoothly" with only "a few setup constraints."


According to the study paper:

"We present a novel system that gets as an input, video frames of a musician playing the piano, and generates the music for that video. The generation of music from visual cues is a challenging problem and it is not clear whether it is an attainable goal at all. Our main aim in this work is to explore the plausibility of such a transformation and to identify cues and components able to carry the association of sounds with visual events."

Having established these capabilities, the researchers then explored using Audeo to change the style of the music.

For example, Shlizerman said the system could show how music played on a piano would sound when played through a trumpet. He hopes the research can enable more ways for people to interact with music.

"One future application is that Audeo can be extended to a virtual piano with a camera recording just a person’s hands. Also, by placing a camera on top of a real piano, Audeo could potentially assist in new ways of teaching students how to play," said Shlizerman.

Published: 08/02/2021