This is Shimi. Shimi sings opera.
To develop Shimi’s behaviors, a deep neural network ingested the following: 10,000 files from 15 improvising musicians playing responses to different emotional cues; 300,000 samples of musical instruments playing different musical notes, to add musical expressivity to the spoken word; and one of the rarest languages in existence, a nearly extinct Australian Aboriginal vernacular made up of 28 phonemes.
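As a rough illustration only (not the lab's actual pipeline), the sketch below shows how three audio corpora of those sizes might be pooled into one training set with PyTorch. Every name, label, clip length, and the use of synthetic noise in place of real recordings is an assumption made for the example.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset

class AudioCorpus(Dataset):
    """One corpus of fixed-length clips with a shared corpus label.

    Synthetic noise stands in for real recordings here; an actual
    pipeline would load and resample audio files (e.g. with torchaudio).
    """
    def __init__(self, n_clips: int, n_samples: int, label: int):
        self.n_clips, self.n_samples, self.label = n_clips, n_samples, label

    def __len__(self) -> int:
        return self.n_clips

    def __getitem__(self, i: int):
        # Generate each fake clip lazily, so 300,000 "clips" cost no RAM.
        g = torch.Generator().manual_seed(i)
        return torch.randn(self.n_samples, generator=g), self.label

# Corpus sizes echo the figures above; 16,000 samples is ~1 s at 16 kHz.
improvised_responses = AudioCorpus(10_000, 16_000, label=0)   # emotional cues
instrument_notes     = AudioCorpus(300_000, 16_000, label=1)  # note samples
phoneme_clips        = AudioCorpus(28, 16_000, label=2)       # 28 phonemes

pool = ConcatDataset([improvised_responses, instrument_notes, phoneme_clips])
loader = DataLoader(pool, batch_size=32, shuffle=True)

waveforms, labels = next(iter(loader))
print(waveforms.shape, labels.shape)  # torch.Size([32, 16000]) torch.Size([32])
```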
But what is the goal of it all?
In a world where we are surrounded by robots that need to communicate their state of mind, mood, and emotion, prosody and gestures seem like a subtle yet effective back channel for doing so, since humans cannot effectively process multiple linguistic channels at once. So our long-term goal for this research is to scale our system to large groups of robots.