A team of Israeli researchers won the "Best Research" award at the 21st International Society for Music Information Retrieval conference (ISMIR 2020) on Tuesday for their paper titled "BebopNet: Deep Neural Models for Personalized Jazz Improvisation."
Authored by M.Sc. students Nadav Bhonker and Shunit Haviv Hakimi, along with their adviser Prof. Ran El-Yaniv of the Henry and Marilyn Taub Faculty of Computer Science at the Technion-Israel Institute of Technology, the paper indicates that it is possible to model and optimize personalized jazz preferences.
Learning to generate music is an ongoing challenge in the world of artificial intelligence (AI). An even more difficult task is creating musical pieces that match an individual listener's preferences.
In the BebopNet project, Bhonker and Haviv Hakimi, both amateur jazz musicians, focused on personalized, symbol-based, monophonic generation of harmony-constrained jazz improvisations.
To achieve this, they introduced a pipeline consisting of three steps:
First, the researchers trained BebopNet, a music language model, to be able to generate symbolic saxophone jazz improvisations to any chord progression.
To build their initial dataset, the researchers used hundreds of original jazz solos performed by saxophone giants including Charlie Parker, Stan Getz, Sonny Stitt, and Dexter Gordon.
The paper also presents a "plagiarism analysis," which compares BebopNet with all the featured musicians to evaluate the originality of the generated solos.
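The article does not say how the plagiarism analysis is computed. One generic way to measure this kind of overlap, sketched here in Python and not presented as the authors' method, is to count how many short note sequences (n-grams) in a solo also appear in a reference corpus. The function names, the MIDI pitch representation, and the n-gram length of 4 are assumptions made for illustration.

```python
from typing import List, Sequence, Set, Tuple


def ngrams(notes: Sequence[int], n: int) -> Set[Tuple[int, ...]]:
    """Return the set of all length-n note sequences in a solo."""
    return {tuple(notes[i:i + n]) for i in range(len(notes) - n + 1)}


def overlap_ratio(solo: Sequence[int], corpus: List[Sequence[int]], n: int = 4) -> float:
    """Fraction of the solo's n-grams that also occur somewhere in the corpus.

    1.0 means every n-gram was found in the corpus; 0.0 means none were.
    """
    solo_grams = ngrams(solo, n)
    if not solo_grams:
        return 0.0  # solo shorter than n notes: nothing to compare
    corpus_grams: Set[Tuple[int, ...]] = set()
    for reference in corpus:
        corpus_grams |= ngrams(reference, n)
    return len(solo_grams & corpus_grams) / len(solo_grams)


# Hypothetical data: solos written as MIDI pitch numbers.
reference_solos = [[60, 62, 64, 65, 67, 69]]
copied_phrase = [60, 62, 64, 65, 67]
fresh_phrase = [60, 61, 63, 66, 70]

print(overlap_ratio(copied_phrase, reference_solos))
print(overlap_ratio(fresh_phrase, reference_solos))
```

A lower ratio suggests a more original solo under this simple measure. A real analysis would likely also account for rhythm and transposition.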
Second, the system assembles a personal dataset for each user and uses it to train a personal preference metric that predicts which notes reflect the user's taste.
Each user is presented with jazz improvisations and asked to rate them according to their preference, after which a regression model is used to predict the user's taste.
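The idea of learning a user's taste from ratings can be illustrated with a very small Python sketch. It is a simplified stand-in, not the model used in the paper: it fits a plain linear regression by gradient descent on two hand-picked features, and the feature choice, the MIDI pitch representation, the function names, and the ratings are all hypothetical.

```python
from typing import List, Sequence


def features(solo: Sequence[int]) -> List[float]:
    """Hypothetical features of a solo given as MIDI pitches."""
    steps = [abs(b - a) for a, b in zip(solo, solo[1:])]
    mean_step = sum(steps) / len(steps) if steps else 0.0
    pitch_range = max(solo) - min(solo)
    # The leading 1.0 is a bias term; the others are scaled to keep training stable.
    return [1.0, mean_step / 12.0, pitch_range / 24.0]


def predict(weights: Sequence[float], solo: Sequence[int]) -> float:
    """Predicted rating: weighted sum of the solo's features."""
    return sum(w * x for w, x in zip(weights, features(solo)))


def fit(solos: List[Sequence[int]], ratings: List[float],
        lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit linear-regression weights by gradient descent on mean squared error."""
    xs = [features(s) for s in solos]
    weights = [0.0] * len(xs[0])
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(xs, ratings):
            err = sum(w * xi for w, xi in zip(weights, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Hypothetical ratings from a user who prefers smooth, stepwise lines.
rated_solos = [
    [60, 62, 64, 65, 67],
    [67, 65, 64, 62, 60],
    [60, 72, 55, 70, 58],
    [48, 65, 50, 69, 52],
]
user_ratings = [0.9, 0.8, 0.1, 0.2]

taste = fit(rated_solos, user_ratings)
print(predict(taste, [62, 64, 65, 67, 69]))  # an unseen stepwise phrase
print(predict(taste, [50, 64, 52, 66]))      # an unseen leaping phrase
```

Once fitted, such a model can score phrases the user has never heard, which is what allows it to guide generation.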
Finally, the model uses a process called "beam search" to optimize the note generation process to fit the user's specific taste.
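Beam search keeps only the few most promising partial sequences at each step, rather than exploring every possible continuation. The Python sketch below shows the general technique under stated assumptions: the toy next-note model, the way the taste score is combined with the model's log-probability, and all names and parameters are hypothetical and do not reproduce BebopNet's actual scoring.

```python
import math
from typing import Callable, Dict, List, Tuple


def toy_next_note_probs(seq: List[int]) -> Dict[int, float]:
    """Hypothetical stand-in for a music language model: favours small steps."""
    last = seq[-1]
    candidates = [last - 2, last - 1, last + 1, last + 2, last + 7]
    weights = [3.0, 2.0, 2.0, 3.0, 1.0]
    total = sum(weights)
    return {note: w / total for note, w in zip(candidates, weights)}


def beam_search(start: int, steps: int, beam_width: int,
                user_score: Callable[[List[int]], float],
                taste_weight: float = 1.0) -> List[int]:
    """Generate `steps` notes after `start`, keeping the `beam_width` best sequences.

    Candidates are ranked by model log-probability plus a weighted taste score.
    """
    beams: List[Tuple[List[int], float]] = [([start], 0.0)]
    for _ in range(steps):
        expanded: List[Tuple[List[int], float]] = []
        for seq, logp in beams:
            for note, p in toy_next_note_probs(seq).items():
                expanded.append((seq + [note], logp + math.log(p)))
        # Rank every extension, then prune to the beam width.
        expanded.sort(key=lambda c: c[1] + taste_weight * user_score(c[0]),
                      reverse=True)
        beams = expanded[:beam_width]
    return beams[0][0]


def likes_rising_lines(seq: List[int]) -> float:
    """Hypothetical taste score: rewards phrases that end higher than they start."""
    return (seq[-1] - seq[0]) / 12.0


personalized = beam_search(60, steps=4, beam_width=3,
                           user_score=likes_rising_lines, taste_weight=5.0)
neutral = beam_search(60, steps=4, beam_width=3,
                      user_score=likes_rising_lines, taste_weight=0.0)
print(personalized)
print(neutral)
```

Raising `taste_weight` pushes the output toward the user's preference, while a value of zero falls back to whatever the language model alone considers most likely.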
“While our computer-generated solos are locally coherent and often interesting or pleasing, they lack the qualities of professional jazz solos related to general structure such as motif development and variations,” said the authors.
El-Yaniv said he hopes to overcome this challenge in future research. Preliminary models trained on a smaller dataset were substantially weaker, so a larger dataset might yield a considerably better model.
However, to obtain such a large dataset, it might be necessary to abandon the symbolic approach and rely on audio recordings, which can be gathered in much larger quantities.