Google’s Magenta artificial intelligence (AI) project produced a basic piano melody that was then given a drum machine background (apparently added by humans, not an AI program) and released to the public:
Canadian media outlet The Star had four songwriters try to flesh out the melody into a full-fledged song:
In this post, I’ll take a somewhat different approach in assessing this AI-generated music–and, perhaps surprisingly, without focusing too much on the question of authorship.
The Magenta melody is presented as another step toward AI generation of creative expression. If that goal is achieved, the question is whether human expression will continue to flourish or whether it will be “outsourced” to algorithms. I don’t think anyone can really say at this point what would happen if/when the algorithms produce expression as good as, or perhaps even “better” than, what humans create. We’ll find out at such time as it might happen and such music starts entering the music ecosystem.
So for now, the practical question is whether this Magenta melody is “as good as” what humans produce. Or, more precisely, would the average person listening to this melody find it “musical” enough to satisfy at least one of the roles that music plays in their life, whether as a foreground point of focus or as background to another activity? The latter is probably the lower bar. Music that is pleasing or soothing or inspiring often serves as an almost subliminal part of many environments. And so long as the sequence of sounds is not jarring or disturbing, it may well suffice for that role. Grabbing our attention and provoking some emotive or intellectual response is a much higher bar. AI-generated content may not need to meet that bar to still be deemed successful. In fact, the first areas in which AI music is likely to be adopted are these kinds of background or auxiliary roles (think elevators, waiting rooms, etc.).
I posit that the real challenge for even the low bar of background music is not in creating melodies or even whole musical passages that are coherent enough–in terms of sounding like music and avoiding dissonant sequences–but rather in creating music that flows naturally, especially in the sense of swinging or having a groove.
To begin, notice what most of the Canadian songwriters did first, almost without thinking: they created an underlying rhythm or groove in the key of the AI melody and then subtly but clearly rephrased the original stilted melody to flow and fit it. For copyright purposes, assuming someone at Magenta or the AI program itself were found to have copyright in the AI melody, the question is whether the songwriters’ changes were merely stylistic arrangements–in which case the songwriters would obtain no derivative work copyright in their additions–or whether they added further creative expression meeting the minimal Feist test of originality to qualify as a derivative work. In the cases here, primarily rhythm and harmonic accompaniment (chords) were added. These aspects of composition can face a higher hurdle for protection, especially where they seem “obvious” (not a legal test) or indeed mandated by the pitch sequences and phrasing of the original melody.
In the alternative, if the melody were found to have been changed enough, the new version might simply be its own independent original work. This would be assessed without intent being dispositive. In other words, even if the songwriters intended simply to “cover” the melody, but instead had to change it so much that the original melody was no longer substantially present, their unsuccessful cover might legally be a new work. This is simply the flip side of the songwriter who sets out to write something original, but then subconsciously copies another musical passage they had already heard elsewhere. I’ll hold off on my own assessment of whether the original AI melody is preserved substantially in each of the songwriters’ “covers,” and instead just flag the issue and leave it to the reader to listen again with this perspective in mind.
The core issue for me is whether AI–now or in the future–can create a naturally flowing passage of music that does not need to be smoothed out or grooved by human creators. It is not whether a concordant sequence of pitches can be generated by algorithm. That actually should not be that hard. Programming the code to randomly select only notes within a given key, and then sequencing them with varied durations (whole notes, quarter notes, eighth notes), with reference to patterns of popular melody pitch sequences, should often enough produce non-jarring melodic pitch sequences. And in fact, that’s what this Magenta melody sounds like: a fairly random key-centered sequence of tones with phrasing variety. The further sophistication here is in programming the algorithm to produce recognizable repeating sections that allow the listener to perceive a compositional structure, rather than simply a long random sequence of tones. But it doesn’t flow, and the problem is not in the choice of technical phrasing (e.g., eighth notes following a quarter note). It is likely a fundamental issue in the digital nature of both the phrasing generation and the audio performance by the AI program.
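To make the point concrete, here is a minimal Python sketch of the kind of generator described above. It is not how Magenta works; it is only an illustration under my own assumptions: the key (C major), the three note lengths, the preference for small melodic steps (a crude stand-in for "patterns of popular melody pitch sequences"), and the A-B-A layout for the repeating sections are all hypothetical choices, as are the function names.

```python
import random

# Assumption for illustration: C major, as MIDI note numbers from C4 to C5.
C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]
# Note lengths in beats: whole note, quarter note, eighth note.
DURATIONS = [4.0, 1.0, 0.5]


def random_phrase(num_notes, rng):
    """Return (pitch, duration) pairs drawn only from notes in the key."""
    idx = rng.randrange(len(C_MAJOR))
    phrase = []
    for _ in range(num_notes):
        # Favor small steps over leaps; a stand-in for real melodic patterns.
        step = rng.choice([-2, -1, -1, 0, 1, 1, 2])
        idx = max(0, min(len(C_MAJOR) - 1, idx + step))
        phrase.append((C_MAJOR[idx], rng.choice(DURATIONS)))
    return phrase


def random_melody(seed=0):
    """Build an A-B-A melody so a listener can hear a repeating section."""
    rng = random.Random(seed)
    section_a = random_phrase(8, rng)
    section_b = random_phrase(8, rng)
    return section_a + section_b + section_a


if __name__ == "__main__":
    for pitch, beats in random_melody(seed=1):
        print(pitch, beats)
```

Every note lands in the key and the first section comes back at the end, so the output is structured and inoffensive. Nothing in it addresses flow, which is the part I am arguing is hard.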
In many ways, it reminds me of my own digital electronic music experiments at the dawn of the home computer age. I had what I believe was a Texas Instruments TI-99/4A during high school in the early 1980s.
I had taught myself basic electronics in junior high through Radio Shack kits, and could design my own circuits. The next step was computer programming, and I got the TI-99/4A as a present sometime before 1983. It didn’t come with a monitor, so I connected it to the VHF antenna input on our pre-cable TV (see p. I-2 of the scanned TI-99/4A owner’s manual at the linked site). But because I had also taught myself to play guitar, I was thrilled by the possibilities of its “extraordinary graphics and sound,” which included musical capabilities. After figuring out the programming language, I managed to write code that would play a rudimentary 12-bar blues bass line. The disappointment was that it sounded like, well, a robot playing the line, with no swing or sense of groove. It wasn’t even useful for practicing, although I did try to jam along with it a few times before giving up. Simply recording myself on analog reel-to-reel or cassette tape, as I had already been doing, remained the far superior option for accompaniment to practice improvised solos over.
The problem as I see it was and remains essentially the problem of analog vs. digital overall, by which I mean the difference between continuous and discrete approaches to time (and pitch, but that is not as relevant here). Analog electronic signals are continuously varying and so can fluidly track the continuous variation of sound waves in a medium such as air. In fact, this was one of Bell’s great breakthroughs in developing the telephone: analogous undulatory rather than make-and-break discrete (on/off) electric signals for transmitting voice over electric wire (see pp. 1010-13 of this article). Digital signals, oversimplified, evenly and discretely subdivide time and pitch into exactly equal smaller and smaller units. While these can be connected in ways that approximate swinging phrasing, I still believe that the discrete approach cannot truly swing ahead of or behind the beat in the way that human composers and performers can. And while digital recordings can approximately capture and reproduce a swinging human performance, it is not clear that algorithms in the near future can be programmed to successfully generate this alternately ahead-of and behind-the-beat swinging feel (or other fluid time dynamics of human-generated music). And computer programming is still a resolutely digital enterprise. I know that computer music experts are well aware of this issue, and are actively trying to overcome it. But, again, the biggest surprise to me was how much the Magenta melody still has that stilted, non-flowing robot feel. By contrast, the songwriters are able to take the sequence of pitches and change the actual AI-generated phrasing into approximate phrasing that flows far more naturally.
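The difference between grid-locked timing and an approximation of swing can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names, the triplet-style swing ratio of 2/3, and the size of the random push and pull around the beat are hypothetical choices, not anything taken from Magenta or from a particular music tool.

```python
import random


def straight_eighths(num_beats):
    """Onset times (in beats) for eighth notes locked to an even grid."""
    return [i * 0.5 for i in range(num_beats * 2)]


def swung_eighths(num_beats, swing_ratio=2 / 3, max_push=0.03, seed=0):
    """Approximate swing: delay each off-beat eighth, then nudge every
    onset slightly ahead of or behind its nominal position."""
    rng = random.Random(seed)
    onsets = []
    for beat in range(num_beats):
        # The on-beat note sits at 0.0; the off-beat moves from 0.5 to swing_ratio.
        for position in (0.0, swing_ratio):
            nudge = rng.uniform(-max_push, max_push)  # negative = ahead of the beat
            onsets.append(beat + position + nudge)
    return onsets


if __name__ == "__main__":
    print(straight_eighths(2))
    print(swung_eighths(2))
```

The first function is essentially what my TI-99/4A bass line did: every note exactly on the grid. The second is the "connected in ways that approximate swinging phrasing" case. Its offsets are still numbers picked by a formula rather than by a player responding to the music, which is why I remain skeptical that this kind of approximation amounts to a real groove.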
But what important role is the AI melody really playing at that point? A random music sequence generator could perhaps be helpful for a composer with writer’s block, to suggest some new paths or ideas. And perhaps more fully fleshed out AI music passages could begin serving in the low-bar background environment music roles. However, I don’t see solely AI-generated music as a sufficient replacement yet for the human-generated kind. And this is without even approaching the issue of musically “saying” something as the best human music does.
As the reference to my own early digital music experiments hopefully suggests, I am no anti-computer or anti-tech retro musician. It is true that I prefer analog over digital music (even in reproduction technologies, although currently I listen nearly exclusively to digital music files or streams). But I have happily employed advanced signal processing technology in my music performance and recording from the beginning. The issue for me is: can it help produce flowing musicality in a piece? If so, I’ll use it. If not, it’s just a fun thing to experiment with off stage.