Listening, Music generation, and Prior knowledge
In his classification system for interactive systems, Rowe (1992) proposes a combination of three dimensions: score-driven vs. performance-driven systems; transformative, generative, or sequenced response methods; and instrument vs. player paradigms. In this sub-section, we introduce some (non-orthogonal) key notions that will be used in this chapter and that are particularly relevant in our scope of guided human-computer music improvisation. They are illustrated by some related projects that will be described later on.
LISTENING When a system listens to the musical environment, it can be in order to react and / or to learn. “Listening” means here listening to a human musician co-improvising with the system, and does not concern controls that can be given to an operator-musician controlling the system.
Reaction: In this case, the playing of the musician is analyzed in real time and the result of this analysis can for example trigger some generative processes (e.g. Lewis, 2000), or be mapped to a corresponding event which is retrieved in a corpus (e.g. Moreira et al., 2013; Pachet et al., 2013; Bonnasse-Gahot, 2014).
Learning: In the corpus-based approach (see below), a system listens and learns by making its memory grow when the human co-improviser plays. Musical inputs can also be learnt to feed predefined generative models. For example, Band Out of a Box (Thom, 2001) is a computer accompanist with a fixed tempo in a “trading four” interaction scheme where a human improviser and a virtual partner repeatedly call and respond in four-bar chunks. Each bar of the human improvisation is analyzed and assigned to a cluster (“playing mode”) and feeds the associated generation model. Then, the computer response is constituted by four bars belonging to the same sequence of modes using the generative models.
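As a rough illustration of the listen-then-respond loop described above, the following Python sketch labels each bar with a playing mode and answers with stored bars that follow the same sequence of modes. It is not the Band Out of a Box implementation: the note-count thresholds in `playing_mode` stand in for its learned clustering, a per-mode list of bars stands in for its generative models, and every name is hypothetical.

```python
from collections import defaultdict


def playing_mode(bar):
    """Hypothetical stand-in for clustering: label a bar by its note count."""
    if len(bar) <= 2:
        return "sparse"
    if len(bar) <= 5:
        return "medium"
    return "dense"


class TradingFours:
    """Stores heard bars per mode and answers with bars of the same modes."""

    def __init__(self):
        self.bars_by_mode = defaultdict(list)

    def listen(self, bars):
        """Analyze each bar, feed its mode's store, return the mode sequence."""
        modes = []
        for bar in bars:
            mode = playing_mode(bar)
            self.bars_by_mode[mode].append(list(bar))
            modes.append(mode)
        return modes

    def respond(self, modes):
        """Return one stored bar per requested mode (oldest bar first).

        Assumes every requested mode has been heard at least once, which
        holds when `modes` comes from a previous call to `listen`.
        """
        return [self.bars_by_mode[mode][0] for mode in modes]


partner = TradingFours()
# Four bars of a human chorus, each bar given as a list of MIDI pitches.
human_bars = [[60, 62], [64, 65, 67, 69], [72] * 6, [60]]
modes = partner.listen(human_bars)
response = partner.respond(modes)
```

In this toy version the response simply replays stored bars; a closer approximation would sample new material from a model trained on the bars of each mode.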
GENERATION Corpus-based interactive music systems create music from a musical memory constituted by offline corpora and / or live material. Sequences in this memory are searched, retrieved, transformed, and concatenated to generate the machine improvisation (e.g. Assayag et al., 2006b; Surges and Dubnov, 2013; François et al., 2013; Ghedini et al., 2016). With this approach, the resulting musical aesthetics strongly depends on the chosen musical memory. The system ImproteK, which implements the models and architectures presented in this thesis, belongs to this first category.
In the rule-based case, the musical structures are synthesized by rules or autonomous processes that can follow their internal logic (e.g. Blackwell, 2007) or interact with the environment. The pioneer system Voyager (Lewis, 2000), conceived and programmed by George Lewis since 1986, is a “player” program (using the classification proposed by Rowe (1992)) which is provided with “its own sound”. It is designed as a virtual improvising orchestra of 64 asynchronous voices generating music with different sonic behaviors in real time. For each voice, the response to input goes from “complete communion” to “utter indifference”. Voyager is defined by its author as a “kind of computer music-making embodying African-American cultural practice”. Its design is indeed motivated by ethnographic and cultural considerations, such as the concept of “multidominance” inspired by Douglas (1991) who formalized the notion of “multidominant elements” in musical and visual works of Africa and its diaspora.
PRIOR KNOWLEDGE OR SPECIFICATION When an interactive music system is not purely autonomous and takes the musical environment (in every sense) into account, it is not always only through listening. It can also be by using some upstream specifications or prior knowledge provided by a given idiom or the musical context. This criterion is the most relevant in our scope of “guided” human-computer music improvisation. The next subsection lists some cases where the musical context or the musical idiom provides temporal or logical prior knowledge or specification to the improviser, and how it can be used by a computer.
Prior knowledge of the musical context
In free jazz for example, historical, cultural, and social backgrounds play an important role in the way improvisation is approached and played (Lewis, 2008). In collective free improvisation, even in the absence of a shared referent (Pressing, 1984), musicians who have experience playing together come to share high-level knowledge which is not piece-specific but rather task-specific, i.e. an implicit mental model of what it is to improvise freely (Canonne and Aucouturier, 2015). Here, these non-formalized aspects are not addressed. In this subsection, we only focus on explicit and formalized prior knowledge or specification provided by the context.
Precision of the knowledge
Explicit and formalized prior knowledge or specification given by the musical context (when it exists) can be more or less precise. We focus here on the two ends of the spectrum before turning to formal temporal specifications in 2.1.3.2. Planned inputs, as in performances based on a traditional score explicitly defining pitches, durations and dynamics, find computer music applications, for example, in the field of score following. On the other hand, planning can just describe a set of mechanisms, a temporal logic, or a group of events. In these latter cases, computer music applications can implement reactions to unordered events.
PLANNED INPUT A music performance may refer to predefined melodies, scores, audio materials or, more broadly, sequences of actions with their own temporality. Synchronizing heterogeneous electronic actions (playing an audio file, triggering a synthesized sound, executing an analysis process, etc.) with a musician’s performance is a common problem for interactive music systems. Many solutions have emerged to deal with this issue, depending on the musical purpose or the available technologies, leading to the score following approach. The most elementary solution is to launch a predefined electronic sequence recorded on a fixed medium (magnetic tape, classical sequencer). In this case, the musician’s performance is totally constrained by the timing of the recording. Score following is defined as the real-time alignment of an audio stream played by one or more musicians to a symbolic musical score (Schwarz et al., 2004; Cont, 2006). It makes it possible to synchronize an accompaniment automatically (Dannenberg and Raphael, 2006), and can thus be used to associate an electronic part with a predefined instrumental part, or in different creative ways in mixed music (Cont, 2011b), including improvised music contexts, for example when the theme of a jazz standard appears.
On the other hand, the prior knowledge provided by the context can take the form of an agreement on a set of logical mechanisms. This is for example the case of soundpainting, the method of “live composition” using physical gestures for the spontaneous creation of music invented by composer and saxophonist Thompson (2006). It is defined as a “universal live composing sign language for the performing and visual arts”. To cope with this category of prior knowledge, the solution in the field of interactive music systems is a purely reactive approach, the “agreement” being a set of logical mechanisms associated with a reactive listening module.
The online analysis of the playing of a musician can for example trigger predefined generative processes with complex behaviors (e.g. Lewis, 2000), or focus on a particular musical dimension. Among them, Sioros and Guedes (2011a,b) use a rhythmic analysis of the live inputs to steer generative models with a focus on syncopation. In the case of corpus-based systems, reactive listening triggers an instant response retrieving a matching element in a corpus according to predefined mappings (this category will be discussed in Section 2.2). With a more general approach, a dedicated programming language can be used to compose reactivity in the scope of a particular musical project by defining responses to complex events implying both musical events and logical conditions (e.g. Echeveste et al., 2013c).
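The corpus-based reactive scheme mentioned above (an analysis result mapped to a matching corpus element) can be sketched in Python as a nearest-neighbour lookup. This is only an illustration under stated assumptions: the two features, the toy corpus, and the function name `react` are hypothetical, and Euclidean distance is just one possible predefined mapping.

```python
import math

# Hypothetical corpus: each event pairs a feature vector
# (MIDI pitch, loudness in [0, 1]) with the material to play back.
corpus = [
    {"features": (60.0, 0.2), "clip": "soft_low"},
    {"features": (72.0, 0.8), "clip": "loud_high"},
    {"features": (67.0, 0.5), "clip": "medium_mid"},
]


def react(corpus, features):
    """Return the corpus event whose features are closest to the input."""
    return min(corpus, key=lambda event: math.dist(event["features"], features))


# Analysis of the live input at some instant: a high, loud note.
match = react(corpus, (70.0, 0.9))
```

Because the features are not rescaled here, pitch dominates the distance; a practical mapping would normalize or weight each dimension.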
Formal temporal specification
FORMAL TEMPORAL SPECIFICATION An intermediary between the most and the least temporally specified contexts is the formal temporal specification. When the prior knowledge on the structure of the improvisation is not as explicit as a classical score, a melody, or a theme, it may consist of a sequence of formalized constraints or equivalence classes to satisfy. This is for example the case of a solo improvisation on a given chord progression or on a temporal structure as introduced in Section 1.2.1. This category will be discussed in Section 2.2: when the improvisation relies on a known temporal structure, a computer music system should take advantage of this knowledge to introduce anticipatory behavior (see 1.2.2) in the generation process and not follow a purely step-by-step process. To address this level of prior knowledge, we propose a “scenario / memory” generation model in Part I.
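To make the idea of satisfying a sequence of equivalence classes with anticipation more concrete, here is a minimal Python sketch. It is not the “scenario / memory” model of Part I: it assumes a memory of (label, content) pairs, a scenario given as a list of chord labels, and a greedy strategy that, at each point, copies the memory segment matching the longest upcoming portion of the scenario. All names are hypothetical.

```python
def match_length(scenario, start, labels, position):
    """Count how many upcoming scenario labels the memory matches from `position`."""
    length = 0
    while (start + length < len(scenario)
           and position + length < len(labels)
           and scenario[start + length] == labels[position + length]):
        length += 1
    return length


def generate(scenario, memory):
    """Cover the scenario with memory segments, always taking the longest match."""
    labels = [label for label, _ in memory]
    output = []
    t = 0
    while t < len(scenario):
        best_position, best_length = None, 0
        for position in range(len(labels)):
            length = match_length(scenario, t, labels, position)
            if length > best_length:
                best_position, best_length = position, length
        if best_length == 0:
            raise ValueError(f"no memory event labeled {scenario[t]!r}")
        output.extend(memory[best_position:best_position + best_length])
        t += best_length
    return output


# Memory: events labeled by the chord they were played on.
memory = [("Dm7", "m1"), ("G7", "m2"), ("Cmaj7", "m3"),
          ("Cmaj7", "m4"), ("G7", "m5"), ("Cmaj7", "m6")]
scenario = ["G7", "Cmaj7", "Dm7", "G7", "Cmaj7", "Cmaj7"]
improvisation = generate(scenario, memory)
```

Looking ahead over several labels, rather than choosing one event at a time, is what lets the sketch reuse a four-event segment for the end of the scenario instead of stitching together isolated events.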
“Self-Organization” and “Style Modeling” Paradigms
The last three paragraphs emphasized different approaches where the machine improvisation is guided by the environment or by a specification provided by the musical context. Before focusing on this notion of declarative guidance in Section 2.2, we present here some paradigms of machine improvisation that are not guided (in the sense that we give to this word in this dissertation) but steered by internal mechanisms of self-organization, or by the internal sequential logic of a corpus.
SELF-ORGANIZING SOUND A branch of generative audio systems (see 2.3) focuses on self-organization (Blackwell and Bentley, 2002; Blackwell, 2007; Miranda, 2004). These systems are based on the emergence of coherent patterns at a global level out of local interactions between the elements of a system. Self-organizing behaviors lead to a decrease in entropy, while self-disorganizing behaviors lead to an increase in entropy. We invite the reader to see (Bown and Martin, 2012) for a discussion of the notion of entropy in this context, and the idea of autonomy in interactive music systems.
STYLE MODELING The interactive music systems focusing on style modeling are steered by the internal sequential logic of the musical material they learn. They aim at generating musical improvisations that reuse existing external material. The general idea is to build a model of the playing of a musician as it is recorded (or of an offline corpus) and to use this analysis to find new routes across this musical corpus. The machine improvisation consists of a navigation within this model that both follows the original paths (i.e. replays the original sequence) and, at times, ventures into new passages, thus jumping to new locations, and thereby providing a new version of the captured material. Concatenation is often based on the Markovian properties of the sequence itself: improvising thus amounts to recombining existing material in a way that is coherent with the sequential logic of this material and that provides something different from a mere repetition of the original material while keeping its statistical properties.
The Continuator (Pachet, 2003) introduces a paradigm of reflective interaction. It uses variable-length Markov chains, following on the work on statistical style modeling initiated by Dubnov et al. (1998) and Assayag et al. (1999), to generate new continuations from an input stream. The stream is parsed to build a tree structure, and as new inputs arrive, the tree is traversed to find continuations of the input. The system is therefore able to learn and generate music in any style without prior knowledge, either in standalone mode, as continuations of live inputs, or as interactive improvisation backup.
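The principle of continuing an input with variable-length contexts can be sketched in a few lines of Python. This is an assumption-laden illustration, not the Continuator: a flat dictionary keyed by context tuples replaces the tree structure, the maximum order of 3 is arbitrary, and the function names are hypothetical. At each step the longest known suffix of the phrase is looked up and one of its recorded continuations is appended.

```python
import random
from collections import defaultdict


def learn_continuations(sequence, max_order=3):
    """Map every context of length 1..max_order to the symbols that followed it."""
    table = defaultdict(list)
    for i in range(1, len(sequence)):
        for order in range(1, min(max_order, i) + 1):
            context = tuple(sequence[i - order:i])
            table[context].append(sequence[i])
    return dict(table)


def continue_phrase(table, phrase, length, max_order=3, seed=0):
    """Extend `phrase` by up to `length` symbols using the longest known context."""
    rng = random.Random(seed)
    result = list(phrase)
    for _ in range(length):
        for order in range(min(max_order, len(result)), 0, -1):
            context = tuple(result[-order:])
            if context in table:
                result.append(rng.choice(table[context]))
                break
        else:
            break  # no suffix of the phrase was ever heard: stop generating
    return result[len(phrase):]


melody = ["C", "D", "E", "C", "D", "G"]
table = learn_continuations(melody)
continuation = continue_phrase(table, ["E", "C"], length=3)
```

Falling back from long contexts to shorter ones is what lets the sketch stay close to the learned material when a long context is known, while still producing something when only a short one is.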
The real-time improvisation system Omax (Assayag et al., 2006b,a; Lévy et al., 2012) uses the Factor Oracle (Allauzen et al., 1999; Lefebvre et al., 2002), a deterministic finite automaton, to achieve style modeling (Assayag and Dubnov, 2004). The musical stream is first segmented into discrete units (a new “slice” for each new onset). Then, each slice is labeled using a chosen audio feature. Finally, the resulting string of symbols is analyzed to find regularities in the musical material using the Factor Oracle. As we will see later on, this automaton is used in different ways within numerous research projects addressing music generation and improvisation, and is also involved in the guided generation model that we propose in this thesis.
Figure 2.2 shows the resulting representation of the musical inputs with two different audio features: pitch and Mel-frequency cepstral coefficients (MFCCs). The analysis it provides serves as the basis of the generative process (see Section 5.3): by navigating this structure thanks to the Suffix Link Tree (Assayag and Bloch, 2007), one is able to connect any location within the musical material of interest to any other location that has a common suffix, i.e. a common musical past (arches in Figure 2.2, corresponding to the suffix links provided by the Factor Oracle automaton). Reading this structure following non-linear paths generates a musical sequence that is both different from the original one, and coherent with its internal logic.
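The role of suffix links in this kind of navigation can be illustrated with a short Python sketch. The construction follows the standard online Factor Oracle algorithm of Allauzen et al. (1999); the navigation is a deliberately naive assumption (a random jump along a single suffix link with a fixed probability) and is not the Omax heuristic nor the Suffix Link Tree. Function names, the jump probability, and the seed are hypothetical.

```python
import random


def build_factor_oracle(sequence):
    """Online construction: return (transitions, suffix_link) for `sequence`."""
    transitions = [dict() for _ in range(len(sequence) + 1)]
    suffix_link = [-1] * (len(sequence) + 1)  # the initial state has no link
    for i, symbol in enumerate(sequence):
        new_state = i + 1
        transitions[i][symbol] = new_state
        k = suffix_link[i]
        # Add forward transitions from the states reached through suffix links.
        while k > -1 and symbol not in transitions[k]:
            transitions[k][symbol] = new_state
            k = suffix_link[k]
        suffix_link[new_state] = 0 if k == -1 else transitions[k][symbol]
    return transitions, suffix_link


def navigate(sequence, suffix_link, length, jump_probability=0.3, seed=0):
    """Read the sequence linearly, sometimes jumping back along a suffix link."""
    if not sequence:
        raise ValueError("cannot navigate an empty sequence")
    rng = random.Random(seed)
    state, output = 0, []
    while len(output) < length:
        at_end = state == len(sequence)
        if at_end or (state > 0 and rng.random() < jump_probability):
            state = suffix_link[state]  # location sharing a common past
        output.append(sequence[state])
        state += 1
    return output


labels = list("abbbaab")  # symbols standing for labeled slices
transitions, suffix_link = build_factor_oracle(labels)
linear = navigate(labels, suffix_link, length=9, jump_probability=0.0)
varied = navigate(labels, suffix_link, length=12)
```

With a jump probability of zero the navigation replays the original sequence and only jumps when it reaches the end; with a positive probability it recombines fragments that share a common context.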
Table of contents:
ABSTRACT
RÉSUMÉ
ACKNOWLEDGMENTS
1 INTRODUCTION
1.1 Scope of the Thesis
1.2 Background and Motivation
1.3 Outline of the Contributions Presented in the Thesis
1.4 Publications
2 GUIDING HUMAN-COMPUTER MUSIC IMPROVISATION
2.1 Using the Prior Knowledge of the Musical Context
2.2 Guiding: “Follow my steps” / “Follow that way”
2.3 Some Considerations about Software Architecture
2.4 Research Context
I “INTENTIONS”: COMPOSING MUSIC GENERATION PROCESSES AT THE SCENARIO LEVEL
3 SUMMARY AND CONTRIBUTIONS
3.1 Paradigm
3.2 Algorithms
3.3 Application and implementation
4 CONFORMITY, ANTICIPATION, AND HYBRIDIZATION
4.1 “Scenario” and “Memory”
4.2 Conformity and Anticipation Regarding the Scenario, Coherence with the Memory
4.3 “Hybridization”: the Example of Jazz Improvisation
5 “SCENARIO / MEMORY” GENERATION MODEL
5.1 The “Scenario / Memory” Algorithms
5.2 Continuity with the Future of the Scenario
5.3 Continuity with the Past of the Memory
5.4 Additional Information and Optimizations
6 SCENARII, SCENARIOS... AND “META-COMPOSITION”
6.1 From the Conformity to an Idiomatic Structure to Composed Improvisation Sessions
6.2 Secondary Generation Parameters and Filtering
II “ANTICIPATIONS”: GUIDED IMPROVISATION AS DYNAMIC CALLS TO AN OFFLINE GENERATION MODEL
7 SUMMARY AND CONTRIBUTIONS
7.1 Paradigm
7.2 Architectures
7.3 Application and implementation
8 INTRODUCTION
8.1 From Offline Guided Generation to Online Guided Improvisation
8.2 ImproteK: An Interactive System
9 COMBINING PLANNING AND REACTIVITY: THE IMPROVISATION HANDLER
9.1 Guided Music Improvisation and Reactivity
9.2 Improvisation Handler: Reactive Agent Embedding an Offline Model
10 PLANNING IMPROVISATION: THE DYNAMIC SCORE
10.1 An Interface Between the Environment and Dynamic Music Generation Processes
10.2 Scheduling the Reactions to the Environment
10.3 Writing a Dynamic Score and Improvisation Plans
10.4 From Scheduling to Logical Planning
III “PLAYING” WITH THE (SOUND OF THE) MUSICIANS
11 SUMMARY AND CONTRIBUTIONS
11.1 Beat, Synchronization, and Dynamic Time Mappings
11.2 Application and implementation
12 RENDERING, SYNCHRONIZATION, AND CONTROLS
13 AN ADAPTIVE PERFORMANCE-ORIENTED SEQUENCER
13.1 Live Audio Re-Injection for Guided Improvisation
13.2 Level 1: the Voice Process
13.3 Level 2: the Adaptive Synchronisation Loop Process
13.4 Tempo Estimation: Listening to Temporal Variables
13.5 Level 3: Control / Rendering Process
14 INTERFACE AND CONTROLS: TOWARD AN INSTRUMENT
14.1 Upstream and Downstream Controls
14.2 Network Architecture and Video Rendering
15 A COMPOSITION-ORIENTED RENDERER
15.1 Composition of Music Generation Processes
15.2 Scheduling Strategy
15.3 Interactions with the Improvisation Handler
IV “PRACTICING”: LET THE MUSIC(IANS) (PL/S)AY
16 SUMMARY AND CONTRIBUTIONS
17 BERNARD LUBAT: DESIGN OF THE FIRST PROTOTYPE
17.1 Study with a Jazzman: Bernard Lubat
17.2 Recombining and Phrasing
17.3 Downstream Controls
17.4 Reduction, Multiplication and Limits
17.5 “Hybridization”
17.6 Transversal Issues
17.7 Conclusion
18 COLLABORATIONS WITH EXPERT MUSICIANS
18.1 Rémi Fox
18.2 Hervé Sellin
18.3 Michelle Agnes Magalhaes
18.4 Jovino Santos Neto
18.5 Louis Mazetier
18.6 Velonjoro, Kilema, and Charles Kely
18.7 “Ateliers Inatendus”
V CONCLUSION
19 CONCLUSION
19.1 Summary and Contributions
19.2 Perspectives
A VIDEOS REFERENCED IN THE THESIS: LINKS AND DESCRIPTIONS
A.1 Performances and Work Sessions Using ImproteK
A.2 Extra Material: Demos, Early Works, and Experiments
A.3 Bernard Lubat: Design of the First Prototype
A.4 Some Listening Sessions and Interview with Musicians
A.5 Archives: Other Collaborations
B IMPLEMENTATION
B.1 A Library for Guided Generation of Musical Sequences
B.2 Reactive Improvisation Handler
B.3 Dynamic Performance-Oriented Sequencer
C INTERVIEWS WITH HERVÉ SELLIN
C.1 Transcriptions of Interviews and Listening Sessions
C.2 “Three Ladies” Project: Statement of Intent and Improvisation
BIBLIOGRAPHY