Polyphonic Music Composition with LSTM Neural Networks and Reinforcement Learning

Harish Kumar, Balaraman Ravindran,
Texas A&M University
Indian Institute of Technology, Madras
harishk1908@gmail.com, ravi@cse.iitm.ac.in
Abstract

In the domain of algorithmic music composition, machine learning-driven systems eliminate the need for carefully hand-crafting rules for composition. In particular, the capability of recurrent neural networks to learn complex temporal patterns lends itself well to the musical domain. Promising results have been observed across a number of recent attempts at music composition using deep RNNs. These approaches generally aim at first training neural networks to reproduce subsequences drawn from existing songs. Subsequently, they are used to compose music either at the audio sample-level or at the note-level. We designed a representation that divides polyphonic music into a small number of monophonic streams. This representation greatly reduces the complexity of the problem and eliminates an exponential number of probably poor compositions. On top of our LSTM neural network that learnt musical sequences in this representation, we built an RL agent that learnt to find combinations of songs whose joint dominance produced pleasant compositions. We present Amadeus, an algorithmic music composition system that composes music that consists of intricate melodies, basic chords, and even occasional contrapuntal sequences.

1 Introduction

Computer-driven music composition has been approached through various pathways ([?]): through knowledge-based systems, evolutionary algorithms, Markov processes, and in the last 25 years, Artificial Neural Networks (ANNs). The presence of a strong mathematical framework behind music has explicitly and implicitly aided researchers in these approaches. ANNs’ propensity to successfully learn complex patterns from raw data has made them especially suited to the problem of algorithmic music composition. Several research endeavours have been undertaken on the subject of training ANNs to compose original music that appeals to human sensibilities. These works have resulted in computer systems that can fashion musical sequences that are occasionally quite difficult to discern from human-composed music.

Despite these advances, the application of Artificial Intelligence in music composition is a nascent domain that has great potential for extensive scientific research. In this paper, our contributions to this evolving area are as follows. First, we have built a novel representation for polyphonic music that has a much lower sparsity than other existing representations. Second, attempts at using reinforcement learning to obtain preferred compositions from already trained neural networks have been very few; we take an approach that is quite different from these attempts and present results in which we observe significant improvements when using RL. Third, we present, to the best of our knowledge, the first instance of a method to intelligently explore the powerful plan space first introduced by [?].

2 Conventions and Definitions

We use the following set of conventions and definitions throughout this paper:

  • A note is equivalent to playing a pitch for a definite duration.

  • A note-set is a set of pitches that begin simultaneously, but last for independently defined durations.

  • Unless specified otherwise, the term subsequence means a contiguous subsequence.

  • Musical sequences are visually presented in the piano roll visualization. While this has its own limitations, it greatly aids in appreciating the general flow of the melody, harmony, rhythm, etc. The vertical axis represents note pitch while the horizontal axis represents the flow of time.

3 Related Work

[?] trains a neural network to first learn sequences drawn from existing music. Input configurations called plans are used to encode the identity of the sequence that the neural network is presently learning. During the composition phase, interpolated plans are used to generate new melodies. [?] use a Recurrent Neural Network along with a multidimensional pitch representation based on psychological studies from [?]. Both these approaches are examples of using an ANN to generate a melody sequentially, note-by-note.

There has also been research ([?]) into harmonizing existing melodies with [?] and [?] being some instances. [?] and [?] demonstrate a neural network harmonizing three out of the soprano, alto, tenor and bass parts of a Bach chorale when one of them is fixed.

Numerous attempts at sequential music composition with neural networks have used the Long Short-Term Memory units introduced in [?], starting with [?]. [?], in contrast, apply Gated Recurrent Units (GRU) for monophonic composition.

There are a few instances ([?] and [?]) of reinforcement learning being used over trained neural networks to impose additional conditions on the composed music. Another previously used non-reinforcement strategy for imposing external conditions on a separately trained neural network is to use sampling grammars as seen in [?].

4 LSTM Neural Networks

The temporal capacity of a recurrent neural network allows it to learn patterns that are spread over time, and this has permitted RNNs to be used in signal processing, natural language processing and music generation among other areas. However, simple recurrent neural networks suffer from the vanishing and exploding gradients problems demonstrated in [?]. These problems cause the gradients corresponding to earlier inputs to either vanish or blow up depending upon their value, and thus these networks have difficulty in learning long-range temporal dependencies. This is a particular deficiency for learning patterns in music as temporal dependencies in musical sequences may be several notes apart.

Long Short-Term Memory units ([?] and [?]) step around this problem by using a constant error carousel that traps the error within a cell. Two gating neurons regulate the flow of information into a cell and out of it respectively, while one neuron learns whether to forget the value inside the cell.

Specific details about the input, output and internal state relations followed in LSTM units can be found in [?] and [?].

5 Input Representation

5.1 Pitch and Duration Encodings

We first transpose the musical sequences to the same key, in accordance with positive results observed in [?], [?], [?], [?] and [?].

In contrast with the approaches in [?] and [?] where encodings based on musical information were used to represent pitches, chords and note durations, we chose to follow the empirically validated methodology adopted by later methods as seen in [?], [?], [?] and [?] where the network was able to learn low-level harmonic correlations between notes without the need for building them into the representation.

We represent a single pitch value by a one-hot encoded vector of length equal to the number of pitches playable on most pianos (88, from A0 to C8).

Varying approaches are taken for representing note durations. Some eschew the need for one by sampling the input tracks at a fixed rate. [?] sample by quavers (eighth notes), while [?] samples by demisemiquavers (thirty-second notes). Some earlier monophonic approaches such as [?] and [?] used compressed binary vectors to encode note duration. [?] use a 30-bit vector to explicitly encode note lengths from a semiquaver (sixteenth note) to a breve (two whole notes).

We chose to use explicit duration encoding instead of the sampling method, firstly since it simplifies the multi-stream representation that we use, and secondly since forcing the network to learn patterns in note duration distributions was expected to help later during the plan interpolation process. Further, any errors in the sequential counting task resulting from the sampling approach can collapse the rhythm of a polyphonic composition.

We encode the durations as a one-hot vector, with each bit corresponding to a note duration that is frequently observed in the musical corpus that we use.
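
The two encodings can be sketched as follows. This is a minimal illustration, not the exact implementation: it assumes pitches arrive as MIDI note numbers (A0 = 21, C8 = 108), and the `DURATIONS` list is a hypothetical stand-in for the frequent durations found in the corpus.

```python
# Assumption: pitches are MIDI note numbers, with A0 = 21 and C8 = 108.
LOWEST_MIDI, HIGHEST_MIDI = 21, 108
NUM_PITCHES = HIGHEST_MIDI - LOWEST_MIDI + 1  # 88 piano keys

# Hypothetical duration vocabulary, in quarter-note units.
DURATIONS = [0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 4.0]


def one_hot(index, length):
    """Return a list of `length` zeros with a single 1 at `index`."""
    if not 0 <= index < length:
        raise ValueError(f"index {index} outside [0, {length})")
    vector = [0] * length
    vector[index] = 1
    return vector


def encode_pitch(midi_number):
    """One-hot vector of length 88 for a piano pitch."""
    return one_hot(midi_number - LOWEST_MIDI, NUM_PITCHES)


def encode_duration(duration):
    """One-hot vector over the duration vocabulary."""
    return one_hot(DURATIONS.index(duration), len(DURATIONS))
```

A duration that is not in the vocabulary raises an error here; in practice it would first be quantized to the nearest permitted value.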

5.2 Existing Representations for Polyphonic Music

A widely used representation for polyphonic music is the piano roll representation. Here, each bit in a binary vector represents the On/Off state of a pitch. The symbolic music is sampled at a fixed rate, and a Sustain pitch indicates whether or not the previous note is continued.

After this, the learning algorithm may either treat each note as a binary classification problem ([?]), or sequentialize the notes ([?]) to produce a series of multi-class classification problems, one per sounding note: the number of problems equals the instantaneous polyphony and the number of classes equals the number of possible pitches.

[?] and [?] divide pop music into a melody track and a chord track, and use separate one-hot encodings for these tracks. However, this representation is limited to the class of songs that have a separate chord track that uses a standard set of multi-note chords for harmony.

[?] use a pitch representation that has some similarities to our proposed representation. The major differences lie in our use of an explicit duration representation, the resolution of resulting incompatibilities and the imposition of pitch ordering.

5.3 The Multi-Stream Note Representation for Polyphonic Music

In the multi-stream representation which is a novelty arising from this work, we represent polyphonic music as a small set of streams, all of which are monophonic. Each stream has a pitch and duration component tied to it. This representation can embody polyphonic music of all forms and is limited in this process only by the number of streams and the set of permissible durations d.

  • During transcription, each incoming note in a polyphonic track is allotted to the lowest possible stream.

  • Multiple notes in a note-set are sorted in descending order of pitch to encourage the localization of closely spaced pitches in the same stream over time.

  • We introduce a Rest pitch value (111 here) to represent time-steps when a stream is not sounding any note.

  • We also use a Sustain pitch value (110 here) to handle cases where at the start of a note-set, a particular stream is expected to continuously play a previously started note rather than strike it again. The occurrence of the Sustain pitch is deterministic and hence does not unnecessarily complicate the learning task.

  • Rest notes are filled into streams such that they do not have to be sustained in the next note-set.
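
The transcription rules above can be sketched roughly as follows. This is a simplified illustration under stated assumptions rather than the exact procedure: notes are given as hypothetical `(onset, pitch, duration)` tuples, a Sustain entry carries the remaining length of the held note, and a Rest in the final note-set lasts as long as the longest note that starts there.

```python
REST, SUSTAIN = 111, 110  # symbolic pitch values quoted in the text


def to_multi_stream(notes, num_streams):
    """Turn (onset, pitch, duration) notes into a list of note-sets.

    Each note-set holds one (pitch, duration) pair per stream.
    """
    onsets = sorted({onset for onset, _, _ in notes})
    busy_until = [0.0] * num_streams  # when each stream finishes its note
    note_sets = []
    for i, onset in enumerate(onsets):
        # Descending pitch, so the highest new note takes the lowest free stream.
        starting = sorted((n for n in notes if n[0] == onset), key=lambda n: -n[1])
        if i + 1 < len(onsets):
            # A rest ends at the next note-set, so it never needs sustaining.
            rest_length = onsets[i + 1] - onset
        else:
            rest_length = max(duration for _, _, duration in starting)
        note_set = []
        for stream in range(num_streams):
            if busy_until[stream] > onset:
                # The stream still holds an earlier note: mark it as sustained.
                note_set.append((SUSTAIN, busy_until[stream] - onset))
            elif starting:
                _, pitch, duration = starting.pop(0)
                busy_until[stream] = onset + duration
                note_set.append((pitch, duration))
            else:
                note_set.append((REST, rest_length))
        if starting:
            raise ValueError("more simultaneous notes than streams")
        note_sets.append(note_set)
    return note_sets
```

With two streams, a held bass note under a moving melody shows up as a Sustain entry in the second stream at the next note-set.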

Figure 1: The first 16 note-sets from Für Elise in the Multi-Stream Representation. Symbolic Pitches and Durations are depicted.

Fig 1 presents an example with the first few notes of Beethoven’s Für Elise in the multi-stream representation. This representation overcomes the following shortcomings of the piano roll representation and greatly reduces the sparsity of the input space:

  • The piano roll representation produces an extreme class imbalance for most pitches, since predicting each pitch is a binary classification problem, and since any given pitch is off during most samples. The lower streams in the multi-stream representation are spread very well across the various playable notes (classes), and any imbalance is by virtue of a particular note being played frequently.

  • The multi-stream representation produces a division of responsibilities among streams where the lowest stream generally learns to play the central melody, and higher streams learn the harmony or the ornamental notes.

  • Very few, if any, single-instrument compositions will have more than 15-20 notes being played simultaneously. [?] note that on a large collection of polyphonic music that included hundreds of musical pieces from different genres of music, the maximum polyphony was only , with the average being . The piano roll representation, however, allows for the unnecessary freedom of playing as many simultaneous notes as there are pitches, at the cost of increasing the sparsity of the space and the probability of incompatible pitches. A crude workaround for this is to impose an upper limit on the number of simultaneously sounded notes during the composition process.

In the ordinary piano roll representation, if , , and are composition length, sampling rate, and the number of pitches respectively, then the number of possible compositions is given by

In the multi-stream representation with streams and with as the available set of note durations, a composition of time can be made up of several different combinations of durations. Through numerical simulations, we observed that on reasonable choices of d such as the semiquaver to breve set used in [?], and our frequent durations set, the total number of possible compositions of length is greatly dominated by those that fully use the shortest durations for all notes. This is since the minimum duration is much smaller than the next available duration.

The approximate number of compositions is then

The logarithmic ratio, for the representative values of , and is . In other words, the sparsity of the composition space in the piano roll representation increases times faster than the multi-stream representation for every whole note. We noted experimentally that during the composition phase, the neural network classified notes with much higher confidence with the multi-stream representation than with the piano roll representation.
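
As a rough illustration of this kind of comparison, the sketch below counts compositions in log space under a simple model. The counting model, the function names and the example values are assumptions made for this illustration and are not necessarily the formulas or the representative values used above: a piano roll is taken to choose On/Off independently for every pitch at every sample, and a multi-stream composition is taken to use the shortest duration everywhere, choosing one pitch value per stream per note-set.

```python
import math


def log2_piano_roll(length_whole_notes, samples_per_whole_note, num_pitches):
    """log2 of the number of piano rolls: one On/Off bit per pitch per sample."""
    return length_whole_notes * samples_per_whole_note * num_pitches


def log2_multi_stream(length_whole_notes, shortest_duration, num_streams,
                      num_pitch_values):
    """log2 of the number of multi-stream compositions at the shortest duration.

    Assumption: every note-set lasts `shortest_duration` whole notes and each
    stream independently picks one of `num_pitch_values` symbols.
    """
    note_sets = length_whole_notes / shortest_duration
    return note_sets * num_streams * math.log2(num_pitch_values)


# Hypothetical example: 88 pitches, sixteenth-note resolution, 4 streams,
# and 90 pitch symbols per stream (88 pitches plus Rest and Sustain).
piano_roll_bits = log2_piano_roll(1, 16, 88)
multi_stream_bits = log2_multi_stream(1, 1 / 16, 4, 90)
ratio = piano_roll_bits / multi_stream_bits
```

Under these assumed values the piano roll space grows by 1408 bits per whole note, against roughly 415 bits for four streams, which illustrates how quickly the two spaces diverge.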

5.4 Plan and Plan Interpolation

The practice of using an additional set of variables called a plan to differentiate subsequences taken from different songs has previously been used in [?], and later in [?]. Both these approaches use a one-hot encoded binary vector to provide information on which song the given subsequence is from. During the composition phase, [?] uses interpolated plans to force creativity, while [?] turns on all the bits in the plan. [?] notes that compositions inherit attributes from the songs that have high weights in the plan, and offers the following paraphrased analysis of the plan inputs: The activations from the plan inputs act as biases on the hidden units, causing them to compute different functions of the context inputs. The context inputs can be thought of as points in a higher-dimensional space, and the hidden units act as planes to cut this space up into regions for different outputs.

This implies that switching a single bit in the plan input will cause significant alterations in the network’s behaviour, and this is indeed corroborated by our experiments.

We first re-validated the attribute inheritance through experiment when we observed that major parts of the composition’s pitch and note duration distributions are combinations of strongly (with very high/low relative frequency) observed attributes in the songs corresponding to the On bits.

As opposed to [?], we limited the search space from real-valued plan weights to binary plan bits, each either On or Off. We did so because this transforms the compositional attribute inheritance from a parametrized setting to that of a competitive, dominance-based process. Here, multiple songs from the plan will compete or collaborate to influence the composition, as per their own mismatching or matching attributes.
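
A binary plan of this kind can be sketched as below. The song names and the helper functions `make_plan` and `flip_bit` are hypothetical and serve only to illustrate the idea of switching songs On and Off.

```python
def make_plan(song_names, active_songs):
    """Build a binary plan: 1.0 for every song that is switched On."""
    active = set(active_songs)
    unknown = active - set(song_names)
    if unknown:
        raise ValueError(f"songs not in the training set: {sorted(unknown)}")
    return [1.0 if name in active else 0.0 for name in song_names]


def flip_bit(plan, index):
    """Return a copy of the plan with one song toggled On or Off."""
    flipped = list(plan)
    flipped[index] = 1.0 - flipped[index]
    return flipped


# Hypothetical training set of three songs, two of them switched On.
songs = ["fur_elise", "bwv_850", "espana_prelude"]
plan = make_plan(songs, ["fur_elise", "bwv_850"])
```

With n songs there are 2^n such plans, so an exhaustive search quickly becomes impractical as the training set grows.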

The qualities and advantages of the plan space have been largely unexplored until now, especially its properties when enlarged by using a large number of songs. We present the first instance of a method that can intelligently explore interesting points in this space without undertaking an exhaustive search. We also show later with a few examples that the attribute inheritance goes beyond simple properties like pitch and duration distribution.

6 LSTM Neural Network Training and Results

6.1 Training Data, Inputs and Targets

We used MIDI versions (from http://www.piano-midi.de) of a selected set of solo piano compositions from Beethoven, Mozart, Liszt, Bach and Albéniz. Since these songs all have varying tempos, we quantized the tempo values to ensure that a small, common set of note durations would represent note lengths in all these tracks. We applied further temporal quantization to adjust misaligned notes caused by human performance.
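
Temporal quantization of this kind can be sketched as snapping onsets and durations to a fixed grid. The grid size and the `(onset, pitch, duration)` tuple format are assumptions made for this illustration; the exact quantization used may differ.

```python
def quantize(value, grid):
    """Snap a time value to the nearest multiple of `grid`."""
    return round(value / grid) * grid


def quantize_notes(notes, grid):
    """Quantize (onset, pitch, duration) notes to a common time grid.

    Durations are kept at least one grid step long so that no note vanishes.
    """
    return [
        (quantize(onset, grid), pitch, max(grid, quantize(duration, grid)))
        for onset, pitch, duration in notes
    ]
```

For example, with a grid of 0.25 a note that starts at 0.02 and lasts 0.47 is moved to onset 0.0 with duration 0.5.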

The input during each prediction task is a sequence of consecutive note-sets in the multi-stream representation; the length of this sequence is the context length for predicting the next note-set. Given these, the neural network is expected to predict the pitch and duration values for all streams in the next note-set.
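
Building such input and target pairs amounts to sliding a fixed-length window over each song, as in this minimal sketch (the function name and the list-based data format are assumptions):

```python
def make_training_pairs(note_sets, context_length):
    """Slide a window over a song: each context predicts the next note-set."""
    pairs = []
    for start in range(len(note_sets) - context_length):
        context = note_sets[start:start + context_length]
        target = note_sets[start + context_length]
        pairs.append((context, target))
    return pairs
```

A song with n note-sets yields n minus the context length training pairs, and songs shorter than the context length yield none.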

6.2 Neural Network Structure and Training

We observed that using shared LSTM layers for both the pitch and duration inputs resulted in faster training and better generalization than using separate layers for these inputs. Following the LSTM layers, each with the same number of LSTM units, we added fully connected layers, all using the softmax activation function, to predict the pitch and duration values for each stream. The loss function was the sum of the cross-entropies of all these classification problems, i.e. one pitch term and one duration term per stream.
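
The summed loss can be sketched in plain Python as follows. This is an illustration of the idea only: the function names are assumptions, targets are given as class indices rather than one-hot vectors, and a small floor is applied to probabilities to avoid taking the logarithm of zero.

```python
import math


def cross_entropy(predicted, target_index):
    """Cross-entropy of a predicted distribution against a one-hot target."""
    return -math.log(max(predicted[target_index], 1e-12))


def note_set_loss(pitch_probs, duration_probs, pitch_targets, duration_targets):
    """Sum of pitch and duration cross-entropies over all streams.

    `pitch_probs[s]` and `duration_probs[s]` are the softmax outputs for
    stream s; the targets are the indices of the correct classes.
    """
    total = 0.0
    for stream in range(len(pitch_probs)):
        total += cross_entropy(pitch_probs[stream], pitch_targets[stream])
        total += cross_entropy(duration_probs[stream], duration_targets[stream])
    return total
```

Because the terms are simply added, every stream's pitch and duration heads contribute equally to the gradient that flows into the shared LSTM layers.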

We used the Adam Optimizer introduced in [?] for gradient descent, and a batch size of 64. We used the Keras software library to implement the LSTM neural network. We perform a grid search over , , and while minimizing the loss function defined above. We obtained , and as the optimal values.

6.3 Network Learning over Epochs

Fig 2 shows the clear progress in the network’s learning over training epochs.

At 4 epochs, the network has simply identified the pre-eminence of treble and bass parts in most of the training songs and emulates this. The composition is arrhythmic, with note starts and endings mismatched throughout the sample.

At 20 epochs, we observe significant improvements in the rhythm, and there are even symmetric movements between the treble and the bass components. There is also a rudimentary understanding of melodic intervals and tonality. However, there is little melodic or harmonic variation in the composition at this point.

At 50 epochs, the melodic and harmonic complexity become clearly evident in the compositions. There are occasional contrapuntal sections (not seen here) where two separate melodies intertwine harmonically, and the rhythm is well-maintained.

Figure 2: Compositions from the network at 4 epochs (left), 20 epochs (center) and 50 epochs (right). Y-axes are not to equal scales.

7 Composition

We follow the sequential composition process used by [?], [?], [?] and [?] among others. We compose one note-set in every step, then append this note-set to the context for predicting the next note-sets. We seed the network with an initial sequence as is the practice in previous work.

For a given contextual sequence of past note-sets, the neural network outputs probability distribution vectors for the pitch and duration values for every stream in the next note-set: and .

From these distributions, we sample one value each, thus defining the entire note-set.
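
The sequential composition loop can be sketched as below. Here `predict` is a hypothetical stand-in for the trained network: it is assumed to take the current context and return one pitch distribution and one duration distribution per stream. Note-sets are represented as lists of `(pitch_index, duration_index)` pairs.

```python
import random


def sample_index(probabilities, rng):
    """Draw one class index according to the given probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


def compose(predict, seed, num_steps, context_length, rng=None):
    """Extend `seed` by `num_steps` note-sets, one sampled note-set per step."""
    rng = rng or random.Random()
    composition = list(seed)
    for _ in range(num_steps):
        context = composition[-context_length:]
        pitch_dists, duration_dists = predict(context)
        # One (pitch, duration) pair per stream defines the whole note-set.
        note_set = [
            (sample_index(p, rng), sample_index(d, rng))
            for p, d in zip(pitch_dists, duration_dists)
        ]
        composition.append(note_set)
    return composition
```

Each sampled note-set is appended to the composition and becomes part of the context for the next prediction.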

To exert control over the incidence of low-probability pitches and durations in the compositions, we modify the neural network’s softmax output with the Boltzmann Distribution before sampling from it. A parameter called the Temperature, $T$, is used to this end: each output probability $p_i$ is replaced by

$$p_i^{(T)} = \frac{p_i^{1/T}}{\sum_j p_j^{1/T}}$$

The probability distribution varies between a deterministic point for $T \to 0$, the original distribution itself for $T = 1$ and the discrete uniform distribution for $T \to \infty$.

At low temperatures, the network often makes predictions that are "safe" and have low variation. With an increase in temperature, the predictions become varied, at the risk of selecting notes that are not apt for the composition.

We use this temperature during the reinforcement learning phase as it allows us to tune melodic, harmonic and rhythmic variation with our need to reduce poorly placed notes.
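
Temperature scaling of a softmax output can be sketched as follows. This is a minimal illustration of the standard Boltzmann reweighting, in which each probability is raised to the power 1/T and the result is renormalized; the function name is an assumption.

```python
import math


def apply_temperature(probabilities, temperature):
    """Reweight a probability distribution with a Boltzmann temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not any(p > 0 for p in probabilities):
        raise ValueError("at least one probability must be positive")
    # Work in log space for stability; zero-probability classes stay at zero.
    logits = [
        math.log(p) / temperature if p > 0 else -math.inf
        for p in probabilities
    ]
    peak = max(logits)
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    return [w / total for w in weights]
```

A temperature of 1 leaves the distribution unchanged, values close to 0 concentrate almost all mass on the most likely class, and large values flatten the distribution towards uniform.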

8 Applying Reinforcement Learning

8.1 Applicability of Reinforcement Learning

A neural network trained to reproduce existing musical pieces, when given completely unseen inputs, will just output compositions that have high probabilities according to the distribution it previously learnt. While such networks do tend to be guided by the conditional probabilities observed in existing music, their original compositions may not possess the positive attributes that we expect from them. The compositions may contain unwanted quirks that simply maximize the joint probability according to the network, for instance, a small cycle of somewhat pleasing notes that repeats indefinitely. [?] and [?] note that the music generated through a multi-step process from RNNs lacks global structure and is overly repetitive.

These observations establish the strong need for having feedback and conditioning on the neural network’s compositions. [?] observe that "the reinforcement strategy allows to combine arbitrary user given control with a style learnt by the recurrent network".

8.2 Rewards and Penalties

We define the following rewards and penalties (denoted (+) and (-) respectively), based upon rules used previously in [?], [?] and [?] and tailored to our specific requirements:

  1. (+) Occurrence of dyads, triads and seventh chords.

  2. (+) High pitch entropy.

  3. (-) Overuse of very short or long note durations.

  4. (-) Multiple identical note-sets in sequence.

  5. (-) Rests form too large a part of the played note-sets.

  6. (-) Large peaks in the normalized cross-correlations between pitch values in subsequences from the composition and songs in the training set. Cross-correlation is performed with the pitch values represented in one-hot encoding, and only note-sets with overlapping pitches contribute to it. This penalty was added to suppress compositions that plagiarize parts of the training data.

The numerical parameters and thresholds for these rewards and penalties were chosen by maximizing the aggregate reward on a set of compositions that were manually determined to be pleasant.
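
Three of the attributes listed above can be measured roughly as follows. These are illustrative sketches only: the exact definitions, weights and thresholds used for the rewards are not specified here, and the function names and the `(pitch, duration)` note-set format are assumptions.

```python
import math
from collections import Counter

REST, SUSTAIN = 111, 110  # symbolic pitch values quoted in the text


def pitch_entropy(note_sets):
    """Shannon entropy, in bits, of the pitches that are actually struck."""
    pitches = [p for ns in note_sets for p, _ in ns if p not in (REST, SUSTAIN)]
    if not pitches:
        return 0.0
    total = len(pitches)
    counts = Counter(pitches)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def repeated_fraction(note_sets):
    """Fraction of note-sets that are identical to their predecessor."""
    if len(note_sets) < 2:
        return 0.0
    repeats = sum(1 for a, b in zip(note_sets, note_sets[1:]) if a == b)
    return repeats / (len(note_sets) - 1)


def rest_fraction(note_sets):
    """Fraction of all stream slots that hold a Rest."""
    slots = [p for ns in note_sets for p, _ in ns]
    return sum(1 for p in slots if p == REST) / len(slots) if slots else 0.0
```

An aggregate reward could then add the entropy term and subtract the repetition and rest terms once they exceed chosen thresholds.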

8.3 Problem Modelling

In a novel exercise, instead of directly altering the weights of the previously trained polyphonic reproduction network or cascading note-level sampling networks that learn through RL, we take a different approach. We use the powerful, varied, high-level control offered by the plan input and the temperature parameters to surface combinations that result in compositions that conform to our expectations.

In this case, the problem of searching through the space of plans and temperatures can be restated as a Markov Decision Process ([?]):

  • Plan and temperature inputs together form a state.

  • At every state, the allowed actions A involve either switching a particular bit in the plan, or altering the temperatures. As a simplification, the state transitions upon taking actions are all deterministic.

  • The reward is obtained by composing a song with the selected plan and temperatures and evaluating it.

We use the mathematical framework and the iterative algorithm described in [?] and [?] to learn the Q-Values. We use a three-layer neural network as a function approximator ([?]) to store and predict the Q-Values. This methodology has sometimes been observed to improve learning by using previous experience towards predicting the utility of unseen states.
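
The search over plans and temperatures can be sketched with Q-learning as below. To stay self-contained, this sketch swaps the neural network function approximator for a plain lookup table, which is only practical for a handful of songs. The temperature grid, the hyperparameter defaults and `reward_fn` (a stand-in for composing a song from a state and scoring it) are all hypothetical.

```python
import random
from collections import defaultdict

TEMPERATURES = [0.5, 1.0, 1.5]  # hypothetical temperature grid


def actions(num_songs):
    """Flip any one plan bit, or move the temperature one step down or up."""
    return [("flip", i) for i in range(num_songs)] + [("temp", -1), ("temp", 1)]


def step(state, action):
    """Deterministic transition on a (plan, temperature index) state."""
    plan, temp_index = state
    kind, value = action
    if kind == "flip":
        plan = plan[:value] + (1 - plan[value],) + plan[value + 1:]
    else:
        temp_index = min(max(temp_index + value, 0), len(TEMPERATURES) - 1)
    return (plan, temp_index)


def q_learning(reward_fn, num_songs, episodes=200, horizon=10,
               alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = defaultdict(float)
    acts = actions(num_songs)
    for _ in range(episodes):
        state = (tuple(rng.randint(0, 1) for _ in range(num_songs)),
                 rng.randrange(len(TEMPERATURES)))
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.choice(acts)
            else:
                action = max(acts, key=lambda a: q[(state, a)])
            next_state = step(state, action)
            reward = reward_fn(next_state)  # compose and evaluate a song here
            best_next = max(q[(next_state, a)] for a in acts)
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = next_state
    return q
```

Replacing the table with a function approximator lets the agent generalize from visited states to unseen plans, which matters once the number of songs makes the state space too large to enumerate.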

8.4 Implementation

We implemented the Q-Learning algorithm in Python and used the Keras software library to implement the function approximator neural network. We used Mean Squared Error (MSE) and Stochastic Gradient Descent (SGD) to train the neural network. We also performed a grid search to find the optimal hyperparameters among discount factor , learning rate and number of hidden neurons . We obtained , and as the optimal values.

8.5 Comparison of Average Compositional Quality with and without RL

The plot in Fig 3 illustrates the progress and utility of the learning process. It shows the count of good compositions in a moving window 200 iterations wide. (A good composition is taken as one that fetches a reward greater than out of a maximum of .) From the strong upward movement of the average count line, we see clearly the advantage offered by reinforcement learning over a random plan selection approach.
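
The windowed count plotted there can be computed as in this short sketch, where `rewards` is the sequence of rewards obtained over the RL iterations and `threshold` is whatever reward level defines a good composition (both names are assumptions):

```python
def moving_good_count(rewards, threshold, window=200):
    """Count rewards above `threshold` in every full window of `window` steps."""
    counts = []
    running = 0
    for i, reward in enumerate(rewards):
        running += reward > threshold
        if i >= window:
            # Drop the iteration that just slid out of the window.
            running -= rewards[i - window] > threshold
        if i >= window - 1:
            counts.append(running)
    return counts
```

A rising sequence of counts indicates that the agent is visiting rewarding plan and temperature combinations more often than it did earlier in training.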

Figure 3: Moving Windowed Count of Good Compositions against RL Iterations

Attribute (average over 200 compositions)    With RL    Without RL
Dyads, Triads and Seventh Chords             0.24       0.12
Pitch Entropy                                0.82       0.71
Very short/long duration incidence           0.05       0.08
Repeated Identical Note-sets                 0.06       0.20
Aggregated Rest Duration                     0.08       0.14
Rest Count                                   0.07       0.11
Cross-Correlation peak                       0.13       0.2

Table 1: Improvements in compositional attributes when using Reinforcement Learning

Detailed per-attribute improvements are shown in Table 1. The averages in the table are on absolute values for the entropy and the cross-correlation peak, and relative values for the other attributes.

9 Qualitative Analysis of the Compositions

Two interesting subsequences from Amadeus’ compositions are shown in Fig 4. The sequence on the top was composed with the plan bits corresponding to Für Elise and Bach’s BWV 850 set to 1.0. It shows a clear melodic sequence with an accompanying harmony, quite reminiscent of Für Elise. The pitch distribution, however, is strongly influenced by BWV 850, with most notes drawn from the C Major scale. It is also interesting to note that both the melody and the harmony follow and maintain distinct note lengths that together maintain the rhythm.

The lower piece was composed with the plan bits corresponding to the Movement of Appassionata and the Prelude from Albéniz’s España set to 1.0. The composition inherits the contrapuntal qualities of both these songs. There are sections in the composition where the melody frequently uses intervals from España. Once again, there is clear differentiation between the contrapuntal melodies, and the rhythm is maintained almost perfectly.

Figure 4: Two example compositions dominated by Für Elise (top) and España Op. 165 Prelude (bottom)

However, higher-level musical qualities such as organized repetition with variations, surprise, tension/resolution, and well-designed climactic sequences are still out of reach at present. Identifying techniques to embody these will be the focus of our future research.

10 Conclusion, Audio Tracks and Additional Material

We presented a novel multi-stream representation that takes advantage of polyphonic music’s structure to simplify the learning problem that we are solving here. We used this representation to train an LSTM Neural Network, then in a unique approach, applied Reinforcement Learning to select high-level network configurations (rather than directly modify the network weights or its outputs) that maximize a set of attributes that we expect from pleasant algorithmic music. We also demonstrated the significant utility of the previously introduced plan space towards producing controlled, but diverse compositions.

Our system demonstrates its ability to emulate some of the notable features of music composed by human composers such as harmony, melodic complexity, tonality, counterpoint, rhythm, and the proper use of treble and bass components. While some of these properties have been displayed by previous research work in this field, the compositions produced by Amadeus present all of these as a coherent whole, thus producing music that is one more step closer to human-level composition.

In the future, we hope to first add more localized rewards and introduce an attack velocity for the played notes. We also hope to undertake a comparative rating test of our compositions with a human audience (e.g. [?], [?] and [?]).

Audio tracks of compositions, and additional material such as structural diagrams, datasets, etc. can be found at https://goo.gl/ogVMSq

References

  • [Bengio et al., 1994] Yoshua Bengio, Patrice Simard, and Paolo Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE transactions on neural networks, 5(2):157–166, 1994.
  • [Boulanger-Lewandowski et al., 2012] Nicolas Boulanger-Lewandowski, Yoshua Bengio, and Pascal Vincent. Modeling temporal dependencies in high-dimensional sequences: application to polyphonic music generation and transcription. In Proceedings of the 29th International Conference on Machine Learning, pages 1881–1888. Omnipress, 2012.
  • [Briot and Pachet, 2017] Jean-Pierre Briot and Francois Pachet. Music generation by deep learning-challenges and directions. 2017.
  • [Chu et al., 2016] Hang Chu, Raquel Urtasun, and Sanja Fidler. Song from pi: A musically plausible network for pop music generation. arXiv preprint arXiv:1611.03477, 2016.
  • [Colombo et al., 2016] Florian Colombo, Samuel Pavio Muscinelli, Alex Seeholzer, Johanni Brea, and Wulfram Gerstner. Algorithmic composition of melodies with deep recurrent neural networks. In Proceedings of the 1st Conference on Computer Simulation of Musical Creativity, number EPFL-CONF-221014, 2016.
  • [Crites and Barto, 1996] Robert H Crites and Andrew G Barto. Improving elevator performance using reinforcement learning. In Advances in neural information processing systems, pages 1017–1023, 1996.
  • [Eck and Schmidhuber, 2002] Douglas Eck and Juergen Schmidhuber. A first look at music composition using lstm recurrent neural networks. Istituto Dalle Molle Di Studi Sull Intelligenza Artificiale, 103, 2002.
  • [Fernández and Vico, 2013] Jose D Fernández and Francisco Vico. Ai methods in algorithmic composition: A comprehensive survey. Journal of Artificial Intelligence Research, 48:513–582, 2013.
  • [Franklin, 2001] Judy A Franklin. Multi-phase learning for jazz improvisation and interaction. In Proceedings of the Eighth Biennial Symposium for Arts & Technology, 2001.
  • [Franklin, 2004] Judy A Franklin. Recurrent neural networks and pitch representations for music tasks. In FLAIRS Conference, pages 33–37, 2004.
  • [Gers et al., 1999] Felix A. Gers, Jürgen Schmidhuber, and Fred Cummins. Learning to forget: Continual prediction with lstm. Neural Computation, 12:2451–2471, 1999.
  • [Hadjeres et al., 2017] Gaëtan Hadjeres, François Pachet, and Frank Nielsen. Deepbach: a steerable model for bach chorales generation. In International Conference on Machine Learning, pages 1362–1371, 2017.
  • [Hochreiter and Schmidhuber, 1997] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
  • [Jaques et al., 2017] Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, Jose Miguel Hernandez Lobato, Richard E Turner, and Doug Eck. Tuning recurrent neural networks with reinforcement learning. 2017.
  • [Kingma and Ba, 2015] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. International Conference for Learning Representations, 2015.
  • [Liang, 2016] Feynman Liang. Bachbot: Automatic composition in the style of bach chorales. Master’s thesis, University of Cambridge, 2016.
  • [Melo, 1998] AF Melo. A connectionist model of tension in chord progressions. Master’s thesis, University of Edinburgh, 1998.
  • [Mozer and Soukup, 1991] Michael C Mozer and Todd Soukup. Connectionist music composition based on melodic and stylistic constraints. In Advances in Neural Information Processing Systems, pages 789–796, 1991.
  • [Puterman, 2014] Martin L Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014.
  • [Shepard, 1982] Roger N Shepard. Geometrical approximations to the structure of musical pitch. Psychological review, 89(4):305, 1982.
  • [Shibata, 1991] Naoki Shibata. Neural network-based method for chord/note scale association with melodies. 32:453–459, 07 1991.
  • [Sun et al., 2016] Zheng Sun, Jiaqi Liu, Zewang Zhang, Jingwen Chen, Zhao Huo, Ching Hua Lee, and Xiao Zhang. Composing music with grammar argumented neural networks and note-level encoding. arXiv preprint arXiv:1611.05416, 2016.
  • [Todd, 1989] Peter M Todd. A connectionist approach to algorithmic composition. Computer Music Journal, 13(4):27–43, 1989.
  • [Watkins and Dayan, 1992] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine learning, 8(3-4):279–292, 1992.
  • [Watkins, 1989] Christopher John Cornish Hellaby Watkins. Learning from delayed rewards. PhD thesis, King’s College, Cambridge, 1989.
  • [Yang et al., 2017] Li-Chia Yang, Szu-Yu Chou, and Yi-Hsuan Yang. Midinet: A convolutional generative adversarial network for symbolic-domain music generation. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR’2017), Suzhou, China, 2017.