Thursday, January 19, 2012

Pitch and transposition

I got stuck for a long time trying to implement a feature to support diatonic transposition (very relevant for e.g. trills) and enharmonics (not too relevant in 12TET, but just intonation scales can make use of them).  After getting halfway through an implementation that didn't work out, I decided I needed to step back a bit and analyze the problem, and only start implementation again once I knew what I was doing.  Unfortunately it wasn't all that successful, but I did wind up writing a lot:

A scale is just a mapping from a symbol to a certain frequency, but it gets complicated once transposition is involved.

Integral transposition is not very hard.  (4c) transposed chromatically becomes (4c#).  Once you introduce enharmonics it becomes more complicated because it could also be (4db), so you need to know the key.  In any scale other than 12TET these are likely different frequencies, so the distinction is important.

Once you have a key, of course, diatonic transposition comes to mind.  It's important to support that because many ornaments (e.g. the trill) are defined diatonically.  But if you have what you need to properly support chromatic transposition, then you also have what you need for diatonic transposition.

This isn't so bad, but it means that transposition must be applied to symbolic pitches.  If a note is mapped to a frequency then it's too late to transpose it, so the obvious tactic of reducing a pitch to a frequency signal and then adding a transposition signal to it is not going to work.  It doesn't even work if the frequency signal is instead measured in scale steps, because the information about the key has been lost.

Things become more complicated when I add the requirement that pitches are not integral.  For instance, a perfectly square trill will alternate between [+0, +1] diatonic steps, but many kinds of trills are rounded, and will thus be adding a fractional number of diatonic steps.  This means that the added interval should be scaled appropriately.  Adding .5 to (4a) should add a half-step, while adding .5 to (4b) should add a quarter-step.  Adding 1.5 should hence figure out the integral transposition, and then scale the remaining fraction appropriately.

But then of course the underlying pitch itself may not be integral.  What does it mean to add a diatonic step to a note halfway between (4a) and (4b)?  Well, the interval it adds must be scaled between the whole step of (4a) and the half step of (4b), so it would be 0.75 of a step.  The example I like to use to visualize this is a trill on a note that is sliding from (4a) to (4b) in C major.  It should gradually change from a whole step alternation to a half step alternation.

Of course, since trills are not necessarily integral, now I have to combine both a non-integral pitch and a non-integral transposition.  It's a bit head-hurting at this point, but if adding 1 to a pitch halfway between (4a) and (4b) is 0.75 of a step, then adding half of that should be 0.375 of a step.  A half-diatonic trill from (4a) to (4b) should then change from a half step alternation to a quarter step.
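
To make the scaling rule concrete, here is a minimal sketch in Haskell.  Everything here is made up for illustration: degrees are fractional positions in C major (c=0 through b=6), reduced to NoteNumbers measured in semitones.

cmajor :: [Double]
cmajor = [0, 2, 4, 5, 7, 9, 11, 12, 14]  -- c d e f g a b, plus the next c and d

lerp :: Double -> Double -> Double -> Double
lerp f x y = x + f * (y - x)

split :: Double -> (Int, Double)
split d = (i, d - fromIntegral i) where i = floor d

-- NoteNumber of a possibly fractional degree.
nnAt :: Double -> Double
nnAt d = lerp frac (cmajor !! i) (cmajor !! (i+1))
    where (i, frac) = split d

-- Whole-step size at a fractional degree, interpolated between the steps
-- above its two integral neighbors.
stepAt :: Double -> Double
stepAt d = lerp frac (step i) (step (i+1))
    where
    (i, frac) = split d
    step n = cmajor !! (n+1) - cmajor !! n

-- Transpose a degree by a fractional number of diatonic steps: apply the
-- integral part, then scale the remaining fraction by the local step size.
transposeD :: Double -> Double -> Double
transposeD steps d = nnAt d' + frac * stepAt d'
    where
    (n, frac) = split steps
    d' = d + fromIntegral n

With this, transposeD 0.5 5 gives 10 (a half step above a at 9), transposeD 0.5 6 gives 11.5 (a quarter step above b at 11), and transposeD 0.5 5.5 gives 10.75, the 0.375 of a step above the halfway pitch at 10.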

Naturally, when I say "adding 0.375" it's not as simple as just adding two numbers.  The actual frequency change it represents depends, once again, on the note and scale.  The only scales where it doesn't are, by definition, the equal tempered ones.  This, I think, is the same problem as diatonic transposition, namely converting a scale of irregularly spaced intervals to a scale of regular ones.  So I think the same process can be applied twice: in the case of diatonic transposition, transposition is applied to notes of a certain mode and reduced to chromatic transposition in a certain scale.  Then the process is repeated to reduce chromatic transposition to NoteNumbers.

The question then is how to represent pitches so this can happen, so that each scale can have its own rules for how transposition works.  A simple scale is really just a mapping from symbols to frequencies, with no separate concept of diatonic and chromatic and no enharmonics.  A complex scale has all of those, and can even be retuned over time so the pitches and intervals depend on the position in the score.  In addition, pitch is a continuously variable signal, not a scalar value.  Each pitch call produces a chunk of signal which may be a single sample or a section of a curve, and they have to be pieced together into a complete curve.  There is also a distinction between absolute and relative pitches.  A relative pitch signal is independent of scale, and can have zero and negative values.


So.

    Approach #1

The simplest option is a sequence of reals representing scale degrees, for both absolute and relative pitches.  This breaks down pretty quickly though.  A note halfway between A (0) and B (2) would sound at A# (1).  But if you transpose that by one diatonic step you need to know the 1 is halfway between A and B, and should therefore become halfway between B and C, so 2.5.  But if the 1 was just an A#, the result should be B#, sounding at 3.  The same problem exists for non-equal-tempered scales at the chromatic level.

This implies that I need to keep the "from" and "to" pitches around, so pitch can be a sequence of triples: (from, to, distance).  A note halfway between A and B is (A, B, .5).  Now if I want to transpose I can apply the transposition to the from and to notes, and end up with (B, C, .5).  Then reduce to a frequency by interpolating between the frequencies of B and C.  But this only works when the transposition is integral.  How would I transpose by half a step?  The transposition would be halfway between the transposition for A->B and B->C, but there's no way to represent that in a triple if 'from' and 'to' are scale degrees.  In the same way that collapsing (A, B, .5) into A# loses information important for transposition, collapsing ((A, B, .5), (B, C, .5), .5) into (A+.5, B+.5, .5) also loses information.

The key thing is that only scale degrees can be transposed, so I can't have something like A+.5.  So I should be able to apply the (from, to, distance) pattern recursively:

scale_to_frequency = [(A, 0), (B, 2), (C, 3), (D, 5)]

(A, B, .5) + 1d --> 1 + 1.5 = 2.5
    ((A, B, 1), (B, C, 1), .5) -> (2, 3, .5) -> 2.5
    ((A, B, .5), (B, C, .5), 1) -> (1, 2.5, 1) -> 2.5

(A, B, .5) + .75d --> 1.5 * .75 = 1.125, so 1 + 1.125 = 2.125
    ((A, B, .5), (B, C, .5), .75) -> (1, 2.5, .75) -> 2.125
    ((A, B, .75), (B, C, .75), .5) -> (1.5, 2.75, .5) -> 2.125

(A, C, .5) + .75d --> 1.5 + .75d --> 1.5 + .75 * 2 = 3
    ((A, B, .75), (C, D, .75), .5) -> ((0, 2, .75), (3, 5, .75), .5) -> 3
    ((A, C, .5), (B, D, .5), .75) -> ((0, 3, .5), (2, 5, .5), .75) -> 3

As reflected above, there are two ways to apply a transposition.  Either apply it to each of the branches, splitting each into a (from, to, distance) transposition, or apply it at the top, leaving the left branch the same and transposing the right branch, and expressing 'distance' as the distance between the two expressions.  Experimentally they seem to give the same answers, though I don't understand the math well enough yet to say exactly why.  A distributive law of sorts, I expect: reduction is linear interpolation, which is affine in each endpoint, so the two nested interpolations can be swapped.
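
Here is a runnable sketch of this recursion, using the toy scale above.  The names are illustrative; 'transpose' is the apply-at-the-top variant, 'transposeEach' the apply-to-each-branch variant.

import Data.List (elemIndex)
import Data.Maybe (fromJust)

data Note = A | B | C | D deriving (Eq, Show)
data Pitch = P Note | F Pitch Pitch Double deriving (Show)

scaleToFrequency :: [(Note, Double)]
scaleToFrequency = [(A, 0), (B, 2), (C, 3), (D, 5)]

-- Integral diatonic transposition of a plain degree.  No octave wrapping
-- in this toy version, so it's partial at the ends of the scale.
transposeNote :: Int -> Note -> Note
transposeNote n note = notes !! (fromJust (elemIndex note notes) + n)
    where notes = [A, B, C, D]

transposeI :: Int -> Pitch -> Pitch
transposeI n (P note) = P (transposeNote n note)
transposeI n (F p1 p2 dist) = F (transposeI n p1) (transposeI n p2) dist

-- Apply at the top: the result is 'frac' of the way between the two
-- integral transpositions.
transpose :: Double -> Pitch -> Pitch
transpose d p
    | frac == 0 = transposeI n p
    | otherwise = F (transposeI n p) (transposeI (n+1) p) frac
    where
    n = floor d
    frac = d - fromIntegral n

-- Apply to each branch instead; it agrees with 'transpose' because
-- 'reduce' is linear interpolation, so the nested lerps can be swapped.
transposeEach :: Double -> Pitch -> Pitch
transposeEach d (F p1 p2 dist) = F (transposeEach d p1) (transposeEach d p2) dist
transposeEach d p = transpose d p

reduce :: Pitch -> Double
reduce (P note) = fromJust (lookup note scaleToFrequency)
reduce (F p1 p2 dist) = x + dist * (y - x)
    where (x, y) = (reduce p1, reduce p2)

Both 'reduce (transpose 0.75 (F (P A) (P B) 0.5))' and 'reduce (transposeEach 0.75 (F (P A) (P B) 0.5))' give 2.125, matching the hand computation above.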

Of course, this is given a simple scale where each degree has a direct mapping to a frequency.  In practice a degree would have to be an abstract object that can be transposed chromatically and diatonically:

class Scale d where
    chromatic :: Int -> d -> d
    diatonic :: Int -> d -> d
    reduce :: d -> Frequency

Hence:

data Degree d = I d | F (Degree d) (Degree d) Double
data Signal = forall d. Scale d => Signal [(RealTime, Degree d)]

A more complicated scale, say just intonation tuned to a particular base frequency, would have degrees as follows:

data JustDegree = JustDegree { base :: Frequency, degree :: LetterPitch }
data LetterPitch = A | B | C | D | E | F | G -- no accidentals

-- Or if accidentals are desired, with resolution to an arbitrary number
-- of flats or sharps:
data LetterPitch = Letter AtoG Accidental
data AtoG = A | B | C | D | E | F | G
data Accidental = Neutral | Flat Int | Sharp Int
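
For example, 'reduce' for the no-accidental version might simply scale the base frequency by a per-letter ratio.  The table below is an illustrative 5-limit tuning rooted on A, not anything from the real code:

type Frequency = Double

-- Hypothetical 5-limit ratios for the seven letters, rooted on A.
justRatio :: LetterPitch -> Double
justRatio l = case l of
    A -> 1; B -> 9/8; C -> 6/5; D -> 4/3
    E -> 3/2; F -> 8/5; G -> 9/5

reduceJust :: JustDegree -> Frequency
reduceJust (JustDegree f l) = f * justRatio l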

There is also the problem that the scale type variable will have to be reified into the data so the dynamic environment can remain monomorphic.  Presumably it could be sealed inside an existential type.

This appears to work, but it seems complicated.  A variable size structure like this doesn't lend itself to an efficient representation, which matters since I'll have arrays of these.  I am especially likely to have arrays where only the distance fraction is changing, and it seems like I should be able to avoid redundantly transposing all the note fields when they are mostly the same.

On the other hand, if I try to preserve sharing then each scale degree is just an extra pointer, and transposition is cheap if it's just consulting a table.  However, the recursive definition and type variable of 'Degree' ensure that those must be pointers, so I won't be able to get much space savings from unpacking.  It seems like I should be able to do some hackery like splitting the numeric values into separate vectors, but it's hard to think of how right now.

    Approach #2

Another approach is to represent pitch signals entirely as code, rather than data.

A pitch signal would just be 'type AbstractPitch = Deriver Signal.Pitch', i.e. a function from the Dynamic state to a frequency signal.  Chromatic and diatonic transposition would be represented as signals in the dynamic environment.  This has some appeal because it means that "chromatic" and "diatonic" are simply conventions.  Scales can support or not support arbitrary forms of transposition at their choosing.  Of course, I suppose #1 can do this perfectly well too, just provide a 'transpose :: Transposition -> Int -> d -> d' method.

This effectively delays the application of the transposition to the note in the same way the (from, to, distance) representation does.  However, it's not nested in the same way, so all diatonic transpositions and chromatic transpositions would be collapsed.  So +1c +1d +1c would be collapsed to +2c +1d.  Are the results always the same?  I suppose whether or not they are depends on the definition of diatonic and chromatic for the particular scale, but I think for sensible scales they should be.
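
A sketch of that idea, with a Reader standing in for the real Deriver and made-up names throughout:

import Control.Monad.Reader

type NoteNumber = Double

-- The dynamic environment carries the accumulated transpositions.
data Dynamic = Dynamic { chromatic, diatonic :: Double }
type Deriver = Reader Dynamic

-- A note is code: it reads the transpositions only when finally evaluated.
type Note = Deriver NoteNumber

addChromatic, addDiatonic :: Double -> Deriver a -> Deriver a
addChromatic n = local (\d -> d { chromatic = chromatic d + n })
addDiatonic n = local (\d -> d { diatonic = diatonic d + n })

-- +1c +1d +1c collapses to +2c +1d by the time the note sees it.
example :: Note -> Note
example = addChromatic 1 . addDiatonic 1 . addChromatic 1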

This is appealing because it's the same solution I use for the score in general.  Trying to create a data structure to represent the score in a way which can be later manipulated in the way I want is probably impossibly complicated given the range of manipulations.  In the same way, creating a pitch signal data type that can be later manipulated by transposition in the way I want winds up being complicated but still somewhat inflexible.  Delaying evaluation and presenting the manipulations as parameters to an abstract bit of code is a similar idea to data-directed row extension as espoused by OO.

It also means that there is no concept of a "scale degree" and scales need not even have a concept of degrees, but simply names that will generate a given frequency in a given context that may include transposition.  This is apropos because I already have a "scale", Ratio, that is relative to another pitch and has no notion of steps or degrees or any of the usual structure that goes along with a scale.  In this world, pitches come from something a little more abstract and general than a scale.

However, this approach brings its own complications and difficulties.

Firstly, a pitch signal is actually generated by calls that themselves take the actual notes as arguments.  So a note call produces a single frequency, while a pitch call may create a constant segment, or interpolate from the previous note, or something more complicated.  Since only notes may be transposed, the pitch (a signal of frequencies in time) must also be delayed, with the notes (each a single frequency) delayed inside.  So there are two types: 'Note = Deriver NoteNumber', and 'Pitch = Deriver Signal.Pitch'.

As mentioned above, many pitch calls use the previous value, e.g. "interpolate linearly from the previous value".  However, the previous value can't be a NoteNumber: if the interpolate signal is changing, I need to reapply the interpolation to the source note on each sample.  Effectively, the interpolate signal is a distance between the previous note and the next one, so the source has to be a note, not a NoteNumber.  It also has to be a note representing the last sample emitted, so if the interpolator is working at the NoteNumber level, say (b)-3nn, it has to construct that as a note and pass it to the next call.

I already have this same mechanism for control calls, since they do exactly the same thing, but unfortunately I can't reuse it for pitch calls, because the previous value is threaded as part of the CallInfo by the track evaluator.  In this new system the call simply returns an unevaluated Pitch, and the actual sample values are not produced until later, when all the CallInfos are long gone.  So instead of 'Pitch = Deriver Signal.Pitch' I add an argument to thread the previous value explicitly: 'Pitch = (RealTime, Note) -> Deriver (Signal.Pitch, (RealTime, Note))'.  Of course the Note actually has to be 'Maybe Note', since the first pitch call has no preceding note.  A Monoid instance for Pitch can thread them together, but each pitch call still has to explicitly construct and return its "last note", which is a burden, though one that can be hidden inside higher level utilities.
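
Sketched with placeholder types standing in for the real Deriver and signal (modern GHC splits the threading between Semigroup and Monoid):

type RealTime = Double
type NoteNumber = Double
type PitchSignal = [(RealTime, NoteNumber)]
type Deriver = IO  -- placeholder for the real Deriver monad
type Note = Deriver NoteNumber

type Prev = Maybe (RealTime, Note)

-- A pitch call takes the previous call's last sample and note, and returns
-- its chunk of signal along with its own last sample and note.
newtype Pitch = Pitch (Prev -> Deriver (PitchSignal, Prev))

-- Threading the previous value through a sequence of pitch calls.
instance Semigroup Pitch where
    Pitch p1 <> Pitch p2 = Pitch $ \prev -> do
        (sig1, prev1) <- p1 prev
        (sig2, prev2) <- p2 prev1
        return (sig1 ++ sig2, prev2)

instance Monoid Pitch where
    mempty = Pitch $ \prev -> return ([], prev)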

A further complication is that since pitch calls are now responsible for emitting the final pitch signal, they are responsible for taking the transpose signals into account.  This means that even a call that wants to emit a constant pitch must combine it with the transpose signals in scope, and may not be very constant at all!  On the flip side, it means pitch calls have control over whether transposition happens or not.  Unfortunately this doesn't mean much, since pitch calls are not note calls and not part of a scale, and thus are unlikely to have any interesting opinions on the matter.  Of course some note calls may very well ignore transpositions, but unless the pitch calls can inspect the scale in effect for certain bits of hardcoded magic, the pitch call doesn't know that the note doesn't want to transpose, and is thus obliged to call it at every transposition point even if the answer will always be the same.  Actually, there may well be some inspection of the scale, since the pitch call still needs to know which signals the note considers transposition and should hence trigger a re-evaluation of the Note.  A scale that is not prepared to admit any transposition can simply give an empty set.

To eliminate this complexity I could think about trying to merge pitch and note calls and put everything into the scale, but the division is rather essential to factoring out scale independent functionality, like simple interpolation.

Efficiency is an additional concern.  Previously, a pitch signal would be evaluated once and various events underneath it could chop out the bits they are interested in.  However, if the pitch signal is now a deriver that is sensitive to the environment, every event that wants the signal will have to evaluate the whole thing again and there is no way for the deriver to know the event is only interested in a portion.  When the pitch track is below the event track (it's usually called the note track, but that's confusing in this context) it gets chopped up into one section per note, but if an event wants to know the pitch of a *following* event, it needs the pitch track already evaluated.  This is a particularly nettlesome aspect of score evaluation in general and I hope for a better solution, but if there isn't one then the redundant evaluation of a whole signal for every event may be too inefficient.  In that case, I may be able to hack on a cache to reuse the previous signal given that the transposition signals haven't changed since the last evaluation.

And of course this hints at a general problem when moving from data to code: I lose the ability to inspect the data.  I can evaluate a Pitch and look for a particular frequency, but if I want a "scale degree" out of it then there's no way to get one.  In fact there's no real concept of a scale degree at all, so I can't even ask simple questions like "what was the pitch class of the previous note" or, even harder due to the above evaluation order problem, "what is the pitch class of the next note".  But these are perfectly normal questions that I *will* need to ask.  Of course, given note calls in their full glory there may very well be no simple answer.  But for scales which *do* have well defined degrees, which is after all likely to be almost all of them, I will need a function 'NoteNumber -> Expr', where Expr is the textual representation of a call that will generate that NoteNumber, or the nearest scale degree.  And then I could transpose a given Expr at the score level via the roundabout route of evaluating it to a NoteNumber in the proper transposing context and then converting that NoteNumber back to an Expr.  Such is the price of generality.
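
For a scale that does have well defined degrees, the inverse could be as simple as rounding to the nearest one.  A 12TET-flavored sketch, with 'Expr' as a plain string and a made-up note rendering:

type NoteNumber = Double
type Expr = String

-- Round to the nearest scale degree and render it in the (4c) style used
-- above.  Purely illustrative; a real scale would also consider the key.
nnToExpr :: NoteNumber -> Expr
nnToExpr nn = show octave ++ names !! step
    where
    names = ["c", "c#", "d", "d#", "e", "f", "f#", "g", "g#", "a", "a#", "b"]
    (octave, step) = round nn `divMod` 12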

But such a price!  There is additional complexity in that tracklang Vals can now have Derivers inside them, courtesy of Note, which introduces a circular dependency between TrackLang and Derive.  Much of the similarity between control and pitch signals is lost; I can no longer deal with a pitch signal as easily as a control signal.  The simple concept of a signal of numbers or scale degree numbers is replaced by abstract pitches and notes which need to be evaluated in the correct context or there's a bug.  Also, a lot of code needs to be changed.

On the other hand, some code can be removed as well.  All of the PitchSignal data structure can be dropped, which is a price already paid for trying (though failing) to support scales and transposition.

Another approach is to simply say transposition is not worth all the trouble and abandon it entirely.  But just about all pitch oriented ornaments are defined in terms of relative pitches, e.g. the trill, so that would be abandoning most of what I want to do.  Once I've accepted that I want a trill, wanting a diatonic trill is the next obvious step, since after all most of them are diatonic.  Then, given that supporting continuously changing pitches is also central, I need to solve a diatonic trill on a changing base note.  Following that path seems to lead me inevitably to this conclusion.

Unfortunately, there's another feature missing: I need to be able to look at the pitch of other notes.  For example, a mordent needs to create new notes based on the initial pitch of a note.  But an abstract pitch signal doesn't give access to the individual notes inside.

Even more unfortunately, I got most of the way through approach #2 before figuring this out, and it was quite a bit of work because it changes quite a lot.

    Approach #3

Based on this, I modified approach 2 to be somewhat more conservative.  A PitchSignal remains a signal, that is, a vector of (RealTime, Pitch) pairs.  A pitch is a function Controls -> NoteNumber.  It can be transposed by composing a function that modifies the controls in front of it.

I get to reuse some of the old PitchSignal stuff (which treated a pitch as a (from, to, distance) triple) since this is a signal too.  But I keep the part from approach #2 where transposition is expressed as normal control signals with conventional names.  Though I could modify the pitch signals directly as before, if I want to support various different kinds of transposition (so far I have chromatic, diatonic, and hz) I need something flexible that I can write directly into the score, and control signal names are just that.
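
A sketch of this final representation; the control map and the "t-diatonic" control name are illustrative:

import qualified Data.Map as Map

type RealTime = Double
type NoteNumber = Double
type Control = String
type Controls = Map.Map Control Double

-- A pitch is evaluated against the controls in scope when it's finally
-- reduced to a frequency.
newtype Pitch = Pitch (Controls -> NoteNumber)
type PitchSignal = [(RealTime, Pitch)]

eval :: Controls -> Pitch -> NoteNumber
eval controls (Pitch p) = p controls

-- Transpose by composing in a function that bumps a transposition control
-- before the pitch sees it.
transpose :: Control -> Double -> Pitch -> Pitch
transpose control steps (Pitch p) =
    Pitch (p . Map.insertWith (+) control steps)

Evaluating 'transpose "t-diatonic" 1 p' is the same as evaluating 'p' with that control bumped by one, and the Pitch itself decides what the control means, so each scale keeps its own rules.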


It's a little frustrating to have spent so much time on ultimately rejected approaches before coming up with one that worked.  The whole idea behind writing down the requirements was to think of all the possibilities before spending time implementing something.  But even so, I missed some requirements until it came time to actually convert existing code.  After thinking about the problem for such a long time I do have a better handle on it and its requirements, but of course that's only after the fact.  It would have been nice to have that perspective when it actually could have been useful.

It's interesting to observe that so much of the difficulty in interpreting the music score is controlling the order of evaluation.  With pitches, the problem is that the reduction of scale degrees to their frequencies must be delayed until all transpositions have been applied.  The purpose of the Deriver after all is to delay the evaluation of score elements until the full context, including pitch, is available, so they can make decisions based on that context.

This is somewhat related to the other key theme I've noticed, which is the tension between code and data.  Code is opaque but flexible, data is transparent but rigid.  To increase one is to diminish the other.  Since my main goal is flexibility in notation, I wind up favoring code over data.  The result is seemingly unavoidable complexity on the editing side, since this is where the lack of transparency hurts the most.
