Saturday, October 3, 2015

one year: kotekan, format, ky file, gamakam, get duration

Over the last week or so I've been fiddling around with kotekan calls. Actually, it's been much longer than just a week.

The fundamental problem is that they are end-weighted. This means that they have a duration leading to an end at a certain time, rather than starting at a certain time and having a duration afterwards. Concretely, this means that the overall pitch is most naturally written at the end of the event, since that's when it lines up with that pitch in the other instruments. Also, the notes of the kotekan extend from after the start time until the end time, and the duration of the final note is indeterminate, or rather, dependent on the onset of the note after that.

This in turn ripples through the whole system. Ranges which are normally [start, end) instead become (start, end]. This goes for everything from the UI selection range, to the slicing ranges that determine the slice range of a note, and hence its range on the control tracks, to lots of other internal things I can't remember now. It means that text most naturally goes above the event rather than below, that it wraps upwards rather than downwards, and even, in a circumstance I don't remember any more, that events would most naturally be evaluated in reverse time order.

I originally intended to support this as much as possible and implemented these inverted events as events with negative duration, though I stopped short of reversing evaluation. However, when it came to actually writing down a score in a Balinese style, it got really awkward. As usual, I don't remember the problems in detail, but I think they were related to how at the low level, notes really do start at a time and sound for a duration. For instance, for negative events I take the controls from the end of the event rather than the start, but what if the controls change in time? Even if they don't change, to make sure a pitch extends back from several notes in the future, I'd logically need a new kind of control signal where the sample values are from start > x >= end, rather than start >= x > end. I guess the conclusion is that I could keep extending this "opposite polarity" through the system, but eventually I'll run into the fact that notes fundamentally are start + duration. And actually I'd like to "merge" the two orientations as early as possible to avoid two incompatible languages. In a way, that's the central tension of trying to integrate different kinds of music.
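
To make the polarity concrete, here's a minimal sketch of the two lookup conventions, assuming a plain list of (time, value) samples; "at" and "atNegative" are made-up names, not the real signal API:

    type Signal = [(Double, Double)] -- ascending (time, value) samples

    -- Normal polarity: the value at t comes from the last sample at or
    -- before t, so each sample extends forward until the next one.
    at :: Double -> Signal -> Maybe Double
    at t sig = case takeWhile ((<= t) . fst) sig of
        [] -> Nothing
        samples -> Just (snd (last samples))

    -- Opposite polarity: the value at t comes from the first sample at
    -- or after t, so each sample extends backwards from its time.
    atNegative :: Double -> Signal -> Maybe Double
    atNegative t sig = case dropWhile ((< t) . fst) sig of
        [] -> Nothing
        (_, val) : _ -> Just val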

I concluded that fundamentally end-weighted notation is only appropriate for a higher level score, and it runs into conflicts when I try to merge it with a lower level score that has precise control curves. It's another manifestation of the time problem, where my notation represents time explicitly, and thus gets into difficulty with any kind of abstraction over time. I could theoretically have a two level score by making use of score integration to write a high level score in more abstract notation and then have it produce an editable low level score to express things like control curves, but it's a giant jump in complexity. Not just in implementing it, but in editing and in creating new score vocabulary, because it seems like it would create a whole new language and structure, incompatible with the lower level. Maybe I'll ultimately wind up doing something like that, but I'd like to avoid it until I really definitely need it.

Meanwhile, I wound up coming up with some workarounds to try to bridge the gap between end-weighted and start-weighted notation. I came up with a notion of cancellation, where you write a zero duration note at the end of a block, and it will then replace the first note of the next block via a postproc call. This gets the essential effect, but the score is still written entirely in start-weighted positive-duration notes. It also means the pitches and controls are written at the start instead of the end, but that means continuous controls work out, even if it means the pitches don't line up across instruments as they should.

But the hack leads to more hacks. Kotekan calls are also end-weighted in the same way blocks are, of course. But here the replacement hack has to be extended to understand paired instruments, because polos may replace sangsih, or vice versa. Initially I gave up and simply manually added initial=t and final=f type configuration to kotekan calls, but it got annoying to have to annotate everything in this way.

This led to much refactoring of the cancel call so it could be thus configured, and what was originally just an infer-duration flag became separate weak and infer-duration flags, and then became weak, strong, and infer-duration. Now, because the final kotekan note should replace the first kotekan note, but not a normal note (neither weak nor strong), I wind up with additional initial and final flags.
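
For the record, the replacement rules wind up being something like this sketch, with stand-in types rather than the real cancel call:

    data Strength = Weak | Normal | Strong deriving (Eq, Ord)

    data Note = Note
        { noteDuration :: Double
        , noteStrength :: Strength
        , noteFinal :: Bool -- zero-dur note at the end of its block
        , noteInitial :: Bool -- first note of its block
        }

    -- When two notes coincide at a block boundary: a final note replaces
    -- an initial one (and, via infer-duration, takes over its duration),
    -- a stronger note replaces a weaker one, otherwise both survive.
    resolve :: Note -> Note -> [Note]
    resolve prev next
        | noteFinal prev && noteInitial next =
            [prev { noteDuration = noteDuration next }]
        | noteStrength prev > noteStrength next = [prev]
        | noteStrength next > noteStrength prev = [next]
        | otherwise = [prev, next]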

Whenever I get caught in twisty rat-holes like this, where solving some simple problem seems to require an unreasonable amount of implementation complexity, I wonder if it's really worth it. Surely I could put all that time into just writing more initial=t annotations. On the other hand, I thought the same thing about ngoret and realize-ngoret, but now I use it all the time without thinking about it. It's still complicated, but I don't have to think about that when I use it. So maybe it turns out to be worth it in the end.

Here are some other things that have happened in the last year or so:

2015-01-31: I added Util.Format4, which later became Util.Format, which is hopefully where I can stop messing with different implementations of a pretty-printing library. Which I've been doing for years. Mostly the problem is that I don't care enough about pretty printing to put concentrated time into it, but also it seems to be a deceptively hard problem. The current implementation has gotten progressively more complicated since then, and still has bugs, but I just put up with them until I find motivation to go spelunking again. But perhaps I should care about pretty printing more. So much of what I do is debugging, and so much of debugging relies on having a nicely formatted display of some data.
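
As a hint of why it's deceptively hard: the core of a Wadler-style pretty printer (not necessarily what Util.Format actually does) looks tiny:

    data Doc =
        Text String -- literal text
        | Line -- a possible line break
        | Doc :+: Doc -- concatenation
        | Group Doc -- render flat if it fits, otherwise with breaks

All the difficulty is in deciding, efficiently, which Groups fit within the width, and the indentation and breaking variants pile on from there.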

2015-05-09: After a few false starts, I added Derive.get_score_duration, which queries a deriver for its logical score duration. The underlying problem was that I used this concept in variations on block calls, like clip and loop, which would derive the callee as-is in its own tempo. That works for block calls because you can find the block and see what its ScoreTime duration is, but it breaks as soon as you alias a block call to another name. The implementation wound up being complicated because it means I have to run a special derive that goes only far enough to figure out what logical duration the call wants to have, which turned into yet another derive mode. But the result is that clip and loop and the like are now generic transformers that can work on anything, instead of generators special-cased to block calls.
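
As an illustration of why that makes them generic: a loop only needs to place its callee in time and know its logical duration, which is just what the query provides. A sketch, with stand-ins for the real machinery:

    type ScoreTime = Double
    data Deriver -- opaque stand-in

    scoreDuration :: Deriver -> ScoreTime -- cf. Derive.get_score_duration
    scoreDuration = undefined
    place :: ScoreTime -> Deriver -> Deriver -- shift to a start time
    place = undefined
    mergeAll :: [Deriver] -> Deriver
    mergeAll = undefined

    -- Repeat any deriver, whatever it is, to fill the event's duration.
    loopCall :: ScoreTime -> Deriver -> Deriver
    loopCall eventDur d =
        mergeAll [place t d | t <- takeWhile (< eventDur) [0, step ..]]
        where step = scoreDuration d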

2014-05-24: I added the "local definitions file", which was later renamed "ky file". This is actually really useful. It's just a way to define calls using the tracklang syntax, but write them in an external file, which is then loaded along with the score. It was initially just because, while I can write all that stuff inline in the score, it's really awkward with the tiny text boxes. Being able to put definitions in an external file means I use them much more, and supports the idea that calls can be comprehensive but with many arguments, with the expectation that you will specialize them locally. Eventually ky files got an import mechanism, and now there are little shared libraries.

The big disadvantage is that scores are no longer self-contained, which means there's opportunity to break them just by moving them. More seriously, the ky file isn't versioned along with the score, so you can't undo those changes. I actually could version it by including it in the git repo along with the score, but while that might be appropriate for definitions closely tied to the score, it's not appropriate for shared libraries. This is the same problem that exists with the haskell code implementing the calls themselves; the ky file just sits in the middle, at the boundary. This problem was always inherent in the "score is code" approach, it's just that as I make that approach more practical, the problems it brings also become more obvious.

The original plan was to implement this with haskell, as score-local libraries. Of course, that has even bigger problems with complexity and versioning, while having more power. So perhaps I want to draw the line at ky files: anything simple enough to do in tracklang you can put in a ky file, and it's not tested and not typechecked and not versioned, but it's basically just simple macros (surely that's a sentiment that lies in the past of any number of terrible languages, grown way beyond their breaking point). Anything more complicated has to be checked in, so it gets all the nice language things, but is not so appropriate for per-score use. And I don't have to mess with dynamically loading binaries.

2014-06-20: Carrying on from Derive.get_score_duration, I implemented Derive.get_real_duration, which is essentially the same thing, except it gets the RealTime duration of a deriver. This is because I wanted to add sequence-rt, which is a RealTime variant of sequence. What that does is conceptually really simple: just play the given derivers one after the other in order. sequence is similar, except that it uses ScoreTime, which means that if a callee block has its own tempo track it doesn't affect the duration. In other words, a constant tempo will have no effect, since the block is always stretched to fit its ScoreTime duration, and a non-constant tempo will simply make one part faster while making the other part slower. This is nice if it's what you want, but if you just want to sequence bits of score, you probably want a tempo of a given value to be the same across blocks.
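
The concept is just accumulation: place each deriver after the total RealTime duration of the ones before it. A sketch with the same kind of stand-ins as above:

    type RealTime = Double
    data Deriver -- opaque stand-in

    realDuration :: Deriver -> RealTime -- cf. Derive.get_real_duration
    realDuration = undefined
    shiftRt :: RealTime -> Deriver -> Deriver -- place at a RealTime start
    shiftRt = undefined
    mergeAll :: [Deriver] -> Deriver
    mergeAll = undefined

    sequenceRt :: [Deriver] -> Deriver
    sequenceRt = mergeAll . go 0
        where
        go _ [] = []
        go t (d : ds) = shiftRt t d : go (t + realDuration d) ds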

This is actually the same problem I have with composed tempos in general. They provide their own kind of power, but they make it awkward to just sequence bits of score in order... and that's what sequence-rt does. It's still not necessarily convenient because, while I now automatically set event duration based on Derive.get_score_duration, it still doesn't match up with the ruler, and I have to run LRuler.extract to figure out what the ruler should be... except that probably won't work because LRuler.extract doesn't understand sequence-rt. And in practice, I haven't used it much, but it still seems like the sort of thing I totally would use, since I often just want to concatenate a bunch of blocks.

Sequencing in order is a basic operation in other score languages, and in music in general, and so it's disappointing that it's so tricky in mine. This is a result of the compositional idea of tempo, but sometimes it seems like I'm making the tradeoff in favor of the uncommon case. In practice I wind up pasting the tempo track up to the toplevel block, though a toplevel block with only a sequence-rt should mean I don't have to do that anymore.

This is also an illustration of another fundamental problem I have, which is that since the notion of time is concrete in the notation, I can't easily express things that are abstracted over time. The best I can do is make an event with the duration of the entire sequence and write out whatever it is in text, and now that I have Derive.get_score_duration I can have the event duration automatically set to whatever the text implies. This is pretty inherent in the basic design though, because the tracklang is basically just text with time, and if I get rid of the time part then I'm basically the same as any of those other music languages. And if I write some call that doesn't care about time then it doesn't combine nicely with the rest of them, which do (e.g. you can't align a signal to it).

But it seems like it should be possible to come up with some kind of compromise, e.g. perhaps where the order matters but not the duration, or to have the time aspect be set automatically and function just as a visualization. In fact, that's basically what I already have with automatically setting the event duration, so perhaps I'm already doing it.

2014-12-06: Added the *interpolate scale, which is an interpolation between two other scales. The only special thing about this is that it's a scale parameterized by other scales. Unfortunately, due to complicated reasons, I couldn't just have a val call return a scale, so I wound up having to do a bunch of special case refactoring. It's an interesting scale though. I can either gradually change intonation, or change key, or change scales entirely, as long as they both understand the same pitch names.
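
Conceptually it's just a pointwise mix; a sketch, pretending a scale is a function from pitch name to an equal tempered NoteNumber (the real scale interface is more involved):

    type NoteNumber = Double -- MIDI-style key number
    type Scale = String -> Maybe NoteNumber

    -- At t=0 this is entirely 'from', at t=1 entirely 'to', and in
    -- between each pitch name slides from one intonation to the other.
    -- A name only resolves if both scales understand it.
    interpolateScale :: Double -> Scale -> Scale -> Scale
    interpolateScale t from to name = mix <$> from name <*> to name
        where mix a b = (1 - t) * a + t * b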

2015-07-11: Added Derive.Call.India.Gamakam3, which is the third attempt at a library for gamakam. Gamakam1, the first attempt, was just a bunch of pitch calls that tried to produce the appropriate motions. This wound up being awkward because it's actually pretty inconvenient to specify exactly where these things start and end... there's that explicit time problem again. Another problem was that because some intervals are in scalar degrees and some are in absolute microtonal distances, I wound up having to either put everything on the pitch track with lots of annotations, or use multiple transposition tracks. The problem with the pitch track was that since I can't combine pitch tracks I'd have to give the basic swaram to every call, which got redundant. I also felt like I should be able to write the note swarams and gamakam separately. All of the fiddly details defeated the original goal, which was to be able to quickly get idiomatic results. So basically it was too low level, though I may be able to still use some of the calls as building blocks.

I tried to address this with Gamakam2, which was its own mini-language. I would write down the sequence of ornaments (which were mostly Gamakam1 calls), and there was a notion of start, middle, and end, where start and end were fixed duration and the middle could stretch arbitrarily. I was also hoping to be able to have gamakam calls scale gracefully as the duration of the note changed. Since the full call names were too long when all crammed into one expression, I had a separate set of aliases that would only apply within the start, middle, and end sections, so it really was like a mini-language. Since it was a note call, it knew where the note ended and I wouldn't have to manually set any kind of durations. The flip side, though, was that I'd have to express the entire note as one expression, so it wasn't very flexible. And sometimes you really do need to place a specific motion at a specific time. Also the notion of some ornaments being fixed duration didn't turn out to work very well in practice, since it seems even note to note transitions (which would go in begin or end) will expand if given more time.

So Gamakam3 is kind of a compromise between the low level Gamakam1 and high level Gamakam2. It keeps the sequence idea, but drops the begin, middle, end. There's no stretching; each call gets a slice of time depending on the time available. Some can be weighted to take more or less, and some can configure things like from pitch and transition speed, and take no duration. Many of the calls wind up being a single letter, so I can string them together as a single word. Since there's no longer a start or end, I can implement it as a pitch call, which means that I can have multiple per note, or align them as is appropriate. To retain the convenience of Gamakam2, where the note call knows where the end of the note is, I added a feature where an inverting note call sets its end time in the environ, so the gamakam call can be zero duration and still end at the note end time. To address the problem with Gamakam1 where I want to write the swarams and gamakam separately, I added a simple kind of pitch track merging where the second pitch signal just replaces the first one if the samples line up. So I can write the swarams on their own pitch track, and then the gamakam on the next track over can get the previous, current, or next swaram. If a gamakam is present it will replace the base swaram, and if not present, I can leave the gamakam track empty.
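
The merge itself is about as simple as it sounds; a sketch on bare ascending (time, pitch) sample lists, which is not the real signal type:

    -- Interleave two sample streams, preferring the second (the gamakam
    -- track) whenever the sample times coincide.
    replaceMerge :: [(Double, pitch)] -> [(Double, pitch)] -> [(Double, pitch)]
    replaceMerge [] ys = ys
    replaceMerge xs [] = xs
    replaceMerge (x@(tx, _) : xs) (y@(ty, _) : ys)
        | tx < ty = x : replaceMerge xs (y : ys)
        | tx > ty = y : replaceMerge (x : xs) ys
        | otherwise = y : replaceMerge xs ys -- same time: gamakam wins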

For example, here's notation for the first four notes of Evvari Bodhana:

r   r      g         r s
    !!--2b !P1 !a--1
Ev  va     ri

!!--2b has a rmr jump at the end, and !P1 !a--1 starts one step above the swaram, at m, moves down a step, waits, and then moves up a step. The notation for relative movement is 1 2 3 to move up, and a b c to move down. With such a short clip, already a lot of problems are apparent. One is that the relative movement is kind of awkward. Another is that I use an extra ! to switch to compact notation where each call is one letter, but it's awkward and ugly when I need more than one letter, e.g. P1. P1 sets the "from" pitch to one step above the base swaram, but it's also quite awkward. I'd much rather not have that distinction, so maybe I should use spaces everywhere... but that's favoring higher level calls, since I'd want fewer of them that express more. Also, the piece is in Abhogi, where the rmr motion is idiomatic for ga ascending, but the notation has no way to reflect that knowledge.

I actually implemented an alternate absolute notation which winds up looking more sensible:

r   r      g         r s
    !!--mr !!mg--g
Ev  va     ri

Unfortunately, this basically gets me stuck at the lowest level. I can't write movements that are independent of the pitch, and there's even no point to having the separate swaram track, because the gamakam track already specifies everything. But in the long run, it really depends on how much I'm able to abstract. It could be that trying to figure out a high level notation is more work than just writing everything directly. In fact, the whole sequencer program is sort of an exercise in finding where that balance lies.

Another problem is that the pitch merging is a hack, and if the gamakam happens to not produce a pitch at the start of the note where the swaram is, it won't replace it and you'll get a strange jump. And speaking of strange jumps, at least in vocal music you can't ever really jump pitch without a corresponding gap. I could presumably impose a slope control to reflect that even something represented as a logical jump in the notation will not necessarily sound that way.

Also I worry that the even division of time will lead to a mechanical sound. In fact, it makes it difficult to express the fact that real oscillations are definitely not regular. It also means that I have to explicitly write out each oscillation, so the idea of gracefully adapting to the duration available is gone. It might be possible to get around that with higher level calls that take up more than one slice, or ones that warp the slice grid in certain patterns. For instance, there is a _ call that causes the previous call to take up another slice, so I can express slower transitions.

Also, calls like _ wind up being magic, in that they affect the parsing of the whole expression. So it winds up being an ad-hoc kind of language, only with no real basic syntax rules, just whatever I can get to parse. But I guess that's always the problem with trying to be terse.

Dynamics is a whole other axis of complication. Gamakam1 didn't address that at all. Gamakam2 integrated them into the calls, so each sub-call would come with its own dynamics. For Gamakam3, I don't know what approach I want, so I wound up implementing two. One is a separate set of calls just for dynamics that is meant to go in the dyn track. It's like a simplified version of the pitch subcalls. The other one is a way to attach dynamics to pitch calls, like so: [c-]< means move to the swaram pitch and hold while increasing dynamics.

Separate dynamics is more flexible and orthogonal. It also means I can easily express common things like dynamics up or down at the beginning and end of a note. But I originally added them to the pitch notation because I thought I'd want to line up dynamic changes with pitch changes, and also because it seemed like less work to just edit one thing. However, any time event text becomes even slightly complicated, it becomes a pain to edit. All the editing commands are oriented around track level editing, and as soon as you want to edit event text you're stuck in a tiny text box. If I wanted to do complicated things in text I'd use a music programming language and vim! So a text mini-language should definitely not get too elaborate.

There are also other details I'm missing but don't fully understand, such as intonation. For instance, I believe the ga in mayamalavagoula should be closer to ma, but at the moment I don't have any way to express that. I probably shouldn't try to come up with a way until I understand it better. In fact, in general I probably need to just learn more music before making another go at things. Ultimately the goal is to be able to write down melodies in the Carnatic style, which means I need to have a large set of example melodies to work from, and I need to understand how they are structured. I should also probably take a pitch tracker to some recordings to get an objective picture.

2015-07-17: Args.lookup_next_logical_pitch

Getting the next pitch wound up being quite complicated, but "what is the next pitch" has been a hard problem with no good answer for a long time. In fact, it's another one of those things that has had three separate solutions, each with their own problems. I wound up needing yet another one because this one is "within a pitch track, what is the pitch of the next note of my parent note track", which is a question none of the other techniques could answer. It's also the most complicated, in that it adds a whole new derivation mode with special support in the default note deriver and in various other low level places. Basically, it runs a separate derivation in NotePitchQuery mode, and a special hack in Derive.Call.Sub will notice that and strip out everything except the pitch track. Then during inversion this is run on all the neighbor events, and the results are put in Derive.Dynamic for anyone to look at. Laziness should ensure that the extra evaluation only happens if someone actually needs to know the pitch in question.
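
The laziness doesn't need anything special, just ordinary lazy fields; a sketch with stand-in types:

    data Pitch = Pitch -- stand-in
    data Dynamic = Dynamic
        { prevPitches :: [Pitch] -- from the preceding neighbor events
        , nextPitches :: [Pitch] -- from the following neighbor events
        }

    -- These thunks capture the expensive NotePitchQuery derivation, but
    -- it only runs if some call actually forces one of the fields.
    neighborPitches :: [event] -> [event] -> Dynamic
    neighborPitches prevs nexts = Dynamic
        { prevPitches = map queryPitch prevs
        , nextPitches = map queryPitch nexts
        }
        where queryPitch _ = undefined -- the stripped pitch-only derive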

So in theory it's similar to Derive.get_score_duration and Derive.get_real_duration, except of course each of those special modes has its own ad-hoc set of hacks scattered in the basic deriving machinery. I'm not real happy with the situation, but can't think of a better way.