work on karya

Wednesday, July 8, 2020

another 4 years

It's been so long that I'm not even sure what the major changes are since last time. A no-longer-so-new job has kept me much busier for the last 2-3 years, so things have gotten a lot slower.

Looking at the highlights from last time:

Im, aka 音, aka the synth backend, has gotten fairly usable, though it hasn't really been proven to scale yet. The main backends are still the sampler and a faust backend.

Solkattu is pretty mature and I use it for all my lessons. Of course there are still bugs and features I could add, but I've pretty much just been adding new scores for the last year or so, so maybe it could be considered "done enough." I've even retreated on some features: I added a built-in audio realization, but I hardly ever use it, and it adds so many dependencies to the ghci session that I don't want it imported by default.

There is a new text-only score, which is actually its own language with its own syntax. I did a few pieces with it and I think it has some promise, but it's hard to say where its place is. I got tangled up detecting renames and moves when integrating back into karya and left it there.

Progress has been especially slow over the last 3 months after lockdown due to being even more busy with work and probably a bit of exhaustion. But over the last few weeks I started getting some things done again, so here's the latest:

I got sidetracked looking into GUI scrolling efficiency and wound up experimenting with the `druid` rust toolkit. The conclusion was that it's too young, and probably has the same performance problem, so I tried another approach where I cache the whole window in an image. That makes scrolling quick, but since it's "eagerly evaluated," zooming becomes slow. I'll probably have to go to some tile-oriented scheme, but that's a whole new level of complicated. Drawing a 2d GUI with text is still an unsolved problem. Anyway, that was a bunch of rust and C++.

I decided I'd remake a piece I did back in high school. That piece had a sampled breakbeat section, so I decided to start with notation support for breakbeats, which means coming up with sample start offsets for interesting parts of the sample, and a scheme to semi-automatically name those offsets based on measure position. Then I can interpolate to make a general notation for addressing an offset by either position or name (e.g. `sn1-2` for the snare falling on measure 1 beat 2, or `n 1.2` for the same... in an abuse of decimal notation, the most confusing part of which is that measure.beat is 1-based, not 0-based). Then came a fair amount of faffing about to figure out how to map those names to the keyboard such that "octaves" correspond to an integral number of measures, where how many measures fit depends on beats per measure and the time step increment.
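To make the position arithmetic concrete, here is a minimal Haskell sketch, assuming a constant tempo, frame 0 at the downbeat of measure 1, and one key per time step. All the names (`Position`, `parsePosition`, `frameOffset`, `measuresPerOctave`) are hypothetical, not karya's.

```haskell
import Data.Char (isDigit)

-- Hypothetical sketch: a position in the break, measure and beat both 1-based.
data Position = Position { measure :: Int, beat :: Int }
  deriving (Eq, Show)

-- Parse "1.2" as measure 1, beat 2. Despite the dot this is not a decimal:
-- "1.10" is beat 10, while "1.1" is beat 1.
parsePosition :: String -> Maybe Position
parsePosition s = case break (== '.') s of
  (m, '.' : b) | isNat m && isNat b -> Just (Position (read m) (read b))
  _ -> Nothing
  where
    isNat str = not (null str) && all isDigit str

-- Sample frame where a position starts, assuming a constant tempo and that
-- frame 0 is the downbeat of measure 1.
frameOffset :: Double -> Int -> Int -> Position -> Int
frameOffset bpm beatsPerMeasure sampleRate (Position m b) =
  round (beats * 60 / bpm * fromIntegral sampleRate)
  where
    beats = fromIntegral ((m - 1) * beatsPerMeasure + (b - 1)) :: Double

-- If each key advances by `step` beats, how many whole measures fit in a
-- 12-key "octave"? 0 means a single measure doesn't fit.
measuresPerOctave :: Int -> Double -> Int
measuresPerOctave beatsPerMeasure step = 12 `div` keysPerMeasure
  where
    keysPerMeasure = ceiling (fromIntegral beatsPerMeasure / step) :: Int
```

For example, in 4/4 with one key per beat, a measure takes 4 keys, so an octave holds 3 measures; at half-beat steps a measure takes 8 keys and only 1 fits.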

I somehow managed to find the CD I had sampled from, um, 25 years or so ago, and the exact break showed up on the Amazon track preview. Last time it was plugging the CD player into some outboard ADC on the Amiga for some nice 8 bit 8kHz samples; this time it's downloading the whole track in 1s, opening it in a DAW instantly, and just clipping out the right bit.

Now of course, since I have measure position, I can estimate BPM with some of my stone-age statistics (discard quartiles 1 and 4, take the mean of 2 and 3... I have no idea if this is a real thing, but the idea was to not get too thrown off by outliers), and can adjust BPM by adjusting the resample ratio. That in turn rekindled a desire to bind to the `rubberband` library to get time stretching and pitch shifting, and to use `c2hs` this time instead of `hsc2hs`. Fortunately I had already added support for the `libsamplerate` binding, so it just meant rereading its minimal documentation with some reference to its source code and various blog posts. `rubberband` has a pretty straightforward C API and it was really easy to bind compared to the ordeal that was libsamplerate, but it helped quite a bit that I used the offline non-streaming version. It might be possible to save and restore `rubberband` state by just serializing its struct, but I just don't have the energy for that, so let's live with loading and converting the whole sample at once until it proves to be a problem. That let me hook up the breakbeat notation with BPM control via either pitch or time stretching, and I discovered that even with "percussive" settings, `rubberband` is not able to do this well. Attacks tend to get severely mangled. Ok, then never mind about rubberband for now. Maybe it'll be good for special effects.
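The "stone-age statistic" is known as the interquartile mean. Here is a sketch of it, plus turning beat onsets into a BPM estimate, assuming one onset per beat. The function names are hypothetical.

```haskell
import Data.List (sort)

-- Interquartile mean: sort, drop the lowest and highest quarters, and average
-- what's left. Nothing for an empty input.
interquartileMean :: [Double] -> Maybe Double
interquartileMean xs
  | null middle = Nothing
  | otherwise = Just (sum middle / fromIntegral (length middle))
  where
    q = length xs `div` 4
    middle = take (length xs - 2 * q) (drop q (sort xs))

-- Estimate BPM from the onset times (in seconds) of consecutive beats,
-- assuming exactly one onset per beat.
estimateBpm :: [Double] -> Maybe Double
estimateBpm onsets = (60 /) <$> interquartileMean (zipWith (-) (drop 1 onsets) onsets)

-- Speed factor for plain resampling: above 1 plays faster (and higher).
playbackRate :: Double -> Double -> Double
playbackRate detectedBpm targetBpm = targetBpm / detectedBpm
```

A single late onset (say one 5 second gap among 0.5 second ones) is discarded along with the lowest quarter, so the estimate stays at 120. Plain resampling by `playbackRate` changes pitch along with tempo, which is what time stretching was supposed to avoid.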

"Special effects" made me think of the "pitch via comb filter" thing I did for pretty much the first "real" piece I was able to do with karya. At the time, I had to hook up Reaktor to the output of a MIDI VST, and fiddle with MIDI routing (which is a simple task that Reaper somehow makes ridiculously complicated... and still had weird restrictions and broken things with MIDI routing that I forget now). But now that I have my own sampler and FAUST I figured it was time to put them together so I can augment samples with effects. I had been intending from the beginning to merge the `faust-im` and `sampler-im` synthesizers, and failed each time due to their fundamentally different attitudes towards overlapping notes. Now it's looking like I'll incorporate faust into the sampler for effects processing, but they'll otherwise both remain independent.

Which I managed to do, with relatively little hassle, since I already had all the pieces in place for compiling and binding to faust processors, and saving and restoring state. "Relatively little" hassle is still quite a bit of hassle... dealing with sample streams and note hashes and processor state is still one of the trickiest parts: hard to think about, hard to debug, hard to write tests for.

I initially intended to add effects per-sample, so each sample could have independent effects in the same way as rubberband, but it seemed to be easier and more generally useful to have one effect per instrument. There's actually a place for both, but so far only per-instrument is implemented. Other than that, I don't need any routing. Each instrument has 0 or 1 effect, because if I want multiple effects I can combine them in faust. Of course we can't precompile every permutation, so adding an effect could mean an awkward wait for a recompile, but that's a general problem with karya. Faust does have an llvm backend, so it should be possible to do a quick recompile and reload the processor dynamically.

Messing with the breakbeat notation and BPM tuning I thought I should try it out on a trikalam. This is a simple Carnatic form where you play the same material in three speeds, usually chatusram, tisram, and then melkalam chatusram. I've never heard it done for a breakbeat before, but why not? Just to use all the new things, I also applied a tuned comb filter to it to give it a sort of melody and emphasize the rhythmic contour. The result is pretty successful at being weird: https://drive.google.com/file/d/1PA_Dub6NJ-zXDqfx7ZmIRR5RX1YjeTVv/view?usp=sharing

So in the end I spent about 2 weeks on implementation for the sake of one part of one section of one piece... that's what distraction looks like! I don't feel too bad about it though, because faust effects are generally useful and I had been planning to add them for a long time.

I'll probably use breakbeats too; for some reason I like them, even though I've never really listened to music that uses them. I probably should!

Tuesday, December 13, 2016

another year

Here are some major changes over the last year, in roughly chronological order:

macros

These are basically like function calls, and I actually wound up with two kinds. Maybe overkill?

Now that I have ky files, I started to want to be able to define new calls out of compositions of other ones. These macros are defined textually in the ky language, and are basically just replaced with their values. The replacement is post-parsing, but call names are still looked up dynamically, which is important because it means, for instance, a call to m will look up the instrument specific mute technique. This is kind of like late binding, so these are called dynamic macros.

Then, calls started to require "realize" postproc calls, and to require them in a specific order, e.g. cancel notes, then infer pitches, then randomize starts. That made me want to define a call as a composition of other calls, in Haskell. The other calls might be in a different module, and I didn't want to rely on what happens to be in scope, and, in contrast to dynamic macros, I didn't want them to be vulnerable to rebinding. This is like static binding, so these are static macros.
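The difference between the two kinds of macro is essentially late versus early binding. Here is a toy Haskell model where a "call" is just a string transformer; every name in it is hypothetical rather than karya's actual call API.

```haskell
import qualified Data.Map as Map

-- Toy model: a call transforms a note description.
type Call = String -> String
type Env = Map.Map String Call

-- Apply calls left to right.
applyAll :: [Call] -> Call
applyAll calls note = foldl (\acc call -> call acc) note calls

-- Dynamic macro: keeps call *names* and looks them up when it runs, so an
-- instrument can rebind e.g. "m" to its own mute technique.
dynamicMacro :: [String] -> Env -> Call
dynamicMacro names env = applyAll (map resolve names)
  where
    resolve name = Map.findWithDefault (++ (" ?" ++ name)) name env

-- Static macro: captures the functions themselves, so rebinding a name in
-- the environment cannot change what it does.
staticMacro :: [Call] -> Call
staticMacro = applyAll

defaultMute, cekMute :: Call
defaultMute = (++ " muted")
cekMute = (++ " cek")
```

With `m` bound to `cekMute` in an instrument's environment, `dynamicMacro ["m"]` picks up the instrument's technique, while `staticMacro [defaultMute]` always does the same thing.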

gamakam4

Every once in a while I would write a library for gamakam, and then decide it was too clumsy. The 4th incarnation seems to more or less work, though it's still missing a lot. It's actually just a simplification of gamakam3, from last year.

negative duration

I once again reworked how negative durations worked. I figured out that even Balinese notation seems to work better if only certain calls use negative duration. For instance, norot has a bit of preparation, but is mostly sustaining a pitch in the usual positive manner.

But then it's quite common to have a negative duration event aligning with a positive one, and since I represented negative durations as exactly that, I'd wind up with two notes with the same start, which violates the track invariant about no overlapping notes... not to mention it would be a tricky UI question of which one you wanted to edit.

So I decided to change the representation to actually all be positive durations, but negative oriented notes would have a flag, and the UI would interpret that flag as drawing text at the bottom. For some reason I was also under the impression that this would get rid of all of the complicated special casing to support negative durations, since there wouldn't actually be negative durations anymore.

It turned out I was totally wrong about that last part. I could indeed do that, but for the purposes of editing I want to treat negative duration notes in the complementary way (e.g. set duration moves the start, or selections include the end but not the start). So I not only had to keep all the complicated stuff, but also had to reimplement it for the new representation, and it got even more complicated, because now it sort of acts like the note is at the end of the duration, but it's actually just a normal note with a flag. Also it's still not worked out, because editing them is quite confusing: you edit at the top, but the text appears at the bottom.
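A minimal sketch of the flag representation, assuming hypothetical `Event` and `Orientation` types: the stored duration is never negative, but edits keep the logical end fixed for negative-oriented events.

```haskell
-- Hypothetical representation: durations are never negative; a flag records
-- which end of the event is the "logical" one.
data Orientation = Positive | Negative
  deriving (Eq, Show)

data Event = Event
  { start :: Double
  , duration :: Double  -- always >= 0
  , orientation :: Orientation
  } deriving (Eq, Show)

end :: Event -> Double
end e = start e + duration e

-- Where the event logically sits, and where its text would be drawn.
anchor :: Event -> Double
anchor e = case orientation e of
  Positive -> start e
  Negative -> end e

-- Editing keeps the anchor fixed, so for a negative event it moves the start.
setDuration :: Double -> Event -> Event
setDuration d e = case orientation e of
  Positive -> e { duration = d }
  Negative -> e { start = end e - d, duration = d }
```

A negative event ending at 4 and a positive one starting at 4 no longer share a start, so they don't violate the no-overlap invariant, but every editing operation still needs a case for each orientation.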

So I don't know. Maybe I just need to polish the rough edges, and then get used to the quirks. Or maybe go back to real negative durations, but come up with some other way to have two events that "start" at the same time.

The bright part is that between this and note cancellation I seem to finally have a more or less usable way to express typical moving kotekan patterns, even if the entry and editing is rough. It just seems way more involved than it should be.

text display refactor

Since the above negative duration stuff meant I started actually writing scores that used negative duration, it annoyed me that text wrapping didn't work for them. Also, I thought it would be worth putting some time into fancy text layout so I could squeeze in the right-aligned merged track text wherever there was room, so I rewrote the text wrapping algorithm. It got really annoyingly complicated, especially with positive and negative events, and it still has some minor bugs I haven't bothered to go figure out yet, but at least negative events wrap upwards now.

It turns out squeezing the right-aligned text into gaps caused by wrapped left aligned text just doesn't happen that often, though.

im, 音

I sampled my reyong, and the simple job of supporting multiple articulations (open, damped, cek, etc.) per pitch turned out to be ridiculously complicated. This is due to terrible MIDI, terrible buggy Kontakt, and its terrible excuse for a scripting language, KSP.

I always intended to add a non-realtime non-MIDI synthesizer, and since surely a sampler is the easiest to implement, I started on an implementation. I got as far as a basic proof of concept, namely a note protocol, an offline renderer, and a simple sample playback VST to handle the synchronization. It's crazy how much simpler things become when I don't have to deal with MIDI.

However, losing realtime response is likely to be quite annoying, though I have a plan for how to get it back in a limited way. And since I already did all the work to get the reyong samples working in Kontakt, I don't have a lot of motivation yet to finish my own implementation. It will likely have to wait until I either have another annoying sampling job, or more likely, finally get around to writing a physical modelling synthesizer.

instrument generalization

Now that I technically have a non-MIDI non-lilypond backend, I needed to generalize instruments into common and backend-specific parts.

Also there was a lengthy and messy transition from the old way where you'd directly name an instrument's full name in the score, to having short aliases to the full instrument name, to aliases becoming separate allocations for the instrument (so you could use the same instrument twice), to aliases becoming the only way to refer to an instrument and renaming them to "allocations."

In retrospect, I should have done it that way from the beginning.

retuning

Of course I already have arbitrary tuning via pitch bend, but it's a hassle to add a bunch of VSTs, so I added support for retuning via the MIDI realtime tuning "standard" (supported only by pianoteq), and retuning via KSP (naturally supported only by Kontakt).
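For reference, here is a sketch of the MIDI Tuning Standard's real-time single note tuning change message, assuming device ID 0x7F ("all call") and tuning program 0. It shows the message layout, not how karya sends it, and it doesn't validate that keys and semitones are within 0 to 127.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word8)

-- Real-time single note tuning change:
--   F0 7F <device> 08 02 <program> <count> [<key> <semitone> <msb> <lsb>]* F7
-- Each retune maps a key to a fractional MIDI note number; the fraction is
-- sent as 14 bits in units of 1/16384 semitone. Ranges are not checked.
singleNoteTuning :: Word8 -> Word8 -> [(Word8, Double)] -> [Word8]
singleNoteTuning device program retunes =
  [0xF0, 0x7F, device, 0x08, 0x02, program, fromIntegral (length retunes)]
    ++ concatMap encode retunes
    ++ [0xF7]
  where
    encode (key, nn) =
      [key, fromIntegral semi, fromIntegral (frac `shiftR` 7), fromIntegral (frac .&. 0x7F)]
      where
        semi = floor nn :: Int
        -- Clamp so a fraction that rounds up to a whole semitone stays in 14 bits.
        frac = min 16383 (round ((nn - fromIntegral semi) * 16384)) :: Int
```

For example, retuning key 60 to 60.5 sends semitone 60 with fraction 8192, i.e. the bytes `60 60 64 0`.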

hspp

GHC 7.10 finally has call stacks, which let me get rid of the hacky preprocessor. It served its purpose, but it's much nicer to not need it. I lost calling function names because the 7.10 support for that is kind of broken, but I think it may be fixed in GHC 8.
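For comparison, this is what the call stack API looks like in GHC 8 and later, where `HasCallStack` from `GHC.Stack` replaced 7.10's implicit-parameter form. A minimal sketch with hypothetical function names:

```haskell
import GHC.Stack (HasCallStack, callStack, getCallStack, srcLocFile, srcLocStartLine)

-- Report this function's name and the location it was called from, which is
-- roughly what the old preprocessor had to insert by hand.
whereAmI :: HasCallStack => String
whereAmI = case getCallStack callStack of
  (name, loc) : _ -> name ++ " called at " ++ srcLocFile loc ++ ":" ++ show (srcLocStartLine loc)
  [] -> "no call stack"

-- Since base 4.9, `error` also carries the call stack, so this error message
-- includes the location of checkedDiv's caller.
checkedDiv :: HasCallStack => Int -> Int -> Int
checkedDiv _ 0 = error "checkedDiv: division by zero"
checkedDiv a b = a `div` b
```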

solkattu

The track format isn't great for expressing rhythms, since the rhythm is implicit in the physical location of the note. Also, except for integration, which is complicated, it's basically "first order" in that it doesn't easily support a score that yields a score. For instance, the mapping between solkattu and mridangam strokes is abstract and dependent on the korvai, and of course could map to any number of instruments.

So I came up with a haskell-embedded DSL which is purely textual and thus can express rhythmic abstraction. It's mostly useful for writing down lessons, but it can be easily reduced to track notation, so in theory I could write solkattu and integrate that to a mridangam track. I also have some experimental alternate realizations to Balinese kendang and ideas for a reyong "backend". That way I could write in solkattu, and then have it directly realized to any number of instruments, either to play simultaneously or exchange material.

It looks like this:

t4s :: [Korvai]
t4s = korvais (adi 6) mridangam $ map (purvangam.)
    [ spread 3 tdgnt . spread 2 tdgnt . tri_ __ tdgnt
    , spread 3 tdgnt . tri (ta.__.din.__.gin.__.na.__.thom)
    , tri_ (dheem.__3) (ta.din.__.ta.__.din.__.p5)
    , tri_ (dheem.__3) (p5.ta.__.din.__.ta.din.__)
    , p123 p6 (dheem.__3)

    , p123 p5 (tat.__3.din.__3)
    , let dinga s = din!s . __ . ga
        in p5.dinga u . ta.ka . p5.p5. dinga i . ta.ka.ti.ku . p5.p5.p5
    , tri (tat.dinga . tat.__.dinga.p5)
    ]
    where
    tdgnt = ta.din.gin.na.thom
    p123 p sep = trin sep p (p.p) (p.p.p)
    purvangam = tri (ta_katakita . din.__6)
    mridangam = make_mridangam $
        [ (ta.din.gin.na.thom, [k, t, k, n, o])
        , (ta.din, [k, od])
        , (dheem, [u])
        , (din, [od])
        , (tat, [k])
        , (ta.ka.ti.ku, [k, p, n, p])
        , (ta.ka, [k, p])
        , (dinga, [od, p])
        ] ++ m_ta_katakita

And reduces to a mridangam realization like this:

k _ p k t k t k k o o k D _ _ _ _ _ k _ p k t k
t k k o o k D _ _ _ _ _ k _ p k t k t k k o o k

D _ _ _ _ _ k _ _ t _ _ k _ _ n _ _ o _ _ k _ t
_ k _ n _ o _ k t k n o _ k t k n o _ k t k n o
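The combinators in the example can be approximated with plain lists. This is a hypothetical toy, not the actual Solkattu library: strokes are single characters, `++` stands in for the DSL's `.` sequencing, and `tdgnt` is realized as `k t k n o` per the mridangam mapping above. It reproduces the `spread` and `tri_` parts of the realization.

```haskell
-- Hypothetical toy model, not karya's Solkattu library.
data Note = Stroke Char | Rest
  deriving (Eq, Show)

type Seq = [Note]

strokes :: String -> Seq
strokes = map Stroke

-- A one-unit rest, named after the DSL's `__`.
__ :: Seq
__ = [Rest]

-- Follow each note with (n - 1) rests.
spread :: Int -> Seq -> Seq
spread n = concatMap (\note -> note : replicate (n - 1) Rest)

-- Three repetitions, with no separator or with one.
tri :: Seq -> Seq
tri p = p ++ p ++ p

tri_ :: Seq -> Seq -> Seq
tri_ sep p = p ++ sep ++ p ++ sep ++ p

-- Three different phrases joined by a separator.
trin :: Seq -> Seq -> Seq -> Seq -> Seq
trin sep a b c = a ++ sep ++ b ++ sep ++ c

render :: Seq -> String
render = unwords . map showNote
  where
    showNote (Stroke c) = [c]
    showNote Rest = "_"
```

`render (spread 3 (strokes "ktkno"))` gives `k _ _ t _ _ k _ _ n _ _ o _ _`, matching the start of the second half of the realization.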

space leak

I noticed that after editing a score for 10 minutes or so, the UI would start getting laggy. Usually that means a memory leak and too much GC, and sure enough ekg showed that after each derivation memory usage would jump up, and never go back down again.

The first thing I blame is the cache, because it's the only thing that remains after each derivation, besides the note data itself. If the cache itself somehow holds a reference to the previous cache, then no derivation will ever be freed. It's like the joke where all I wanted was the banana, but I got the banana, the monkey holding the banana, and the jungle the monkey lives in. Only in this case I also get all the previous generations of monkeys.

I tried to debug by stripping out various fields in the cache, and got mysterious results. Dropping the cache entirely would fix the leak, but replacing all of its entries with Invalid tokens would add to it. Then I discovered that Data.Map's fmap is always lazy (of course), so the test itself was insufficiently strict, and using Data.Map.Strict.map led to more consistent results.

I discovered an intentionally lazy field, with a potentially complicated thunk lurking inside. That's the root cause. In this case, the mechanism to get neighbor note pitches sticks the evaluation in a lazy field, with the idea that if you don't need a neighbor pitch (the common case), then you don't have to pay for the evaluation. I don't need that field once the computation is done, so I stripped it out on return. This still didn't solve the problem, because it was going into another intentionally lazy field, so of course the stripping didn't happen. I bang-patterned the value before putting it in the record, and the leak was gone!

As an aside, I discovered that I don't even need to use the value, e.g.: make x = Record (f x) where !unused = f x is already enough. Of course that's perfectly normal in a strict language, but in haskell I'm used to freely deleting unused bindings.
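The laziness traps above can be reproduced in a few lines. Here is a sketch with hypothetical names (`Record`, `expensive`, `throwsOnWhnf`), using `error` as a stand-in for an expensive thunk so the test can tell whether it was forced.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Exception (ErrorCall, evaluate, try)
import qualified Data.Map.Strict as Strict

-- True if evaluating x to weak head normal form throws an ErrorCall.
throwsOnWhnf :: a -> IO Bool
throwsOnWhnf x = either isError (const False) <$> try (evaluate x)
  where
    isError :: ErrorCall -> Bool
    isError _ = True

-- fmap on a Map is lazy in the values, even when imported via Data.Map.Strict.
sample :: Strict.Map Int Int
sample = Strict.fromList [(1, 10), (2, 20)]

lazyMapped, strictMapped :: Strict.Map Int Int
lazyMapped = fmap (\_ -> error "value forced") sample         -- thunks survive
strictMapped = Strict.map (\_ -> error "value forced") sample  -- values forced

-- A lazy record field holds whatever thunk it is given.
data Record = Record { pitch :: Double }

expensive :: Double -> Double
expensive x = if x < 0 then error "bad pitch" else sqrt x

mkLazy, mkBanged, mkUnused :: Double -> Record
-- Stores a thunk: nothing is evaluated until someone reads the field.
mkLazy x = Record (expensive x)
-- Forces the value before putting it in the record.
mkBanged x = Record y
  where !y = expensive x
-- The bang binding is forced even though its value is never used.
mkUnused x = Record (expensive x)
  where !_unused = expensive x
```

Note that `mkUnused` only shows that the bang binding is forced; whether the field then shares that evaluated value depends on GHC's optimizer.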

So the leak was gone, but now the UI had a hitch. The reason the field was intentionally lazy was to avoid doing that work in the event loop, so it could be passed to another thread and forced over there. So I removed the bang, and now the hitch is gone, but the leak is back! But shouldn't the other thread forcing it have cleaned up the thunk in the first place? Then I discovered another fun bug:

force_performance perf = perf_logs perf `deepseq` perf_events perf
    `deepseq` perf_warps perf `deepseq` perf_track_dynamic
    `deepseq` ()

Not too obvious, right? It turns out perf_track_dynamic is exactly the field I needed to force, and yes functions are in NFData, so no type error for that.

So I fixed that and... still the leak. Actually, it seems like the leak is gone in the application, but still there in the test. I did all sorts of messing about trying to really ensure that field is forced in the test, and no luck.

Finally I somewhat accidentally fixed it, by refactoring force_performance to use less error-prone pattern matching instead of accessor functions, and added another field to the deepseq chain while I was at it. It turns out that other field, which has nothing to do with the guilty perf_track_dynamic one, still somehow had a pointer to it in its thunk. Since it's built in the same function that builds the other record, maybe it has a pointer to that whole function, and hence everything that function mentions. And of course one of the core principles of hunting space leaks is that you have to kill them all at once. It's like a hydra, where you have to cut off all the heads at once to have any effect.
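Here is a sketch of the pattern-matching version, with a hypothetical `Performance` whose field types are invented. Binding the fields positionally leaves no bare accessor to forget the argument on, which was the bug above: `perf_track_dynamic` by itself is a function, and functions are NFData.

```haskell
import Control.DeepSeq (deepseq)

-- Hypothetical stand-in for the record in the text; field types are invented.
data Performance = Performance
  { perf_logs :: [String]
  , perf_events :: [Double]
  , perf_warps :: [(Double, Double)]
  , perf_track_dynamic :: [(String, Double)]
  }

-- Every field is bound to a value, and adding a field to Performance makes
-- this pattern fail to compile until it is handled here too.
force_performance :: Performance -> ()
force_performance (Performance logs events warps dynamic) =
  logs `deepseq` events `deepseq` warps `deepseq` dynamic `deepseq` ()
```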

The moral is: be really careful about intentionally lazy fields. Of course that includes anything wrapped in any standard type like Maybe or (,)... so put in regression tests both for memory usage growing too much (too lazy) and for functions taking longer than expected on large input (too strict).

Saturday, October 3, 2015

one year: kotekan, format, ky file, gamakam, get duration

Over the last week or so I've been fiddling around with kotekan calls. Actually, it's been much longer than just a week.

The fundamental problem is that they are end-weighted. This means that they have a duration leading to an end at a certain time, rather than starting at a certain time and having a duration afterwards. Concretely, this means that the overall pitch is most naturally written at the end of the event, since that's when it lines up with that pitch in the other instruments. Also, the notes of the kotekan extend from after the start time until the end time, and the duration of the final note is indeterminate, or rather, dependent on the onset of the note after that.

This in turn ripples through the whole system. Ranges which are normally [start, end) instead become (start, end]. This goes for everything from the UI selection range, to the slicing ranges that determine the slice range of a note, and hence its range on the control tracks, to lots of other internal things I can't remember now. It means that text most naturally goes above the event rather than below, that it wraps upwards rather than downwards, and even, in a circumstance I don't remember any more, that events would most naturally be evaluated in reverse time order.
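Here are the two range conventions as a minimal sketch with hypothetical names. `sliceControl` shows how the same note boundaries pick up different control samples depending on orientation.

```haskell
-- Hypothetical names. Positive events own [start, end); end-weighted
-- (negative) events own (start, end].
data Orientation = Positive | Negative
  deriving (Eq, Show)

inRange :: Orientation -> Double -> Double -> Double -> Bool
inRange Positive start end t = start <= t && t < end
inRange Negative start end t = start < t && t <= end

-- The control samples (time, value) a note with the given bounds would see.
sliceControl :: Orientation -> Double -> Double -> [(Double, y)] -> [(Double, y)]
sliceControl o start end = filter (\(t, _) -> inRange o start end t)
```

With samples at 0, 0.5 and 1, a positive note from 0 to 1 sees the first two, while a negative one sees the last two, including the sample exactly at its end.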

I originally intended to support this as much as possible and implemented these inverted events as events with negative duration, though I stopped short at reversing evaluation. However, when it came to actually writing down a score in a Balinese style, it got really awkward. As usual, I don't remember the problems in detail, but I think they were related to how at the low level, notes really do start at a time and sound for a duration. For instance, for negative events I take the controls from the end of the event rather than the start, but what if the controls change in time? Even if they don't change, to make sure a pitch extends back from several notes in the future, I'd logically need a new kind of control signal where the sample values are from start > x >= end, rather than start >= x > end. I guess the conclusion is that I could keep extending this "opposite polarity" through the system, but eventually I'll run into the fact that notes fundamentally are start + duration. And actually I'd like to "merge" the two orientations as early as possible to avoid two incompatible languages. In a way, that's the central tension of trying to integrate different kinds of music.

I concluded that fundamentally end-weighted notation is only appropriate for a higher level score, and it runs into conflicts when I try to merge it with a lower level score that has precise control curves. It's another manifestation of the time problem, where my notation represents time explicitly, and thus gets into difficulty with any kind of abstraction over time. I could theoretically have a two level score by making use of score integration to write a high level score in more abstract notation and then have it produce an editable low level score to express things like control curves, but it's a giant jump in complexity. Not just in implementing it, but in editing and in creating new score vocabulary, because it seems like it would create a whole new language and structure, incompatible with the lower level. Maybe I'll ultimately wind up doing something like that, but I'd like to avoid it until I really definitely need it.

Meanwhile, I wound up coming up with some workarounds to try to bridge the gap between end-weighted and start-weighted notation. I came up with a notion of cancellation, where you write a zero duration note at the end of a block, and it will then replace the first note of the next block via a postproc call. This gets the essential effect, but the score is still written in entirely start-weighted positive-duration notes. It also means the pitches and controls are written at the start instead of the end, but that means continuous controls work out, even if it means the pitches don't line up across instruments as they should.

But the hack leads to more hacks. Kotekan calls are also end-weighted in the same way blocks are, of course. But here the replacement hack has to be extended to understand paired instruments, because polos may replace sangsih, or vice versa. Initially I gave up and simply manually added initial=t and final=f type configuration to kotekan calls, but it got annoying to have to annotate everything in this way.

This led to much refactoring of the cancel call so it could be thus configured, and what was originally just an infer-duration flag became separate weak and infer-duration flags, and then became weak, strong, and infer-duration. Now, because the final kotekan note should replace the first kotekan note, but not a normal note (neither weak nor strong), I wind up with additional initial and final flags.
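Here is one plausible reading of the cancellation rules, sketched rather than taken from karya's actual cancel postproc. At each start time only the strongest notes survive, and a zero-duration survivor infers its duration from what it replaced. The `Strength` ordering, the field names, and the exact inference rule are assumptions, and the initial/final flags are left out.

```haskell
import Data.Function (on)
import Data.List (groupBy, partition, sortOn)

-- Hypothetical note model; Weak < Normal < Strong.
data Strength = Weak | Normal | Strong
  deriving (Eq, Ord, Show)

data Note = Note { start :: Double, dur :: Double, strength :: Strength, name :: String }
  deriving (Eq, Show)

-- At each start time keep only the strongest notes. A zero-duration survivor
-- takes the longest duration among the notes it cancelled ("infer-duration").
-- Grouping compares start times exactly, which is fine for a sketch.
cancel :: [Note] -> [Note]
cancel = concatMap resolve . groupBy ((==) `on` start) . sortOn start
  where
    resolve group =
      let top = maximum (map strength group)
          (winners, losers) = partition ((== top) . strength) group
          inferred = maximum (0 : map dur losers)
      in [if dur n == 0 then n { dur = inferred } else n | n <- winners]
```

So a strong zero-duration note at the end of one block replaces a weak first note of the next block and takes over its duration, while notes at other times pass through untouched.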

Whenever I get caught in twisty rat-holes it seems like solving some simple problem is requiring an unreasonable amount of implementation complexity, and I wonder if it's really worth it. Surely I could put all that time into just writing more initial=t annotations. On the other hand, I thought the same thing about ngoret and realize-ngoret, but now I use it all the time without thinking about it. It's still complicated, but I don't have to think about that when I use it. So maybe it turns out to be worth it in the end.

Here are some other things that have happened in the last year or so:

2015-01-31: I added Util.Format4, which later became Util.Format, which is hopefully where I can stop messing with different implementations of a pretty-printing library. Which I've been doing for years. Mostly the problem is that I don't care enough about pretty printing to put concentrated time into it, but also it seems to be a deceptively hard problem. The current implementation has gotten progressively more complicated since then, and still has bugs, but I just put up with them until I find motivation to go spelunking again. But perhaps I should care about pretty printing more. So much of what I do is debugging, and so much of debugging relies on having a nicely formatted display of some data.

2015-05-09: After a few false starts, I added Derive.get_score_duration, which queries a deriver for its logical score duration. The underlying problem was that I used this concept in variations on block calls, like clip and loop, which would derive the callee as-is in its own tempo. That works for block calls because you can find the block and see what its ScoreTime duration is, but it breaks as soon as you alias a block call to another name. Implementation wound up being complicated because it means I have to run a special derive that runs only far enough to figure out what logical duration the call wants to have, which wound up being yet another derive mode. But the result is that clip and loop and the like are now generic transformers that can work on anything, instead of generators special-cased to block calls.
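Here is a minimal model of why a duration query makes `clip` and `loop` generic. It assumes a hypothetical `Deriver` that pairs a logical duration with a function producing note start times; karya's real Deriver is far richer.

```haskell
-- Hypothetical model, not karya's Deriver: a deriver can report its logical
-- duration without producing notes, like a separate duration-only derive mode.
data Deriver = Deriver
  { logicalDuration :: Double       -- what the duration query returns
  , deriveAt :: Double -> [Double]  -- note start times when placed at an offset
  }

-- A stand-in for any callee, e.g. a block call or an alias to one.
notes :: Double -> [Double] -> Deriver
notes dur starts = Deriver dur (\offset -> map (+ offset) starts)

-- Repeat the callee until `total` is filled. Only its logical duration is
-- needed, so this works on any deriver. Assumes a positive duration.
loop :: Double -> Deriver -> Deriver
loop total d = Deriver total $ \offset ->
  filter (< offset + total)
    (concat [deriveAt d (offset + fromIntegral i * step) | i <- [0 .. reps - 1]])
  where
    step = logicalDuration d
    reps = ceiling (total / step) :: Int

-- Cut the callee off at `total`.
clip :: Double -> Deriver -> Deriver
clip total d = Deriver total (\offset -> filter (< offset + total) (deriveAt d offset))
```

For example, looping a 1.5-long callee with notes at 0 and 1 over 4 units yields notes at 0, 1, 1.5, 2.5 and 3, without the loop ever knowing whether the callee was a block.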

2014-05-24: I added the "local definitions file", which was later renamed "ky file". This is actually really useful. It's just a way to define calls using the tracklang syntax, but write them in an external file, which is then loaded along with the score. It was initially just because, while I can write all that stuff inline in the score, it's really awkward with the tiny text boxes. Being able to put definitions in an external file means I use them much more, and supports the idea that calls can be comprehensive but with many arguments, with the expectation that you will specialize them locally. Eventually ky files got an import mechanism, and now there are little shared libraries.

The big disadvantage is that scores are no longer self-contained, which means there's opportunity to break them just by moving them. More seriously, the ky file isn't versioned along with the score, so you can't undo those changes. I actually could version it by including it in the git repo along with the score, but while that might be appropriate for definitions closely tied to the score, it's not appropriate for shared libraries. This is the same problem that exists with the haskell code implementing the calls themselves; ky files just sit in the middle, between the score and that code. This problem was always inherent in the "score is code" approach, it's just that as I make that approach more practical, the problems it brings also become more obvious.

The original plan was to implement this with haskell, as score-local libraries. Of course, that has even bigger problems with complexity and versioning, while having more power. So perhaps I want to draw the line at ky files: anything simple enough to do in tracklang you can put in a ky file and it's not tested and not typechecked and not versioned, but it's basically just simple macros (surely that's a sentiment that lies in the past of any number of terrible languages, grown way beyond their breaking point). Anything more complicated has to be checked in, so it gets all the nice language things, but is not so appropriate for per-score use. And I don't have to mess with dynamically loading binaries.

2014-06-20: Carrying on from Derive.get_score_duration, I implemented Derive.get_real_duration, which is essentially the same thing, except it gets the RealTime duration of a deriver. This is because I wanted to add sequence-rt, which is a RealTime variant of sequence. What that does is conceptually really simple: just play the given derivers one after the other in order. sequence is similar, except that it uses ScoreTime, which means that if a callee block has its own tempo track it doesn't affect the duration. In other words, a constant tempo will have no effect since the block is always stretched to fit its ScoreTime duration, and a non-constant tempo will simply make one part faster while making the other part slower. This is nice if it's what you want, but if you just want to sequence bits of score, you probably want a tempo of a given value to be the same across blocks.
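Here is the difference between the two, sketched with hypothetical blocks that each have a constant tempo, so a block of ScoreTime duration s at tempo t lasts s / t seconds.

```haskell
-- Hypothetical callee block with a constant tempo.
data Block = Block { scoreDuration :: Double, tempo :: Double }

realDuration :: Block -> Double
realDuration b = scoreDuration b / tempo b

-- sequence: place callees by ScoreTime. Each is stretched to fit its slot,
-- so a constant callee tempo doesn't move where the next one starts.
sequenceStarts :: [Block] -> [Double]
sequenceStarts = init . scanl (+) 0 . map scoreDuration

-- sequence-rt: place callees one after another by their RealTime duration.
sequenceRtStarts :: [Block] -> [Double]
sequenceRtStarts = init . scanl (+) 0 . map realDuration
```

With `[Block 4 1, Block 4 2, Block 4 1]`, `sequenceStarts` gives `[0, 4, 8]` while `sequenceRtStarts` gives `[0, 4, 6]`: under `sequence` the faster middle block still takes its full slot.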

This is actually the same problem I have with composed tempos in general. They provide their own kind of power, but they make it awkward to just sequence bits of score in order... and that's what sequence-rt does. It's still not necessarily convenient because, while I now automatically set event duration based on the Derive.get_score_duration, it still doesn't match up with the ruler, and I have to run LRuler.extract to figure out what the ruler should be... except that probably won't work because LRuler.extract doesn't understand sequence-rt. And in practice, I haven't used it much, but it still seems like the sort of thing I totally would use since I often just want to concatenate a bunch of blocks.

Sequencing in order is a basic operation in other score languages, and in music in general, and so it's disappointing that it's so tricky in mine. This is a result of the compositional idea of tempo, but sometimes it seems like I'm making the tradeoff in favor of the uncommon case. In practice I wind up pasting the tempo track up to the toplevel block, though a toplevel block with only a sequence-rt should mean I don't have to do that anymore.

This is also an illustration of another fundamental problem I have, which is that since the notion of time is concrete in the notation, I can't easily express things that are abstracted over time. The best I can do is make an event with the duration of the entire sequence and write out whatever it is in text, and now that I have Derive.get_score_duration I can have the event duration automatically set to whatever the text implies. This is pretty inherent in the basic design though, because the tracklang is basically just text with time, and if I get rid of the time part then I'm basically the same as any of those other music languages. And if I write some call that doesn't care about time then it doesn't combine nicely with the rest of them, which do (e.g. you can't align a signal to it).

But it seems like it should be possible to come up with some kind of compromise, e.g. perhaps where the order matters but not the duration, or to have the time aspect be set automatically and function just as a visualization. In fact, that's basically what I already have with automatically setting the event duration, so perhaps I'm already doing it.
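To make the sequence vs. sequence-rt difference concrete, here is a minimal Haskell sketch. It assumes a toy Deriver that is nothing but events at RealTime offsets plus the RealTime duration that get_real_duration would report; Deriver, realDur, sequenceRt and sequenceScore are hypothetical names, not Karya's API, and realDur is assumed to be positive.

```haskell
-- A deriver reduced to what matters here: events at RealTime offsets from its
-- own start, and the RealTime duration those events occupy (assumed > 0).
data Deriver = Deriver
  { realDur :: Double
  , events :: [(Double, String)]
  }

-- Like sequence-rt: each deriver starts where the previous one ended in
-- RealTime, so a callee's own tempo changes the total length.
sequenceRt :: [Deriver] -> [(Double, String)]
sequenceRt = go 0
  where
    go _ [] = []
    go offset (d : ds) =
      [(offset + t, e) | (t, e) <- events d] ++ go (offset + realDur d) ds

-- Like sequence: each deriver is stretched to fit a fixed slot (its ScoreTime
-- duration, here already mapped to RealTime), so its tempo is normalized away.
sequenceScore :: [(Double, Deriver)] -> [(Double, String)]
sequenceScore = go 0
  where
    go _ [] = []
    go offset ((slot, d) : rest) =
      [(offset + t * slot / realDur d, e) | (t, e) <- events d]
        ++ go (offset + slot) rest
```

With a slow 4-second block followed by a fast 2-second one, sequenceRt keeps both lengths, while sequenceScore squeezes each into a 2-second slot, which is why a constant tempo in the callee has no audible effect there.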

2014-12-06: Added the *interpolate scale, which is an interpolation between two other scales. The only special thing about this is that it's a scale parameterized by other scales. Unfortunately, due to complicated reasons, I couldn't just have a val call return a scale, so I wound up having to implement a bunch of special case refactoring. It's an interesting scale though. I can either gradually change intonation, or change key, or change scales entirely, as long as they both understand the same pitch names.
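To illustrate what a scale parameterized by other scales amounts to, here is a minimal sketch that reduces a scale to a map from pitch name to NoteNumber. Scale and interpolate are hypothetical, and a real scale also handles input, transposition and so on; the point is only that pitch names both scales understand get an interpolated pitch, and the rest drop out.

```haskell
import qualified Data.Map as Map

-- A scale reduced to a map from pitch name to NoteNumber (fractional MIDI key).
type Scale = Map.Map String Double

-- A scale made from two other scales. Pitch names both understand get a
-- NoteNumber interpolated by weight: 0 gives scaleA, 1 gives scaleB.
-- Names only one of them understands are dropped.
interpolate :: Scale -> Scale -> Double -> Scale
interpolate scaleA scaleB weight =
  Map.intersectionWith (\a b -> a + (b - a) * weight) scaleA scaleB
```

Varying the weight over time is what would give a gradual change of intonation or key.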

2015-07-11: Added Derive.Call.India.Gamakam3, which is the third attempt at a library for gamakam. Gamakam1, the first attempt, was just a bunch of pitch calls that tried to produce the appropriate motions. This wound up being awkward because it's actually pretty inconvenient to specify exactly where these things start and end... there's that explicit time problem again. Another problem was that because some intervals are in scalar degrees and some are in absolute microtonal distances, I wound up having to either put everything on the pitch track with lots of annotations, or use multiple transposition tracks. The problem with the pitch track was that since I can't combine pitch tracks I'd have to give the basic swaram to every call, which got redundant. I also felt like I should be able to write the note swarams and gamakam separately. All of the fiddly details defeated the original goal, which was to be able to quickly get idiomatic results. So basically it was too low level, though I may be able to still use some of the calls as building blocks.

I tried to address this with Gamakam2, which was its own mini-language. I would write down the sequence of ornaments (which were mostly Gamakam1 calls), and there was a notion of start, middle, and end, where start and end were fixed duration and the middle could stretch arbitrarily. I was also hoping to be able to have gamakam calls scale gracefully as the duration of the note changed. Since the full call names were too long when all crammed into one expression, I had a separate set of aliases that would only apply within the start, middle, and end sections, so it really was like a mini-language. Since it was a note call, it knew where the note ended and I wouldn't have to manually set any kind of durations. The flip side, though, was that I'd have to express the entire note as one expression, so it wasn't very flexible. And sometimes you really do need to place a specific motion at a specific time. Also the notion of some ornaments being fixed duration didn't turn out to work very well in practice, since it seems even note to note transitions (which would go in start or end) will expand if given more time.
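A sketch of the start / middle / end duration split that Gamakam2 relied on, with durations in seconds. splitNote is a hypothetical name, and what to do when a note is too short for the fixed parts is an assumption here (shrink both proportionally and leave the middle empty).

```haskell
-- Split a note's duration among a fixed-length start, a stretchable middle and
-- a fixed-length end. If the note is too short for start + end, this sketch
-- shrinks both proportionally and gives the middle nothing (an assumption).
splitNote :: Double -> Double -> Double -> (Double, Double, Double)
splitNote start end dur
  | start + end <= dur = (start, dur - start - end, end)
  | otherwise = (start * k, 0, end * k)
  where
    k = dur / (start + end)
```

Only the middle grows with a longer note, which is exactly the assumption that didn't hold up in practice.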

So Gamakam3 is kind of a compromise between the low level Gamakam1 and high level Gamakam2. It keeps the sequence idea, but drops the start, middle, and end. There's no stretching; each call gets a slice of time depending on the time available. Some can be weighted to take more or less, and some can configure things like the "from" pitch and transition speed, and take no duration. Many of the calls wind up being a single letter, so I can string them together as a single word. Since there's no longer a start or end, I can implement it as a pitch call, which means that I can have multiple per note, or align them as is appropriate. To retain the convenience of Gamakam2 where the note call knows where the end of the note is, I added a feature where an inverting note call sets its end time in the environ, so the gamakam call can be zero duration and still end at the note end time. To address the problem with Gamakam1 where I want to write the swarams and gamakam separately, I added a simple kind of pitch track merging where the second pitch signal just replaces the first one if the samples line up. So I can write the swarams on their own pitch track, and then the gamakam on the next track over can get the previous, current, or next swaram. If a gamakam is present it will replace the base swaram, and if not, I can leave the gamakam track empty.

For example, here's notation for the first four notes of Evvari Bodhana:

r   r      g         r s
    !!--2b !P1 !a--1
Ev  va     ri

!!--2b has an rmr jump at the end, and !P1 !a--1 starts one step above the swaram, at m, moves down a step, waits, and then moves up a step. The notation for relative movement is 1 2 3 to move up, and a b c to move down. With such a short clip already a lot of problems are apparent. One is that the relative movement is kind of awkward. Another is that I use an extra ! to switch to compact notation where each call is one letter, but it's awkward and ugly when I need more than one letter, e.g. P1. P1 sets the "from" pitch to one step above the base swaram, but it's also quite awkward. I'd much rather not have that distinction, so maybe I should use spaces everywhere... but that's favoring higher level calls since I'd want fewer of them that express more. Also, the piece is in Abhogi, where the rmr motion is idiomatic for ga ascending, but the notation has no way to reflect that knowledge.
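A small sketch of how the compact relative notation can be read. It assumes each character is one call taking one slice, moves are cumulative from the current pitch (consistent with !!--2b giving rmr when starting from r), and steps count degrees of the raga rather than semitones. relative, name and the abhogi table are hypothetical.

```haskell
import Data.List (elemIndex)

-- Degrees of a pentatonic raga such as Abhogi (s r g m d); relative moves
-- count these, and indices outside 0..4 would be other octaves.
abhogi :: [String]
abhogi = ["s", "r", "g", "m", "d"]

-- Read compact relative calls, one per character: '1' '2' '3' move up that
-- many degrees, 'a' 'b' 'c' move down, '-' holds. Moves are cumulative from
-- the current degree. Returns the degree for each slice, or Nothing on an
-- unknown character.
relative :: Int -> String -> Maybe [Int]
relative _ [] = Just []
relative cur (c : cs) = do
  next <- step c
  (next :) <$> relative next cs
  where
    step '-' = Just cur
    step ch
      | Just n <- elemIndex ch "123" = Just (cur + n + 1)
      | Just n <- elemIndex ch "abc" = Just (cur - n - 1)
      | otherwise = Nothing

-- Name a degree, ignoring its octave for brevity.
name :: Int -> String
name d = abhogi !! (d `mod` length abhogi)
```

Starting from r (degree 1), "--2b" gives r r m r; starting from m (the P1 case), "a--1" gives g g g m.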

I actually implemented an alternate absolute notation which winds up looking more sensible:

r   r      g         r s
    !!--mr !!mg--g
Ev  va     ri

Unfortunately, this basically gets me stuck at the lowest level. I can't write movements that are independent of the pitch, and there's not even any point in having the separate swaram track because the gamakam track already specifies everything. But in the long run, it really depends on how much I'm able to abstract. It could be that figuring out a high level notation is more work than just writing everything directly. In fact, the whole sequencer program is sort of an exercise in finding where that balance lies.

Another problem is that the pitch merging is a hack, and if the gamakam happens to not produce a pitch at the start of the note where the swaram is, it won't replace it and you'll get a strange jump. And speaking of strange jumps, at least in vocal music you can't ever really jump pitch without a corresponding gap. I could presumably impose a slope control to reflect that even something represented as a logical jump in the notation will not necessarily sound that way.
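The pitch merging can be sketched like this, with pitch signals as sorted (time, NoteNumber) samples. The rule that a gamakam takes over a swaram's span only if it has a sample exactly at the swaram's start is one reading of "if the samples line up", not necessarily Karya's exact rule; mergeSwaram and Signal are hypothetical names.

```haskell
-- A pitch signal as (time, NoteNumber) samples, sorted by time.
type Signal = [(Double, Double)]

-- Within each swaram's span (from its sample up to the next swaram sample),
-- use the gamakam samples if one lands exactly on the swaram's start time;
-- otherwise keep the plain swaram.
mergeSwaram :: Signal -> Signal -> Signal
mergeSwaram swarams gamakam = concatMap region (zip swarams ends)
  where
    ends = map (Just . fst) (drop 1 swarams) ++ [Nothing]
    region ((start, p), end) =
      case filter (inSpan start end . fst) gamakam of
        g@((t, _) : _) | t == start -> g
        _ -> [(start, p)]
    inSpan start end t = t >= start && maybe True (t <) end
```

The second test case shows the failure mode: a gamakam whose first sample is slightly late is silently dropped and the plain swaram remains.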

Also I worry that the even division of times will lead to a mechanical sound. Actually, it makes it difficult to express that oscillations are definitely not regular. It also means that I have to explicitly write out each oscillation, so that the idea of gracefully adapting to the duration available is gone. It might be possible to get around that with higher level calls that take up more than one slice, or ones that warp the slice grid in certain patterns. For instance, there is a _ call that causes the previous call to take up another slice, so I can express slower transitions.
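A sketch of the even slice division with weights, zero-duration calls, and the _ call that gives the previous call another slice. Call, extend and slices are hypothetical; the real calls are parsed from event text, which is where _ becomes magic, since it has to reach back into the previous call.

```haskell
-- One parsed call: its name and its weight in slices. Calls that only
-- configure things (like setting the "from" pitch) have weight 0.
data Call = Call { callName :: String, weight :: Double }
  deriving (Eq, Show)

-- Fold each "_" into the preceding call by giving it one more slice.
-- A leading "_" has nothing to extend and is dropped in this sketch.
extend :: [Call] -> [Call]
extend = reverse . foldl add []
  where
    add (prev : acc) (Call "_" _) = prev { weight = weight prev + 1 } : acc
    add [] (Call "_" _) = []
    add acc call = call : acc

-- Divide [start, end) among the calls in proportion to their weights,
-- returning each call's name with its start and end time.
slices :: Double -> Double -> [Call] -> [(String, Double, Double)]
slices start end calls = go start calls
  where
    total = sum (map weight calls)
    unit = if total == 0 then 0 else (end - start) / total
    go _ [] = []
    go t (c : cs) =
      let t' = t + weight c * unit
      in (callName c, t, t') : go t' cs
```

Every slice is the same size, which is where the worry about a mechanical sound comes from: irregular oscillations would need something that warps the grid.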

Also calls like _ wind up being magic in that they affect the parsing of the whole expression. So it winds up being an ad-hoc kind of language only with no real basic syntax rules, just whatever I can get to parse. But I guess that's always the problem with trying to be terse.

Dynamics is a whole other axis of complication. Gamakam1 didn't address that at all. Gamakam2 integrated them into the calls, so each sub-call would come with its own dynamics. For Gamakam3, I don't know what approach I want, so I wound up implementing two. One is a separate set of calls just for dynamics that is meant to go in the dyn track. It's like a simplified version of the pitch subcalls. The other one is a way to attach dynamics to pitch calls, e.g. [c-]< means move to the swaram pitch and hold while increasing dynamics.

Separate dynamics is more flexible and orthogonal. It also means I can easily express common things like dynamics up or down at the beginning and end of a note. But I originally added them to the pitch notation because I thought I'd want to line up dynamic changes with pitch changes, and also because it seemed like less work to just edit one thing. However, any time event text becomes even slightly complicated, it becomes a pain to edit. All the editing commands are oriented around track level editing, and as soon as you want to edit event text you're stuck in a tiny text box. If I wanted to do complicated things in text I'd use a music programming language and vim! So a text mini-language should definitely not get too elaborate.

There are also other details I'm missing but don't fully understand, such as intonation. For instance, I believe the ga in mayamalavagoula should be closer to ma, but at the moment I don't have any way to express that. I probably shouldn't try to come up with a way until I understand it better. In fact, in general I probably need to just learn more music before making another go at things. Ultimately the goal is to be able to write down melodies in the Carnatic style, which means I need to have a large set of example melodies to work from, and I need to understand how they are structured. I should also probably take a pitch tracker to some recordings to get an objective picture.

2015-07-17: Args.lookup_next_logical_pitch

Getting the next pitch wound up being quite complicated, but "what is the next pitch" has been a hard problem with no good answer for a long time. In fact, it's another one of those things that has had three separate solutions, each with their own problems. I wound up needing yet another one because this one is "within a pitch track, what is the pitch of the next note of my parent note track", which is a question none of the other techniques could answer. It's also the most complicated, in that it adds a whole new derivation mode with special support in the default note deriver and in various other low level places. Basically, it runs a separate derivation in NotePitchQuery mode, and a special hack in Derive.Call.Sub will notice that and strip out everything except the pitch track. Then during inversion this is run on all the neighbor events, and the results put in Derive.Dynamic for anyone to look at. Laziness should ensure that the extra evaluation only happens if someone actually needs to know the pitch in question.

So in theory it's similar to Derive.get_score_duration and Derive.get_real_duration, except of course each of those special modes has its own ad-hoc set of hacks scattered in the basic deriving machinery. I'm not real happy with the situation, but can't think of a better way.
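A sketch of why laziness keeps the neighbor pitch query cheap: the neighbors' pitches are stored as unevaluated fields, so the pitch-only derivation for a neighbor runs only if a call actually looks at it. Note, Dynamic and neighbors are hypothetical stand-ins, not Karya's Derive.Dynamic.

```haskell
-- A note reduced to its text and the pitch a pitch-only derivation
-- (the NotePitchQuery idea) would produce for it.
data Note = Note { noteText :: String, queryPitch :: Maybe Double }

-- Neighbor pitches as a call would see them. The fields are lazy, so each
-- one is just a thunk until something forces it.
data Dynamic = Dynamic { prevPitch :: Maybe Double, nextPitch :: Maybe Double }

neighbors :: [Note] -> Int -> Dynamic
neighbors notes i = Dynamic
  { prevPitch = pitchAt (i - 1)
  , nextPitch = pitchAt (i + 1)
  }
  where
    pitchAt j
      | j < 0 || j >= length notes = Nothing
      | otherwise = queryPitch (notes !! j)
```

If the last note's pitch were an error value, asking note 1 for its previous pitch would still work, because nothing forces the next one.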

Thursday, May 15, 2014

TrackDynamics

For the last several days I've been working on one of those cases where you pull on a little root and wind up uprooting half the yard.

The initial problem was that a note came out an octave too high through MIDI thru. It turned out this was due to how I implemented instruments playing in octaves, and getting the instrument (pemade) from one instantiation of the score, while getting the transpose signal from the kantilan instantiation of the score. Cmd.Perf should show a single consistent picture of a block, from only one call.

This led me to investigate how TrackDynamics were collected, which has always been a terrible hack, due to band-aids added during several past debugging sessions. The problem arises when tracks are sliced during inversion. I wind up with a bunch of fragmentary signals, and each note track is evaluated twice, once before and once after inversion. And since inversion means a note track appears twice, which one do I want for TrackDynamics? Well, the one on the bottom, because that one has the scale set. But its controls should come from the one on top, otherwise, it sees a slice of the controls of one particular note.

So you see, all problems eventually lead back to track slicing. That single feature has been responsible for more headache and bugs and hacks than any other feature, probably by an order of magnitude.

Anyway, TrackDynamics was just the start, because that led me to become unhappy with how orphan slicing works. The connection was that I needed to evaluate track titles to get the TrackDynamics right, but slicing wanted to strip out empty tracks, so it needed an extra step where it extracted the titles so they could be evaluated separately. And it did that in the special orphan slicing code, but not normal note slicing... or something like that. It also made me unhappy because I wanted to simplify TrackDynamics collection to just take the first one for a track and not do any hairy and order-dependent merging, and orphan extraction was a source of especially unintuitive evaluation order since they were evaluated out of order with the rest of the track. In any case, I thought that I could just make a note track evaluate its subtracks directly if it doesn't have any events, and then just get rid of the empty track stripping and separate title extraction and evaluation mess.

Well, that was the first plan. The problem is that this made the orphan slicing code get increasingly complicated because it then had to handle orphans below multiple levels. It worked out recursively as part of track derivation before, but trying to get a single list of orphan regions to derive meant I had to do it in the slice function, which is the same as how note slicing worked. It became increasingly complicated, and increasingly duplicating note slicing, so I figured I could get rid of the whole thing by having note tracks do a note slice on empty regions, and directly deriving what came out.

That got rid of orphan slicing entirely, which was great... except it led to some further problem I forget, but was related to a totally empty track suddenly turning into a separate slice for each of the notes under it. In any case, I really wanted a plain slice, because I wasn't taking advantage of the separate note slices.

So that was the final iteration: back to a special slice_orphans (I was worried I wouldn't be able to use that name!), but it's just a very simple wrapper around a plain slice.

Aside from all the changes, there were two messes in there. The first was all the various TrackTree fields used to track slice information, like 'tevents_range' and 'tevents_shifted', and 'tevents_end'. Initially slicing just sliced events. But I discovered that I still needed track information that was lost, such as the next event, or the position of the event in TrackTime (e.g. for cache invalidation), and added fields in an ad-hoc way as I needed them. Because they were confusing, I put extensive comments on each one, but reading the comments later I still didn't understand, and existing code seemed to use them in an inconsistent way. Eventually I figured that out and was able to get rid of tevents_range entirely.

The second was overlap detection. This was an endless game of whack-a-mole, fixing one test would break another, fixing the other would break the first. The problem is that I can't keep the whole thing in my head, so all my concentration is used on just solving one problem, and I don't notice when there's some underlying contradiction that makes the whole thing impossible. Trees are always this way, I can't understand anything two dimensional. I drew lots of ASCII diagrams and finally figured out something that seemed to work.

So after all that... I think it was overall a good thing. Slicing is simplified a bit, and it needs all the help it can get. And getting back to TrackDynamics, that was easy enough to solve once I'd gotten the distractions out of the way.
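The simplified TrackDynamics collection, taking the first one for a track with no order-dependent merging, is essentially a first-wins map insert. A minimal sketch with hypothetical types (TrackId as a String, and a stand-in Dynamic record rather than Karya's):

```haskell
import qualified Data.Map as Map

type TrackId = String

-- Stand-in for what TrackDynamics remembers about a track.
data Dynamic = Dynamic { scale :: String, controls :: [String] }
  deriving (Eq, Show)

-- Keep the first Dynamic seen for each track, in evaluation order.
-- Map.fromListWith calls its function as (f new old), so returning the
-- second argument keeps the earlier entry.
collectFirst :: [(TrackId, Dynamic)] -> Map.Map TrackId Dynamic
collectFirst = Map.fromListWith (\_later earlier -> earlier)
```

This is only as good as the evaluation order, which is why orphans evaluated out of order with the rest of the track were a problem.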

But wait... there's more! This whole mess turned up that gender ngoret didn't work properly in several cases because it couldn't get the pitch of the previous event. That took me down the rabbit hole again, because it annoyed me that something as simple as getting the previous note should be so error-prone... due to slicing, as usual. To make another long story short, I thought up solution A, implemented it halfway, thought up solution B and implemented it most of the way with a feeling of deja vu. Then I stumbled across a large comment explaining solutions A, B and C, and why A and B wouldn't work, so I implemented C. It was even cross referenced in several places to avoid exactly the problem that wound up happening. Part of the problem was that I left part of A in place, for performance reasons, and this led me to entirely forget about the whole thing. Anyway, I eventually decided the reasons B wouldn't work could be worked around, and, after a few bug fixing cycles, finally got it apparently working.

Stepping back, all of this came about trying to figure out how to write pemade and kantilan parts for gender wayang pieces. They are the same, except kantilan is an octave higher. Sounds really simple, right? Well they should be evaluated separately, so variable parts come out differently.

So I need to derive the score twice, once with the >polos = >pemade-umbang | >sangsih = >pemade-isep (abbreviated in score to >p = >p-umbang etc.), and once with >polos = >kantilan-umbang | >sangsih = ... | %t-diatonic = 5. That in turn required me to add the instrument aliasing feature so >x = >y would work. And of course the %t-diatonic line turned up the TrackDynamic problem that kicked this whole thing off. All this just to play the same thing up an octave, and on different instruments!
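A sketch of instrument aliasing as a lookup, with aliases stored as a map from alias to target (without the ">"). Following chains like >p = >polos and rejecting cycles are assumptions about how such a feature could behave, not a description of Karya's; resolve and Aliases are hypothetical.

```haskell
import qualified Data.Map as Map

-- Aliases from track titles, e.g. >polos = >pemade-umbang, stored without ">".
type Aliases = Map.Map String String

-- Follow aliases until reaching a name that isn't an alias. Returns Nothing
-- if the aliases form a cycle.
resolve :: Aliases -> String -> Maybe String
resolve aliases = go []
  where
    go seen inst
      | inst `elem` seen = Nothing
      | otherwise = case Map.lookup inst aliases of
          Nothing -> Just inst
          Just target -> go (inst : seen) target
```

Deriving the score twice then amounts to resolving the same polos and sangsih names against a pemade map and a kantilan map.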

And I'm not even satisfied with this setup because I wind up with 'pemade' and 'kantilan' blocks that just set various fields and then call the 'score' block. And that's just to get the text out of the track titles, since block titles wrap to display long text while track titles scroll, so they can't accept anything long. It could also go in the event text, which also wraps, but... oh I don't know, maybe it should do that.

This approach means that if I want pemade and kantilan to differ, I have to come up with some conditionals so I can give alternate score sections (also not supported yet, also on the TODO list). An approach based on integration would be to extend score integration to make a copy of an entire hierarchy of blocks. That way I have an "ideal" part, and then pemade and kantilan realizations, which can have individual edits, but still merge changes made to the ideal part. But while score integration already does this for a single block, a whole hierarchy of calls seems like a whole other thing. Or perhaps I can just write a function that sets up a bunch of score integrates.

I remember a time, I don't know how many years ago, in the attic on a Saturday. At that time I had no wife and no girlfriend, and Saturday was free from dawn to midnight. I resolved to spend the whole time working on the sequencer. I was working on the skeleton operations, which are simple graph operations: add an edge, remove an edge, and the like. It was extremely difficult, and progress was extremely slow, but I was determined to force my way through sheer bloody-mindedness. Perhaps sometimes problems solve themselves when you stop paying attention to them, but in my experience they more often just don't get solved. I thought, I will do this now, and complete it, it will be done and I won't have to come back to it, and so I will focus until it's gone, and then move on. It was really hot, and I thought that in any endeavour there will be times of agonizing slow progress on something that seems so far removed from your goal, and days spent in grinding drudgery. Every journey has moments of despair, perhaps many of them. The only way out is through.

Sunday, October 6, 2013

umbang isep, sangsih polos

I finally finished the wayang samples, which means I need to start getting serious about notation for Balinese-style music. The most basic level of that is dealing with pengumbang and pengisep, and polos / sangsih pairs. Namely, I need notation that takes a single part and splits it into parts, whether unison, kempyung, or kotekan. And of course, every time I think I have a general powerful system that can handle anything, I find out that even the most basic fundamental notation will require adding new features, or further generalizing existing ones.

I initially wanted to implement the "pasang" calls (which is my name for instruments that are actually pairs of instruments) by setting a part=polos or part=sangsih environ val, and letting the default note call apply the appropriate instrument. But then I ran into the core problem which seems to underlie just about all notation problems: order of evaluation. Namely, if I want to apply these calls to arbitrary notation rather than just single notes, they have to be postproc calls. Only then do I have access to the flat stream of notes, and know how fast the concrete tempo is (to know if I can add noltol or not). But at that point it's too late to ask the note call to emit sangsih or polos or both.

So the note call emits the nonexistent "pasang" instrument (e.g. ">wayang"), and a postproc later looks at inst-polos and inst-sangsih env vars to split that into, say ">wayang-umbang" and ">wayang-isep", which are real instruments and have allocation. Now we have yet another order of evaluation problem: by the time the postproc switches the instrument, the pitch is already determined. Pitches delay evaluation and can continue to be affected by controls, but not environ, and the tuning is environ, not a control, so it's too late to change the tuning.
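A sketch of the unison case of the pasang postproc, with hypothetical Note and Pasang types standing in for the flat note stream and the inst-polos / inst-sangsih env vars. It also makes the tuning problem visible: by the time the postproc runs, each note already carries a realized frequency, so the isep copy comes out with umbang tuning.

```haskell
-- A note in the flat stream a postproc sees; its pitch is already a frequency.
data Note = Note { inst :: String, start :: Double, hz :: Double }
  deriving (Eq, Show)

-- Values of the inst-polos and inst-sangsih env vars for one pasang (pair)
-- instrument, plus the placeholder instrument name the note call emitted.
data Pasang = Pasang { pasang :: String, instPolos :: String, instSangsih :: String }

-- Unison: each note on the placeholder pasang instrument becomes one note on
-- each real instrument. hz is copied unchanged, so the isep copy keeps the
-- umbang tuning it was realized with.
unison :: Pasang -> [Note] -> [Note]
unison p = concatMap split
  where
    split n
      | inst n == pasang p = [n { inst = instPolos p }, n { inst = instSangsih p }]
      | otherwise = [n]
```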

Notes from the TODO file:

Tuning is incorrect if postproc changes the instrument, because the new instrument doesn't bring its tuning along with it. I could fix that by having the instrument bring its environ into scope during conversion as well as derivation.

The easiest way is to move environ to Patch. But this means Derive needs to look up Patch, or at least get a function for it.
But Environ is technically code, since it has PitchControl, which has a pitch signal. So I can't save it unless I make a RawEnviron out of RawVal. Hassle and complicated.
Wait, environ is no good, because pitch doesn't take environ. It's already baked in by the time the pitch is realized. So I'd have to represent umbang/isep as a signal, e.g. umbang=0 isep=1. It's a bit sketchy because umbang-isep is not additive, while PitchSignal.apply assumes all pitch-affecting signals are additive. Do I want to keep the env var? Try removing for now. But for this to work, inst has to be able to introduce signals. If I added that, it would basically be a one-off for this case.
Now making pitches take environ looks more attractive: it means I can change the key after the fact too, the environ replaces rather than adds, and possibly even fits better with a mythical future unification of environ and signals.
On the other hand, letting the instrument introduce signals could be a useful generalization. Also, it seems like an umbang-isep signal could be a useful generalization, because it lets me explicitly place a pitch in the center. Actually, %ombak=0 already does that. So it's actually kind of non-orthogonal.
Which is better? I'm going with adding Environ to the pitch. Who knows if that is the right choice, but I have to choose something.
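A sketch of the chosen direction: pitches that take the environ as well as controls when they are finally realized, so a later instrument change can still bring in its tuning. The env var "tuning", the control "t-octave" and the frequencies are made up for illustration; Pitch, realize and pitchFrom are hypothetical, not Karya's PitchSignal.

```haskell
import qualified Data.Map as Map

type Environ = Map.Map String String
type Controls = Map.Map String Double

-- A pitch stays unevaluated until it is realized with whatever environ and
-- controls are in scope at that point.
newtype Pitch = Pitch (Environ -> Controls -> Double)

realize :: Environ -> Controls -> Pitch -> Double
realize env controls (Pitch f) = f env controls

-- A pitch whose base frequency comes from a hypothetical "tuning" env var.
-- The env var replaces, while the hypothetical "t-octave" control
-- transposes additively (in octaves).
pitchFrom :: Double -> Double -> Pitch
pitchFrom umbangHz isepHz = Pitch $ \env controls ->
  let base = if Map.lookup "tuning" env == Just "isep" then isepHz else umbangHz
      octaves = Map.findWithDefault 0 "t-octave" controls
  in base * 2 ** octaves
```

This is the "environ replaces rather than adds" property from the notes above: the same pitch realizes differently depending on which instrument's environ is in scope.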

Goals for today:

Get this umbang/isep thing sorted out.
Practice mridangam, lesson on Tuesday!
Practice kendang for Semara Dahana to get ready for lesson on Friday.
Sort out samples for mridangam and kendang, and figure out how to automatically infer dynamic levels.

Saturday, September 7, 2013

keyboards

2013-09-07

A common theme of a lot of my posts is me noting how something that seems simple is actually really complicated, followed by a long exposition of how complicated it really is. I guess that's because I'm continually surprised by how much work it is to do anything. Even in haskell, a language which is supposed to be conducive to expressing complicated things.

By writing these expositions I'm trying to answer a question that is always in my mind while programming. That is, is music just inherently complicated, or am I failing to see some important abstraction or generalization or even specialization that would simplify things? When I struggle through some never-ending feature-add or bug-fix session I'm wondering what I did wrong, and how I could avoid it next time.

Of course this is one of those perennial questions about programming, about inherent versus incidental complexity. People who are good at what they do, and good languages, are supposed to reduce the latter. I'd like to do that, to become good at what I do, and ultimately save time.

Today I wanted to finish a feature to allow piano and ASCII keyboard input for scales with varying numbers of degrees, then slice up and create a sampler patch for the recorded kendang samples from last week, practice mridangam, and then practice kendang (using the new patch). The topeng dancer is coming to tomorrow's rehearsal and I'd like to practice the cedugan part so as to not embarrass myself too badly.

It's looking like the feature add is not going to get done in time, so I'm putting it on hold while I go to practice. I've actually been working on it for a full week, on and off. It's so hard that it's discouraging, and that slows me down. And it sounds so simple!

The basic problem is that I have input, coming from either a piano-style MIDI keyboard, or the ASCII keyboard mapped to an organ-like layout, and I need to figure out what note in the current scale should come out.

Previously I had a (somewhat) straightforward way. The input key was a Pitch.InputKey, which was just a newtype around an integer, which was the number of semitones from MIDI note 0, otherwise known as the Midi.Key. Then it was up to the scale to map from InputKey to a Pitch.Note.
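A sketch of the old representation, assuming only what's described above: an InputKey is semitones from MIDI note 0, and each scale has to interpret that single number. The octave convention (MIDI key 60 in octave 4) is an assumption, and twelve is a hypothetical name.

```haskell
-- The old input representation: semitones from MIDI note 0, i.e. a Midi.Key.
newtype InputKey = InputKey Int
  deriving (Eq, Show)

-- Every scale had to interpret this one integer. A twelve-tone scale can
-- split it into octave and semitone; here MIDI key 60 is octave 4.
twelve :: InputKey -> (Int, Int)
twelve (InputKey key) = (key `div` 12 - 1, key `mod` 12)
```

For a twelve-tone scale this is trivial; the trouble starts when the scale's degrees don't line up with a piano's twelve keys per octave.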

This worked well enough at first, but when scales got more complicated it started to have problems.

It started with relative scales, e.g. sa ri ga style. I wanted sa to always be at a constant place, so I had C emit sa. That works fine diatonically but runs into trouble when you start using accidentals. If D major is being played starting at C, the black keys are in the wrong place, and you wind up with some notes unplayable. It got worse when I tried to get Bohlen-Pierce to map right. It has 9 diatonic and 13 chromatic steps per "octave" (tritave, actually), so its layout is nothing like a piano keyboard's.
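For a sense of how far Bohlen-Pierce is from a piano layout: the tritave is a 3:1 ratio divided into 13 equal chromatic steps, so each step is roughly 146 cents, compared with 100 cents for an equal-tempered semitone.

```latex
\begin{aligned}
r &= 3^{1/13} \approx 1.0882 \\
1200 \log_2 r &= \frac{1200 \log_2 3}{13} \approx \frac{1901.96}{13} \approx 146.3 \text{ cents}
\end{aligned}
```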

So it seemed like I should treat the ASCII and piano keyboards differently, so I could make the ASCII keyboard always relative, while the piano keyboard would remain absolute as usual. So the InputKey would get its own structure, with octave, pitch class, and accidentals. This involved updating Pitch.InputKey (and renaming Pitch.Input while I was at it), and then redoing how all the scales converted input to symbolic pitches, and that involved a bunch of refactoring for scales, and then fussing around with octave offsets to get them roughly consistent across scales.

The actual conversion is a bit of head-hurting modular arithmetic, to convert from a 7 or 10 key keyboard to an n-key scale, wrapping and aligning octaves appropriately. In theory it's straightforward, but finicky and hard for me to keep straight in my head.
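A sketch of two ways the modular arithmetic can go, with input as (octave, pitch class) counted in device keys (7 white keys on a piano, 10 per row on the ASCII layout) and n pitch classes in the scale. Which policy Karya uses for which keyboard isn't stated here; aligned and continuous are hypothetical names, and accidentals are left out.

```haskell
-- Input positions are (octave, pitch class), counted in keys of the device.

-- Keep device octaves aligned with scale octaves. Surplus keys carry into the
-- next octave; if the scale has more pitch classes than the device has keys
-- per octave, the higher degrees can't be reached.
aligned :: Int -> (Int, Int) -> (Int, Int)
aligned n (oct, pc) = (oct + pc `div` n, pc `mod` n)

-- Treat the device keys as one continuous run of degrees. Every degree is
-- reachable, but scale octaves drift against device octaves.
continuous :: Int -> Int -> (Int, Int) -> (Int, Int)
continuous keysPerOctave n (oct, pc) = (oct * keysPerOctave + pc) `divMod` n
```

Using div and mod (rather than quot and rem) keeps negative octaves wrapping downward correctly.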

But now it all seems to work, and I can get rid of an awful hack that hardcoded Bohlen-Pierce in favor of all scales adapting based on the number of pitch classes.

Worth it? Maybe? Pitch input is what I spend most time on when actually writing music, so anything that makes that smoother seems worthwhile.

What else? Slow progress lately, since I've been distracted by travel and sampling. Speaking of which, I recently finished the patches for the gender wayang. That's another thing that sort of boggles the mind. Pemade and kantilan, with umbang and isep variations, ten keys on each, with four dynamics and from three to eight variations of each, played with gender panggul and calung panggul, and muted kebyar style and loose gender style. That's 4,313 samples recorded, edited, organized, and mapped to the right key, velocity range, round robin configuration, and envelope. 2.6gb worth, and it took two weekends to record and several weeks on-and-off to do the editing. Even with lots of automation, it's a lot of work, and you need a well defined process and lots of notes to not make mistakes. There are still tweaks to be made, with jagged dynamics to be smoothed and missing articulations to be rerecorded. I guess that's another thing that's complicated about music.

And I'm not done yet, I still need to do the gender rambat and reyong / trompong, both of which have more keys.