Saturday, December 29, 2012

Lag, drag, and memory corruption.


I'm definitely not loving laziness at the moment.  That's because I've spent the last two days trying to track down excessive amounts of DRAG in score derivation.  Not only are they excessive, but they grow more and more on each derivation.  That shouldn't happen because each derivation is independent and clears out the old one, but clearly something is being retained.

Part of the problem is that since I wasn't sure where the problem was, I hesitated to spend time creating an automatic reproduction, since I've done that sort of thing in the past only to find out the problem is related to the specific way that the GUI interaction handles the data.  This is a symptom of laziness space leaks being non-local.  And since debugging is basically trial and error, it took a long time to change something, recompile, run again, poke a bit, quit, and inspect the heap dump (though the ekg library helps).  Once I figured out what I had to force to get the DRAG to go away, I spent a lot of time on a kind of binary search either forcing bits of data or clearing it, to narrow down which bits did the trick.  It's complicated because frequently it's more than one thing, i.e. more than one bit of output is retaining the data.

The results aren't simple either.  For example making the cache strict would take about 5 MB off the growth... but I had to touch the UI for that to happen, so it's clearly being hidden behind another thunk.  And then 5 more MB would be cleared up when I ran another derive, but forcing everything would clear it all up immediately.

There are numerous clues, if only I knew how to interpret them.  I took to writing them down because I can't remember them all at once:
  • Clearing both TrackDynamic and Cache gets rid of it, but only after one derive.  It's retained elsewhere, but I don't know where.
  • It's got something to do with perf vs. current perf, since clearing current perf doesn't do anything, but clearing perf does.  They should both point to the same data, so it should be both or nothing, right?
  • Forcing the cache gets rid of just one derive's worth of drag.
  • Drag is mostly Map and [].
  • If it grows by 15 MB each time, forcing gets rid of 10 MB.  5 MB of permanent growth is hiding somewhere.
  • Clearing the performance entirely gets rid of the last 5 MB, but it still requires a derive or force to get it to happen.  Where is that reference hiding?
  • The first run has no DRAG, it only appears on the second run.  Clearly this is related to caching.
  • The VOID grows slowly, and it's mostly [].  Actually it looks like the [] is constant, and the growing part is misc other things like Double.
  • There's almost no LAG, except right at the beginning.  So that also points to how the cache is retained across derivations.
Of course, forcing is just to figure out who is dragging.  It's much better to not drag in the first place, by making the relevant data strict.  Just about all of my data structures are strict, but of course built-in ones like lists, Maybes, and Maps aren't.
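The lazy and strict Map APIs even share the same type, so nothing in the code distinguishes them; a minimal sketch of the trap:

```haskell
import qualified Data.Map as Lazy
import qualified Data.Map.Strict as Strict
import Data.List (foldl')

keys :: [String]
keys = concat (replicate 1000 ["a", "b"])

-- With the lazy API, each insertWith leaves an unevaluated (+) thunk in
-- the map, and the chain of thunks retains all of its operands.
lazyCounts :: Lazy.Map String Int
lazyCounts = foldl' (\m k -> Lazy.insertWith (+) k 1 m) Lazy.empty keys

-- The strict API forces the combined value on insert, so no thunk chain
-- builds up.  Values with deeper structure would still need their own
-- strictness, though: this only evaluates to weak head normal form.
strictCounts :: Strict.Map String Int
strictCounts = foldl' (\m k -> Strict.insertWith (+) k 1 m) Strict.empty keys
```

Both produce the same answer; the difference only shows up in the heap profile.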

I eventually figured out that the continually growing part was actually buried in PitchSignals, which are the only bit of the Score.Event which is code, not data.  That means, of course, rnf doesn't work on it.  It turns out they capture the environ in a closure.  While the environ itself is small, it's clearly not entirely evaluated, because rnf'ing the environ before creating the closure wiped out the DRAG.  The environ is basically a Map, but despite my attempts to make insertion strict, the only thing that's worked is rnf'ing the environ before each closure is created.  Maybe I can use ghc-heap-view to see which parts of the environ are unevaluated.  But anyway, that explains the retains-forever thing, since the still-valid parts of the cache are copied to the new cache.
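The shape of that workaround, with hypothetical stand-ins (Environ as a plain Map, a toy pitch function in place of the real PitchSignal):

```haskell
import Control.DeepSeq (deepseq)
import qualified Data.Map.Strict as Map

-- Stand-in for the real Environ.  Even a "strict" Map is only
-- spine-strict, so its values can still be thunks retaining old data.
type Environ = Map.Map String Int

newtype PitchSignal = PitchSignal (Double -> Int)

-- Building the closure directly captures the environ as-is, thunks and all.
capture :: Environ -> PitchSignal
capture environ =
  PitchSignal (\t -> Map.findWithDefault 0 "transpose" environ + round t)

-- Forcing the environ to normal form *before* the closure is built means
-- the closure can only retain fully evaluated data.
captureForced :: Environ -> PitchSignal
captureForced environ = environ `deepseq` capture environ
```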

I've given up on the idea of lazy derivation, since it's not working out in practice, and I want to parallelize derivation.  But I feel like the work making derivation lazy wasn't totally a waste, because it was basically eliminating data dependencies, and that's a prerequisite for parallelization as well.

If my usual practice is to strictify every data structure I use, and I have all sorts of problems due to built-in data structures being lazy, wouldn't it be easier if everything was strict to begin with?  I still use laziness for Perform, and I think it's useful there, but it's just a stream.  In a default-strict language maybe I could just mark that one thing lazy and not have to deal with all this mess in the first place.
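For reference, "strictifying" a data structure just means banging the fields, and a bang only reaches one layer deep; a small illustration with invented types:

```haskell
-- Strict fields: each bang forces that field to weak head normal form
-- when the constructor is applied.
data Note = Note
  { noteStart :: !Double
  , noteDur :: !Double
  } deriving (Show, Eq)

-- But a bang only forces one layer.  This field is forced to
-- Just/Nothing, while the Double inside a Just can still be a thunk
-- that retains whatever computed it.
data Annotated = Annotated
  { annNote :: !Note
  , annDelay :: !(Maybe Double)
  } deriving (Show, Eq)
```

Getting full strictness means bangs all the way down, or falling back to rnf, which is exactly the mess above.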

Currently I think I have three major leaks:
  • A too-lazy environ captured in PitchSignal closures.  The workaround is rnf'ing the environ, but hopefully it's fixable by making the environ stricter.
  • Events saved in the cache are too lazy.  The cache filters out log msgs about caching, since getting cached log msgs about caching from the previous cache is just really confusing.  The filter, being lazy, probably retains all the events and their referents.  The mysterious part is that the next run should use or toss the cache, which should clear the references either way, but doesn't.  In any case, I can work around it by rnf'ing the cached events, but I don't know what the fix would be.  Making the entire derive output a strict list?  Or maybe it's the events themselves that need to be stricter?
  • Something else unknown in the performance.  I infer its presence because given about 15 MB of growth after one derivation, forcing the cache clears about 5 MB, forcing the environ clears about 5 MB, but forcing the entire performance clears it all.
Of course, capturing too much in a closure can happen in strict languages too, but it's so much easier to do so when any data structure could actually be a thunk that retains who knows how much data from who knows how far away.


The other major problem is nastier, and I don't even have a good avenue of attack.  Ever since upgrading to ghc 7.6.1, I get internal errors from the GC, which probably means memory corruption.  It happens most frequently in the optimized build, and seemingly more frequently when it's using the libgit2 save format.  My first suspicion is FFI code.

It also happens in the debug build.  It happens when I'm not using libgit2.  It happens when I'm using the stub MIDI driver.  The one time it doesn't happen is during tests.  I guessed it's in the GUI binding, but it also doesn't happen when I set up a script to drive the GUI.  Well, it's driving it via a socket, not with keyboard and mouse, so maybe it's in keyboard / mouse event stuff.  The fact that it only happens after an unpredictable amount of interactive use really makes it hard to take the trial-and-error approach that has been so slow on the memory leak problem.

As far as I know, this bug has been present for years, since long ago I had the same symptoms and "fixed" them by changing an alignment=1 Storable to alignment=4, which made no sense, since alignment=1 was correct.  But increasing the alignment could have just masked the overrun.

Valgrind shows nothing.  I have no way to reproduce it that's faster than 15 minutes of normal usage.  But I can't use a program that crashes every 15 minutes.

Saturday, December 1, 2012


Not related to karya, but I wrote all this stuff a long time ago when Prometheus first came out.  Bad Prometheus reviews have since become a kind of genre, so here's my entry into the genre.  Naturally this will spoil the whole movie if you haven't already seen it.

The expedition was ridiculously disorganized and haphazard considering they're walking into some ancient alien complex.

Biologist guys decide "we're going back all alone" because they're scared about weird bodies and stuff.  Yes, let's go back alone so we can get lost.  Even on a normal expedition on Earth I don't think you get two guys throwing a fit and deciding to wander back alone through a maze.  I guess they forgot about the fancy automappers, because they got lost anyway.  I also think you've kind of failed as a storyteller if you need the characters to tell us what implausible thing happened as their first line: "oh, looks like we're still here because we got lost".  I guess they're fulfilling their Guys Who Get Killed First duty.

Meanwhile, our haphazard other heroes are bumping into things, knocking things over, shouting at each other, and not paying any attention to the android who can mysteriously operate alien machinery.

And someone drops her bag, and decides to dive back into the raging knife storm after it.  Even if it had something they haphazardly ripped off from the complex, and there's like a million cylinders and alien bodies and whatnot in there.  A dropped knapsack is totally worth a knife bath, right?

Then there's this thing where they're all mopey like "oh we didn't find anything, it's all empty, let's give up and go home" when they spent 15 minutes inside a tiny part of a giant complex and saw huge amounts of crazy stuff and alien writing and working machinery and had to run back early because the weather got bad.

There was a strange scene I forgot about with the head.  If you retrieve the head of an ancient alien corpse I guess the first thing you want to do is pump it full of adrenaline, which I guess makes alien heads relive their last moments.  So it will tell you... what it was running from maybe?  Who knows, because they accidentally put in the chemical that makes alien heads explode.  And then they're like shrug, so much for that alien head, and forgot about it entirely, which is what I did too.

And then the biologists (geologists? hard to tell since they didn't seem to do any of either) are stuck in a deeply creepy alien complex and the captain on the ship is just laughing at them.  "Yeah, there's something moving down there, along with all the horribly murdered alien corpses.  Sweet dreams!"  It's like they all knew they were in a movie and these guys were the Guys Who Get Killed First, so the captain is just winking at the audience.

But then the strangest thing is that these guys, after doing stupid things because they're scared, and then getting terrorized more by the captain, and running in the opposite direction from the "something moving", then when they actually see a scary alien they decide it's cute and they want to pet it.  Even after it hisses and makes all kinds of threatening gestures they just want to pick it up.  Aw, the alien snakebeast is hissing, I think it wants to be friends.  Kane wasn't that stupid.

I've never seen a C-section, but I don't think that's how it works.  Vickers' surgery table is mysteriously incompatible with women (I guess it's for the robot then?  But he can amputate his entire body and still be just fine.), but it does have a convenient little organ removal claw.  It also has a convenient button on the outside that floods the inside with poisonous gas.  Forget the critters, when you think about it the operating table is a horror movie all by itself.

Oh and before that she hits one of the doctors / orderlies (who I guess are trying to help her get exploded by an alien? when did they turn into psychopaths?) and both of them apparently decide not to chase her.  Or raise an alarm.  Or do anything else for the rest of the movie.  It's not so much that they forgot to chase her, it's that the scriptwriter forgot they existed.  Then she finishes up with her ordeal and it's like there's a tacit agreement to not mention the attempted murder-by-alien.  When Burke did that in Aliens it took a power failure and alien attack to distract them from killing him on the spot.

Anyway, they get to the Engineer and they're like "let me talk", "no let *me* talk", "me talk me talk me talk!" and hitting each other.  And the Engineer decides the first thing he should do after sleeping through his entire installation being wiped out, is to beat up the critters that woke him up (I understand, they *are* acting like 5 year-olds), and then go find their planet and presumably rain horrible black biological death on it.

Then the captain says "Hey guys, I think we'll have to crash into that ship and kill ourselves.  Because it's taking off, and we can't let alien ships take off.  And I have to be standing here while it happens."  Other guys: "Oh.  Ok.  Sounds like we should stand here too."

The crew in Alien got wiped out, but at least they were trying.

Sunday, September 2, 2012


Since I'm wanting to come up with notation for kendang and whatnot I've started to actually get around to sampling, and making instruments out of the samples I've already made.

I posted this on google+:

Sampling is such mind-numbing work.

First decide on how many dynamic levels, how many articulations, how many round-robin variations, how many pitches, and multiply all those together.  That's how many samples.  Then record them all, being as consistent as possible, plus a few variations to give choice when editing.  The slightest noise can ruin the sample.  Try not to swallow or breathe too loudly.  Unplug the refrigerator and wait for airplanes to pass.

Then proceed to snip up all 300-500 samples.  It should be somewhat automate-able, but it's not.  Get sore wrists.  Snip, snip, snip.  Try to pick consistent variations, but not so consistent that there's no point to having the variations.  The annoyance of every bug or UI misdesign in the editor's "inner loop" is multiplied by 1000 repetitions.  Be completely consistent about ordering and naming or samples will get mixed up or lost.

Then lay them out in the sampler, which is almost completely manual, and intentionally designed to resemble a cramped LCD screen from the '90s.  Get the dynamics smooth as possible.  Enjoy the bugs here too, evidently they figure not many people are creating their own samples so they don't need to polish that part so much.

Finally, find out if it actually works for music!  If you consistently played some articulation oddly then stringing them together sounds bizarre and stiff, like an automated subway announcement.  Maybe start over from the beginning, or redo bits, being careful to recreate the original recording situation as much as possible.

And this is percussion, probably the easiest thing to sample.  I have a lot of respect for the people who do this professionally.
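The multiplication in that first step, with entirely made-up counts:

```haskell
-- All numbers here are illustrative, but the multiplication is the
-- point: every new axis multiplies the number of recordings.
sampleCount :: Int
sampleCount = dynamics * articulations * roundRobins * pitches
  where
    dynamics = 4       -- e.g. pp, mp, mf, ff
    articulations = 3  -- e.g. open, damped, rim
    roundRobins = 4    -- variations to avoid machine-gun repetition
    pitches = 10
```

Four small numbers, and you are already at 480 samples to record, snip, name, and lay out by hand.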

update: integration, git, theory, tracknum

I've been practicing kendang in earnest lately, and so on the sequencer side here's kendang tunggal notation, along with composite notation with automatic realization, and how that gets integrated back into score.  The idea is that you'd then go edit the individual parts, because the composite notation doesn't express all the details of what wadon and lanang really do.

Work has slowed lately since I've been busy with other things, but there has been some progress.

I've reached one of those points where I'm tired of features, so it's time to work on some more music, and see what features (and bug fixes of course) fall out of that.


One of the larger reasons for busyness was going to Bali to perform in the PKB.  The performance, or rather our integration with it, was I'd say only partially successful, but it did amount to a sort of trial by fire for the sequencer.  I originally wanted to see about writing for the gamelan, but it turned out to be more practical to write for the Chinese instruments.  Partially because there were fewer of them, but also because they're used to sight reading, and it's hard to beat sight-reading when rehearsal time is limited.

Since I naturally wanted to use Karya for writing the music, it entailed a lot of work on the lilypond backend.  Writing to a deadline also enforces an economy of effort, which is reflected in the fact that at present the lilypond backend works ok for a single score, but is definitely not able to typeset anything not written specifically with it in mind.  For a few days I would get up early, work on adding a feature so I could print the score out around noon, so we could then rehearse before going over to Gusti's to practice his stuff.

Since I'd like to write mixed acoustic and electronic music, a lilypond backend will be useful in general, so the effort was definitely not wasted.  Also, I'd like to use Karya to write as nice an electronic version as possible, while still preserving the possibility of live performance by printing out a nicely typeset 5-line score.

It's essentially a problem of conversion from a notation at one level of abstraction to another one at a different level, and like all such translation there's no complete solution, but I believe I should be able to get pretty nice lilypond eventually.  Since I need to render e.g. trills to a special lilypond symbol rather than a sequence of notes, I need a separate derivation, and ornaments need to be able to know about lilypond and emit a lilypond annotation when they are in lilypond mode.  I also need to be able to annotate the Karya score with things staff notation cares about, like left and right hand for piano parts.  Attributes should be able to express that kind of thing.

Hand-editing the generated lilypond output would be pretty untenable, so I also have to be able to embed bits of lilypond so I can keep things like layout tweaks and page breaks in the Karya score.

Once completed I think it should be possible to wind up with a fairly nice lilypond score creating environment as well as a sequencer.

Anyway, enough new things, here's how the various themes from previous post have fared:


I've been working on this one lately.  I have a basically working setup and design, though there are a number of known problems and I have the feeling it's the sort of thing I'll be picking bugs out of for a long time.  I've got to stop adding these kinds of features!

Integration comes in two flavors: block integration takes the output of a block and creates a new block from it, and track integration takes the output of a track and creates new tracks in the same block.  I'm guessing that I'll use track integration more because it's more convenient to have the source track right next to the destination tracks when editing.

The complicated part of integration is merging.  Namely, I want to be able to edit the destination track, then edit the source track, and have the integrated changes merge with the manual changes on the destination.  So what I need is to turn the destination into a set of edits that describe how it has been edited manually.  When the source changes, I recreate its events and reapply the manual edits.

After thinking about various approaches, I decided that I could save the integration with the block.  When there is a new integration (i.e. the source has changed), I diff the current score against the previous integrated score to derive the edits, and then apply those edits to the new integrated output.  I call the saved integration an "index", because it maps the event's stack (i.e. the stack of the generating score event) to the generated UI event.

The edits can express moving an event, changing its duration, replacing the text, or adding a prefix to the text.  So if you derive some notes, then make them longer and apply an ornament to them, then change the source in a way that their text changes (e.g. change the pitch), they should get the new text, but have the modified duration and prepend the ornament.
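A hypothetical sketch of what such an edit language could look like (the real types in Karya surely differ, but the shape is a small sum type applied per event):

```haskell
-- Invented types for illustration: an edit is one recorded manual
-- change, and reintegration reapplies the recorded edits on top of the
-- freshly regenerated event.
data Edit
  = Move Double         -- shift the event's start time
  | SetDuration Double
  | SetText String
  | PrefixText String   -- e.g. a manually applied ornament
  deriving (Show, Eq)

data Event = Event
  { evStart :: Double
  , evDuration :: Double
  , evText :: String
  } deriving (Show, Eq)

applyEdit :: Edit -> Event -> Event
applyEdit edit ev = case edit of
  Move delta -> ev { evStart = evStart ev + delta }
  SetDuration d -> ev { evDuration = d }
  SetText t -> ev { evText = t }
  PrefixText p -> ev { evText = p ++ evText ev }

-- Regenerate from the source, then replay the manual edits.
reintegrate :: [Edit] -> Event -> Event
reintegrate edits ev = foldl (flip applyEdit) ev edits
```

So an event whose pitch changed in the source gets its new text, while the manually stretched duration and prepended ornament survive the merge.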

I went down a bit of a dead-end trying to use the saved integration to determine the tracknum.  The idea was that the integration could decide to emit the "same" events on a different track, but since the index uses the stacks of the generating event I can track them down and keep them on a consistent destination track.  Unfortunately this ran into trouble because a new event on the source track doesn't exist in the index, so it still needs to come up with a tracknum for it.  It turned out that just given the stack this isn't really possible, since one score event will wind up generating several tracks of UI events, since pitch and each control is a separate track at the UI level.

So I switched tactics---instead of diffing the entire integrate output, I first match up the tracks and diff the individual tracks.  I can match up the control tracks by name.  Since controls are indexed by name in the score event, a score event will never integrate to two control tracks with the same name.  That leaves matching up the note tracks, which I don't have a satisfactory answer for.  I just match them in order.  The effect is that if a deriver decides to create more note tracks (e.g. it has more simultaneous notes) and a new track winds up before the others, they'll all get bumped over by one and no longer line up with the destination tracks: the events already on the destinations will be mistaken for manually added events, none of the newly integrated events will match up, and Chaos will Ensue.
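A toy sketch of that matching strategy, with the fragile positional zip called out (the names here are invented, not Karya's):

```haskell
import qualified Data.Map.Strict as Map

-- Control tracks pair up by name: each name occurs at most once per
-- event, so the intersection is unambiguous.
matchControls :: [(String, a)] -> [(String, b)] -> Map.Map String (a, b)
matchControls sources dests =
  Map.intersectionWith (,) (Map.fromList sources) (Map.fromList dests)

-- Note tracks are just zipped in order.  This is the fragile part: a
-- new track appearing in the middle shifts every pairing after it.
matchNotes :: [a] -> [b] -> [(a, b)]
matchNotes = zip
```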

As a result, the index is no longer keyed on the whole stack of the source event, but simply its position in the track.  It's simpler, but if the tracks get lined up wrong it easily leads to false positives.  I'll have to see from practice whether this is a problem or not.  I don't need to completely solve the merge problem (which is not possible anyway), I only need to adequately solve the subsets that are likely to turn up in the course of the way I'm likely to use integration.  It's probably possible to have the deriver include more information and get a more reliable way to match up tracks, or maybe some technique to ensure the deriver output is "stable" as far as new tracks go (i.e. they should always be added at the end), but I'll wait to see if there's actually a problem in practice before trying to get even more fancy.

For some reason the complexity of three inputs---diffing one against the other so I can apply the results to the third---proved to be just a little too much for my tiny brain, and it needed real concentration not to forget what I was doing and which input was which.  I guess I know my limits.

One essential thing is coming up with distinct names for each concept, which I guess is one of those Hard Things about programming.  I eventually decided that "integrate" is the verb for the whole process, "integrated" is the output from derivation which is just score events plus some additional bits, "convert" is the process of converting score events to UI ones and splitting tracks, "source" is the block or track that is being integrated from, and "destination" is the block or track that is being integrated to.  It's mostly ok, though "integrateds" is a really awkward variable name, and "destinations" winds up being shortened to "dests".

Integration is analogous to a macro system like Template Haskell, and like TH it requires staged compilation.  So I wind up with a derivation that results in changes, which should result in further derivation.  Since derivation is triggered by score damage, it's easy to get this wrong and wind up with an endless deriving loop.  I originally debated either building the derive -> integrate -> derive cycle into derive (which is how ghc does TH), or keeping integration out at the cmd level and bouncing between Cmd and Derive several times.  I decided on the latter, because all of the facilities for applying changes to a score, turning those changes into damage and firing off the necessary derivations all exist at the Cmd level and it seemed complicated to try to embed them in Derive.

The original plan still required special support in Derive for integration, in that the first time you run it, it derives just the integrate sources with some specific tweaks like omitting tempo tracks, and the output events go to integration rather than the performance.  I eventually figured out I could eliminate all this by collecting integrated events in the Collect state and having the Cmd level integrate look for performances with non-null integrated output.  So you set up integration by just sticking a normal derive transformer in some spot to catch events, and all the work of integration happens at the Cmd level when it notices a derivation includes integrated output.

It's worth noting that one source block or track can have multiple destinations.  So you can have e.g. one core melody reflected into several different parts, each of which differs from it in different ways.  It's also theoretically possible to stack the integrations, e.g. a source creates a destination, which you can manually edit, which is then integrated again to create the final destination.  So it's a fairly powerful technique.

    Git Save

It's all hooked up into undo.  So I can load a project at a certain point in its history (available through 'GIT_DIR=save/file.git git log') and then go back or forward in time.  If I undo and make new changes, the old ones aren't lost, I simply have a branching history.  Redo will just pick the last one I used, but when loading from the cmdline I can give the git hash and load up that specific branch.  These branches are not branches as understood by git-branch since I manage the tags and history myself, and I don't even have a working directory, but at least gitk seems to be able to display them correctly.  I haven't used it much and it will no doubt need further polishing, but the basics are there.

The thing that makes me nervous about this is the potential for errors.  Within the app itself I don't worry about change deltas except where I have to.  All cmds directly modify the state and when deltas are needed (to either update the UI or determine what needs to be rederived), a diff is computed on the spot for that particular purpose.  This probably hurts some performance-wise over recording all edits as diffs directly, but I have a feeling it's saved me from lots of buggy hassle.  I have this feeling because I actually do record diffs for event updates, because I don't want to compare every single event to look for diffs, and those updates have been a big hassle with e.g. undo and redo.

The thing is, git save winds up being deltas again, since I don't want to save the entire score every time someone adds a single note.  The same goes for loading old states, undo wouldn't be very usable if every tiny undo required reloading and rederiving everything.  But this means when I save a git state it's vitally important that the code that computes the deltas to commit is working with the same previous state as the one denoted by the "parent commit" hash entry in the commit.  Otherwise, I'll happily and quietly save deltas that will not take you from the previous state to the next one accurately, and you'll only find out when you try to undo or redo past that bit.  Save files are the most critical for correctness, so they shouldn't be the most dangerous!
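One way to guard that invariant is to check it at the single point where deltas are committed; a toy sketch, with a stand-in checksum in place of a real content hash and a list of strings in place of the real score state:

```haskell
-- Stand-ins for illustration: the real state and hashing are far
-- richer, but the check is the same shape.
type State = [String]

-- Toy content hash (djb2-style fold), just so the sketch is runnable.
checksum :: State -> Int
checksum = foldl (\h s -> h * 33 + length s + sum (map fromEnum s)) 5381

data Commit = Commit
  { parentHash :: Int       -- hash of the state the delta was diffed against
  , delta :: State -> State -- the recorded changes
  }

-- Refuse to apply a delta whose parent isn't the state we actually have:
-- better to fail loudly than to quietly save an inconsistent history.
safeCommit :: State -> Commit -> Either String State
safeCommit prev commit
  | checksum prev /= parentHash commit =
      Left "delta was computed against a different state than the parent commit"
  | otherwise = Right (delta commit prev)
```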

I've fixed a few bugs, but at one point I remember seeing a complaint in the logs about wanting to dump state for a view which had been closed, so the bugs aren't fully worked out yet.

That actually indicates a further complication with undo: it's confusing when undo opens and closes windows, so while I definitely need to store the window state so I can restore it, undo should probably carry over the current window state, unless it has to close one because the underlying block disappeared.  So if you undo back to a previous state, you're not at the same state you used to be at because the windows are different.  Then if you make changes and it wants to save them as diffs against the previous state, but since the real state and the state on disk differ maybe you get bogus diffs saved to disk.  Ugh.

Given that I like the feature and want to keep it, how can I make myself trust it?  The first thing is that I still save the complete state on explicit checkpoints.  So if things are screwed up, at least I still have what I had without the feature, which is plain old save like most apps have.  It's even a bit better, since I keep all the explicit checkpoints, so it's like one of those systems that backs up the old save instead of overwriting.

Secondly, it seems like I should be able to take the quickcheck approach and throw a giant pile of automatically generated changes at it, then verify that loading the initial state and redoing until the end will wind up with the same final state.  Of course since undo has the ability to mess this up, it needs to include undos as well.  This is not totally trivial because I have to drive it through the responder loop to ensure that the tests are doing exactly the same thing as a real user, and I'm not totally clear how to make sure I'm getting a broad sampling of the state space, but it should be possible.
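A toy version of that property, with a hand-rolled generator instead of QuickCheck so it stays dependency-free: generate a pseudo-random mix of edits and undos, run them through the "app", and check the result against a second, dumber model that keeps the whole history of states.

```haskell
-- Everything here is invented for illustration; the real test would
-- drive the responder loop with real cmds.
data Cmd = Add Int | Undo deriving (Show, Eq)

-- Tiny linear-congruential generator, so no QuickCheck dependency.
lcg :: Int -> [Int]
lcg = iterate (\x -> (1103515245 * x + 12345) `mod` 2147483648)

genCmds :: Int -> Int -> [Cmd]
genCmds seed n =
  [ if r `mod` 4 == 0 then Undo else Add (r `mod` 100)
  | r <- take n (drop 1 (lcg seed)) ]

-- The "app": state is a list, Undo pops the most recent addition.
step :: [Int] -> Cmd -> [Int]
step st (Add n) = n : st
step (_ : st) Undo = st
step [] Undo = []

-- The spec: keep every intermediate state and let Undo pop the history.
specRun :: [Cmd] -> [Int]
specRun cmds = head (foldl go [[]] cmds)
  where
    go hist (Add n) = (n : head hist) : hist
    go hist Undo = case hist of
      _ : rest@(_ : _) -> rest
      _ -> hist

-- The property: both models agree on the final state.
replayProperty :: Int -> Bool
replayProperty seed = foldl step [] cmds == specRun cmds
  where cmds = genCmds seed 200
```

The real version would of course compare the responder loop against the saved-and-replayed git history rather than two toy models.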

Thirdly, of course the most desirable approach is to eliminate or simplify the error-prone parts.  The underlying problem is that I have a state in memory and a state on disk, the state in memory changes so I apply diffs to the state on disk, trusting that firstly the diffs result in the final memory state and secondly that the first disk state is equal to the first memory state.  I could eliminate both at once by saving the entire state, but I think it would be too slow.  Other than that, I can't think of a way to eliminate one or the other separately, but maybe I'll think of something later.

One other little detail is that Git has support for diff compression, but I'm pretty sure it only works for text.  Initially I was worried about the overhead of saving a whole track every time an event on it changed, so I implemented a scheme for saving events in individual files.  But when I actually tested it, it was actually slower than just dumping the whole track, and it was a bit complicated, so I left it out.  Originally the problem was worse because I was having to save the whole track on every single keystroke, so I decided I should come up with a way to make undo coarser-grained.  To that end, there's a concept of "suppressed" cmds.  These are cmds that, if a bunch of them happen in a row, are collapsed into a single undo action.  The intent was to collapse a series of "add a letter" cmds in raw edit mode.  That wound up being too unpredictable since an intervening cmd would split the undo, so the current implementation suppresses until the edit mode changes.  This is analogous to vim, which waits until you leave insert mode to record an undo.  I'm not totally sure I like it because it's easy to make a lot of separate changes without changing input mode and then an undo undoes too much.
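A sketch of the suppression idea, with invented types: consecutive cmds in the suppressed mode collapse into one undo step, the way vim records one undo for a whole insert-mode session.

```haskell
import Data.List (groupBy)

-- Invented for illustration; the real cmds carry much more than a mode
-- and a name.
data EditMode = RawEdit | NormalMode deriving (Show, Eq)

data Cmd = Cmd
  { cmdMode :: EditMode
  , cmdName :: String
  } deriving (Show, Eq)

-- Each inner list is one undo step: runs of raw-edit cmds are grouped,
-- everything else stands alone.
undoSteps :: [Cmd] -> [[Cmd]]
undoSteps = groupBy (\a b -> cmdMode a == RawEdit && cmdMode b == RawEdit)
```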

This also added quite a bit of buggy complexity because it has to save a "pre-commit" state, and when it becomes unsuppressed, save both the suppressed undo and the current one.  Part of the bugginess was that undo, like all time travel, is confusing.

It also meant I wound up needing names for cmds, which I had wanted to do anyway.  One thing leads to another.

There seems to be an overall tension between thinking about states "functionally", simply as a series of states, and thinking about them "imperatively", as a series of transformations.  The functional way is much less error-prone, but I always wind up wanting the transformations back out.  This happened with integrate merge as well.  A kind of consolation is that each time I want a different kind of diff.  So perhaps the decision to store state functionally and derive the diffs as needed was the correct one.


Not Theory3 anymore since I finally stopped doing rewrites and got rid of the old hacky Derive.Scale.Theory.  It turns out getting sharps and flats and enharmonics right in a generalized system is complicated!  I say generalized because I wanted to use the same system to support standard church modes along with more exotic scales like hijaz (exotic because it uses sharps and flats in the same key signature) or octatonic (exotic because with eight scale degrees, there is no 1:1 correspondence between scale degrees and note letters), and yet more exotic scales on non-piano layouts like octatonic spelled A-H or Bohlen-Pierce, which can have 9 note modes of a 13 note scale that repeats at a twelfth, not an octave.  Wacky stuff.  Hipster Composer says just scales that repeat at the octave are too mainstream.

Actually it turned out "plain" octatonic or whole tone was the most problematic because the scale degrees don't line up with letters, so it's hard to figure out how spelling should work.

I wound up implementing a Just scale as well, that can be retuned based on a base frequency signal.  But since it's diatonic at the moment I didn't wind up using any of the Theory stuff, so now I have three ways to implement scales: Derive.Scale.Util for simple diatonic scales tuned to absolute but non-repeating frequencies, Derive.Scale.TwelveUtil for complex scales that use letters and keys and chromatics and enharmonics, and Derive.Scale.Just which uses letters, but is diatonic, and figures out frequencies on the fly based on key and a set of ratios.  It seems like it would be nice to have a more unified situation here.

    TrackNum vs. TrackId

Rather than following through with the headache of trying to eliminate all TrackIds, I came up with a much easier and practical idea.  Actually, I was coming up with reasons to have the same TrackId on different blocks, namely that they could be used to present a certain view of a score.  For instance, in a large score with many instruments you might want to put the winds and strings together and edit them alone.  If the block contains the same tracks from the complete score, any edits are reflected in both places automatically.  I'm not totally sure this will actually turn out to be a practical way of doing things, but it doesn't seem totally far-fetched that the ability could be useful someday.

Since TrackNum is only ambiguous if the same TrackId occurs more than once in a single block, I can just ban that specific case, but otherwise leave TrackIds alone.

So the easy solution was to add a precondition to the UI State, and enforce it in the function that adds a new track.  Done.  Now I can convert TrackIds to TrackNums and the awkwardness of two non-interchangeable forms of address is mostly gone.
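The precondition fits in a few lines.  This is a toy version with invented names, not the real State function:

```haskell
-- Toy types: a block is just its list of TrackIds.
type TrackId = String
type Block = [TrackId]

-- | A TrackId may appear in many blocks, but at most once per block,
-- so (BlockId, TrackNum) and TrackId remain interconvertible.
insertTrack :: TrackId -> Block -> Either String Block
insertTrack tid block
    | tid `elem` block = Left ("duplicate TrackId in block: " ++ tid)
    | otherwise = Right (block ++ [tid])
```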

That brought to my attention that State has a number of invariants but no particularly strong protection for them.  If you use only the monadic functions you'll probably maintain them, but you can also go directly modify the State and break them all you want.  I decided to get a little more principled by only exporting the higher level functions, which basically means not exporting State.put.

Unfortunately, that theory immediately started to show its holes.  Firstly, loading a saved state needs State.put.  But I can export put as long as I make it check the invariants.  So it's now an expensive operation, but at least it's available.  But then what about undo?  It *should* be safe, since the states have all occurred in the past, so they're probably not just random garbage, and undo shouldn't have to do a full consistency check each time you want to undo a single note insert.  So I added 'unsafe_put', which doesn't do the check.

I already had to export the unsafe put because I discovered that if Cmd.M is to implement State.M, so I can use State functions unlifted, it needs to implement the State methods, namely 'put'.  So either I cram the 1000 line Cmd module into the 1500 line State module (fat chance) or just export unsafe_put.  I guess it's just part of monads not really being modular.

I actually only implemented part of the verification and left the rest for later.  I'm not really convinced it's a big deal.  It doesn't have to be impossible to make a mistake, it just has to not be easy to make one.  I'm protecting myself against mistakes, not against someone intentionally trying to break things.
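The checked put vs. unsafe_put split, sketched with a made-up State and a single invariant (the real ones live in a monad and check much more):

```haskell
import qualified Data.Set as Set

-- A made-up State with one invariant: no duplicate TrackIds in a block.
newtype State = State { stateTracks :: [String] } deriving (Eq, Show)

-- | Nothing if the invariants hold, Just an error otherwise.
verify :: State -> Maybe String
verify (State tracks)
    | length tracks == Set.size (Set.fromList tracks) = Nothing
    | otherwise = Just "duplicate TrackId in block"

-- | The expensive but safe entry point: check invariants on every put.
put :: State -> Either String State
put st = maybe (Right st) Left (verify st)

-- | For undo: the state occurred in the past, so skip the check.
unsafe_put :: State -> State
unsafe_put = id
```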

    Memory Hog

Yaa... probably still hoggy.  I did a few half-hearted heap traces and the heap showed reasonable numbers, so maybe the 'ps' numbers are way more than the actual data.  I think there's at least a 2x factor for copying GC overhead, and then all kinds of libraries and maybe pinned memory that's not counted by the heap profiler.  I'll wait until I can upgrade to ghc 7.6 and its better profiling before fussing with this again.

I'm writing this at 法鼓山 while Hui-ju has a meeting.  A few years ago I was here also, writing out the plan for how to implement caching.  It seems long ago and I suppose it was, since then we've been a year and a half married.  I should find the original cache plan and put it on the blog at some point.

Wednesday, May 2, 2012


A lot of interesting things have been happening lately.  I've been putting off writing about them because, well, all the time spent writing blog stuff is time I could have been writing code.


One of the most exciting is that I "finished" the first piece of music to come out of karya.  It doesn't totally count, since it's more like "re-edited": I started with "bloom", which I wrote a long time ago on the Amiga, and wrote a little program to dump out the old OctaMED save and load it into karya.  Since that point it's been the testbed for score features, and motivation for new ones.  It inherited the block structure and all the notes from the old tracker piece, so I still don't have many tools for dealing with larger scale score structure.  It also doesn't give me much feel for how I should structure a score, given that karya is not subject to the tracker's two level structure of a list of blocks.  It turns out that structure is convenient for caching, so I'll probably need something like it, but I still have a lot of unused flexibility in the layout of the piece.

Also, it's not really finished because there are some things I'd like to add, but they're mostly for experimental purposes rather than musical ones.  Of course, I hope that technical experimentation will always be a source of musical inspiration, so presumably this will always be true.  One thing is more experimentation with expressive wind melodies.  The other is that I noticed that one section sounded interesting backwards.  Incidentally, this was thanks to step play being able to go backwards.  It's interesting how some features lend themselves to serendipitous discovery.  In this sense, as well as its intended purpose of sounding out music at my own speed, step play has been worth the time put into implementing it.


Anyway, one problem with step play is that since it's a MIDI-level mechanism it's not really that easy to turn it back into score.  Initially I wrote a UI event transformer Cmd to reverse a bit of score, but ran into problems associating the controls with each note.  I realized I was, once again, redoing work that the deriver already does.  I have already put quite a bit of work into a vocabulary of ways to transform notes at the deriver level, and one thing I'd always planned on doing was to come up with a way to bounce a derivation back up to the score.  So you can choose whether you'd like to implement a particular effect by writing out the expressions to make it happen, as programming languages do, or transform the source directly, as sequencers tend to do.  The advantage of the first is that you retain the original data and the structure of the music is apparent in a high level way in the score, and the advantage of the second is that you can continue to edit the score.

In a way, abstraction in programming languages forces you to pick a point in the abstraction space and makes it hard to reach down below that, but in music you often want to reach across those boundaries (e.g. call this function 15 times, but on the 8th time add 3 to that value instead of 2).  In a programming language, this turns into a configuration problem: either every function needs an ad-hoc growing list of parameters, or there's some kind of dynamic environment or implicit parameters, or maybe inheritance and overriding as in the OO world.  Actually, the "dynamic environment" is exactly what the deriver already does to try to attack these problems, but there are still plenty of cases where the change is too ad-hoc.

Reflecting the derived notes back up to the score is another angle of attack on that problem, and I realized all I really needed was a function from score events to UI events.  It was surprisingly easy to write, just about 200 lines.  I decided to call it "integration" because it's the reverse of derivation.  I originally intended to give that name to recording MIDI -> score, but "record" is a more conventional name for that more conventional concept.
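Stripped of all detail, integration really is just a map from one event type to the other.  These are simplified stand-in types, nothing like the real 200 lines:

```haskell
-- Simplified stand-ins: the real score event carries controls, a pitch
-- signal, a stack, etc.
data ScoreEvent = ScoreEvent
    { eventStart :: Double, eventDur :: Double, eventPitch :: String }
    deriving (Eq, Show)

data UiEvent = UiEvent
    { uiPos :: Double, uiDur :: Double, uiText :: String }
    deriving (Eq, Show)

-- | Bounce derived events back up to editable UI events.
integrate :: [ScoreEvent] -> [UiEvent]
integrate = map (\e -> UiEvent (eventStart e) (eventDur e) (eventPitch e))
```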

I also realized I could sort of have my cake and eat it too by setting some blocks to automatically integrate the output of another block, and with a merge function, I can figure out which events have been edited or are new, and try to merge the newly integrated output back in.  So I should be able to have one part be a derivation of another part, edit the derivation a bit, then edit the source block, and have it merge my changes back in to the integrated result.  I can use an event style to mark derived notes, and detect deleted notes, moved notes, and perhaps even transposed ones.  The capabilities of the merge function may need to vary by score.  This is a feature I had planned on adding from the very beginning, but was putting it off for a distant future version.  But now it seems like it shouldn't actually be that difficult to implement.
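One guess at how the merge could classify events: compare the current, possibly hand-edited state against the previous integration, keyed by position, and let user edits win over freshly integrated output.  This is only a sketch; a real merge would also have to respect deletions and detect moved notes:

```haskell
import qualified Data.Map as Map

-- | Merge newly integrated output with a hand-edited copy of the
-- previous integration.  Keys stand in for event positions.
mergeIntegrated :: (Ord k, Eq v)
    => Map.Map k v  -- ^ previous integrated output
    -> Map.Map k v  -- ^ current state, possibly hand-edited
    -> Map.Map k v  -- ^ newly integrated output
    -> Map.Map k v
mergeIntegrated old current new = Map.union edited new
    where
    -- Events the user changed or added since the last integration.
    -- differenceWith keeps keys only in 'current', and drops keys whose
    -- values are unchanged from 'old'.
    edited = Map.differenceWith
        (\cur o -> if cur == o then Nothing else Just cur) current old
```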

The next step is to integrate output into the same block, which presumably involves fancier merging.  This has obvious applications for fugues and canons, in addition to doubled parts, sekaran, imbal, and all manner of parts derived from other parts.

    Git Save

Another feature I intended to implement from way back is incremental saving.  The idea is to continually save each modification, so that explicit saving is unnecessary (though you can checkpoint to mark specific spots).  This means that the undo history is saved along with the rest of the score.

I originally intended to save a simple numbered series of files representing each update, and made some changes to the Updates to support this.  But I realized I'd have to explicitly delete the future if I did edits after an undo, and why shouldn't I preserve it instead?  That led to the idea of using a hash of the contents of each update so they won't conflict with each other, and provide some assurance that you don't apply them out of order, and that in turn started to sound a bit like git.  It turns out that git is actually a general purpose object store with a source control system built on top, so I looked into using git instead of inventing my own system.
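The hash-chain idea — each checkpoint's id depends on its own content plus its parent's id, so checkpoints can't conflict or be applied out of order — works with any hash.  FNV-1a here is just a stand-in for git's SHA-1:

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.Word (Word64)

-- | FNV-1a over a String: a toy content hash.
fnv1a :: String -> Word64
fnv1a = foldl (\h c -> (h `xor` fromIntegral (ord c)) * 0x100000001b3)
    0xcbf29ce484222325

-- | A checkpoint id commits to both its content and its parent.
checkpointId :: Word64 -> String -> Word64
checkpointId parent update = fnv1a (show parent ++ update)

-- | Chain a series of updates from an empty (0) root.
chain :: [String] -> [Word64]
chain = tail . scanl checkpointId 0
```

Since each id includes its parent, the same edits in a different order produce a different chain, which is exactly the assurance I wanted.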

So I have a partial implementation: it can save and load complete states as well as checkpoints, but it's not yet hooked into the undo / redo system.  It's a bit awkward and quite slow to interact with git by sending tons of shell commands, but it works for the moment.  There's a library version of git called libgit2 which I can bind to once this is working.  In return, I get git's ability to pack up lots of small files efficiently, add references to mark particular checkpoints, and gc away unreferenced ones.  I should also be able to use graphical git tools to visualize the history and branches of a score.  I don't think I'll be able to support merges, or have any use for branches, but since I use the object store directly (there's no "working set" in the filesystem) I'm not sure those features make sense.

Hopefully I can finish this up, test it thoroughly, and never have to deal with losing data due to a crash again.  Not that crashes happen very much, this is haskell!


Since I now support diatonics and enharmonics I have to understand scales in more detail than I previously did.  Specifically I need to know about the key, for scales that have them.  This is used for not only diatonic transposition, but also symbolic transposition, both chromatic and diatonic, and choosing the most appropriate enharmonic given an input note.

This is well defined for the church modes, but leads to some interesting questions when we start pushing the boundaries of the tonal system.  Specifically, how should non-seven-tone scales work?  The simplest examples are octatonic and whole tone.  In the case of octatonic, the letters no longer indicate the scale degree because one will have to repeat.  That means the usual definition of diatonic transposition, which is to increase the scale degree, can't just increment the note letter.  Then there's the question of spelling.  If I take the Messiaen approach and say there are only three octatonic scales then I can simply hardcode the spelling for each one.  But in practice when I write in octatonic I'm still thinking tonally, so there should be a key.  And the key must fall within the scale, so if I'm writing in Db octatonic then I expect that Db should come out when I hit the C# key!  So I settled on two octatonic scales, octa21 and octa12, multiplied by each chromatic pitch class.

The simpler approach is to adjust the system to accommodate these scales naturally, e.g. by spelling octatonic A-H and whole tone A-F, etc.  This leads to much simpler notation for things which are purely in that scale, but would lead to difficulties when modulating between, say, octatonic and a church mode, or writing something which is somewhere in between the two.

Just as the weakness of the A-G twelve tone system is that it poorly expresses the specific mode you are writing in, that same aspect is a strength in that you need not commit fully to any particular mode.  So it exists at a peculiar point of tension between tonal style spelling on one side, and giving up, writing in a purely 12-tone style, and choosing enharmonics purely melodically on the other.

All this will be especially important for non-12-TET scales, where enharmonics will actually be different frequencies.  I'm still not clear on how a 12-note scale can coexist with just intonation, but there's a company called h-pi that makes such keyboards.  When the time comes I should put in the time to understand their system better.

For some reason it's taken a long time and 3 iterations to get this right, and it's still not done yet, but it will be necessary eventually.

    TrackNum vs. TrackId

This is an internal cleanup headache.  I use BlockIds instead of directly talking about Blocks so that I can have multiple Views backed by the same block.  I originally thought it might be useful to do the same thing with tracks, so blocks actually have TrackIds instead of Tracks, and theoretically a track could appear in more than one block, or more than once in one block.  I say theoretically because now there are two ways to refer to a track: either by its TrackId, or by a (BlockId, TrackNum) pair.  When referring to the events I should use the TrackId, so e.g. I only transform the events of a track once even if it appears in a block twice, and when referring to the track in context it must be by (BlockId, TrackNum).  Unfortunately I got that wrong a lot, and as a result a track appearing in a block twice will probably not work right.  For instance, the event stack uses the TrackId, not the TrackNum, which means that signal output and Environ are saved by TrackId.  But since the two tracks appear in different contexts they may very well render to different signals, and will likely have different Environs.

Also TrackIds add some complexity.  I have to come up with names for them and worry if those names should be meaningful, GC away orphaned tracks that no longer appear in any block, and deal with all the functions that give a TrackId when they should have given a TrackNum (you can't sort the tracks by their order in the score if all you have is a TrackId, and you're not guaranteed to be able to get a TrackNum from it because of this multiple tracks in a block thing).

So I want to scrap the whole idea of the TrackId.  Unfortunately, it has its tentacles in deep and sometimes it's very convenient to be able to talk about a track without reference to a block.  I have a State.Track type which is a (BlockId, TrackNum) pair, but that's turned out to be rarely used because in practice I often want to separate out the BlockId and do different things with it.

Every time I come across a case where TrackId is incorrect or error-prone it goes onto the list of reasons to switch.  Eventually it'll get heavy enough that I'll be motivated to go through the hassle and breakage.

    Memory Hog

At one point I laughed at web browsers for their memory hogging nature.  That was back in the days of Netscape Navigator.  Well, the laugh's on me now, because I recently noticed the sequencer hanging out at 750mb, with Chrome down there at a mere 250mb.

So clearly I need to do some space optimization.  It looks like about 40mb for just the program, but once I enable the REPL and it links in all its libraries that winds up at 116mb.  This probably means the sequencer has all libraries linked in twice, which is just a bit wasteful.  Perhaps I can get both the main compile and the GHC API to link shared libraries?

Also the score data is huge, and it grows over time.  For that I think I can do some work on interning strings, flattening pointers out of data structures, discarding old undo states (once I've finished the stuff to save them to disk), compacting cached signals and rendered MIDI down to flat bytes, and someday compacting long series of 'set' calls in control tracks down to their raw values, which should be a big time win if there are lots of them.

I sometimes get pretty laggy behaviour under OS X and I think some of it is due to excessive memory usage.  OS X definitely has poor behaviour when under load.  Work on this machine effectively comes to a stop when linking, to the point where the task switcher will get stuck and vim will get laggy.  And it's pretty normal to hit the pause key and wait a couple of seconds for itunes to actually wake up and pause.  So it's probably not totally my fault.

This laptop felt pretty fast when I first got it a number of years ago, but there's just no comparison against my relatively newer linux desktop.  One benefit of taking so long to finish the program is that "wait for computers to get faster" is a reasonable alternative to optimization.

Wednesday, April 18, 2012

text wrapping

And speaking of aesthetic matters, here are a few more screenshots I had on my desktop from when I was trying to figure out what to do with too-long text:

Rotating text vertically looks cool but turns out it's just not as readable as plain word wrapping.

It occurred to me that another solution would be to write the scores in Chinese.  And as a bonus, where else can you find a ready-made supply of 10,000 symbols that already have (more or less) definite meanings?

hex vs. decimal

One of the things I do a lot is writing numbers between 0 and 1.  All control values are normalized between 0 and 1 and it turns out scores have a lot of control values.  However, I haven't been entirely happy just writing plain decimal numbers for a couple of reasons.

Their values are not very visually apparent.  1.7 looks very similar to .17, but they are entirely different values.  I'm usually writing between 0 and 1 so it's annoying to have to write that decimal point every single time.  It shouldn't even be possible to accidentally write 1.7.  They're also variable length, so for example if I divide all the values by 2 some are likely to turn into big long numbers that no longer fit.  I already trim them to 3 decimal places, but a resolution of 1000 is still more than I need---MIDI only has 7 bits of resolution for most control values anyway.  It's also annoying how you have to type all those backspaces to change a value, and it's especially slow because it means you actually have to look at what you're doing to know how many backspaces to type.  That's the variable length thing.

So I thought I could take a page from the trackers and represent values as fixed width values between 00 and ff.  It's faster to type because each key can insert its higit on the right side, and bump off the higit on the left side.  I can't automatically advance the cursor after 2 higits because I don't keep track of a cursor position, but I know from experience it's pretty fast to keep one hand on the top digit row and another on the arrows.  a-f is a bit more problematic one-handed, especially since my one-handed dvorak is a bit awkward.
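The tracker-style entry — each keypress shifts its higit in from the right and bumps the left one off — is just arithmetic.  A sketch:

```haskell
-- | Shift a new higit (0-15) in on the right, bumping the leftmost
-- higit off, keeping the value in 0..255.
insertHigit :: Int -> Int -> Int
insertHigit higit val = (val * 16 + higit) `mod` 256
```

So typing "4" then "0" into an empty field yields 0x40, and typing a third key just shifts the "4" back out.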

Of course not all values are normalized from 0 to 1, so I have an unusual syntax: all hex literals are two-higit and are divided by 255.  Actually, what I wound up with is `0x`## so the 'x' can be a small superscript, since it's noisy and space hogging to have every line with a leading 0x.
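The literal rule itself — a two-higit hex value divided by 255 to land in [0, 1] — in a minimal sketch (the real parser handles the superscript rendering separately):

```haskell
import Data.Char (digitToInt, isHexDigit)

-- | Parse a @0x@## literal to a normalized value in [0, 1].
-- Anything else (including plain decimals) is rejected here.
parseHex :: String -> Maybe Double
parseHex ('0' : 'x' : [h, l])
    | isHexDigit h && isHexDigit l =
        Just (fromIntegral (digitToInt h * 16 + digitToInt l) / 255)
parseHex _ = Nothing
```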

This is one of those places where a score is not much like a general purpose programming language.  Where else would you be writing so many numeric literals that the appearance, width, and ease of them was so important that it's worth building it into the syntax?

The result?  I don't know.  I think I've gotten used to decimal.  The hex actually winds up being a tiny bit wider, since decimal's leading decimal point is actually quite narrow.  I'm lukewarm at the moment but I'll have to work with hex for a while and see if I start to like it more.  Actually I seem to recall that back in the Amiga tracker day there were only 6 bits of resolution, and that 10, 20, 30, and 40 made convenient rough points of reference.  I suppose I can do the same with 40, 80, c0, and ff.

With decimal:

With hex:

Sunday, March 11, 2012

records and lenses

So I've been experimenting with lenses lately.  Haskell's records and especially record update are always a low-grade annoyance.

Lenses haven't been the unqualified success that I'd hoped they would be.

None of the lens libraries really name operators how I want, but they all have a small set of primitives, so I made a little wrapper module with my own names.  As a bonus it means I can switch between lens libraries easily.

I decided on fclabels to start with, simply because it has some Template Haskell to define the lenses automatically.  However, it's not quite right: they expect you to use their own hardcoded naming scheme, which is never going to work for an existing project where you can't convert the entire thing in one go.  So I sent a patch to give control over the naming scheme.

fclabels has a bunch of operators for operating on StateMonad, but it turns out I can't use them because in a couple places I have layered state monads.  I can't use the overloaded 'get' and 'put' because they wouldn't know which state to look in.  I fussed around for a bit trying to do some typeclass magic but eventually decided it was too complicated since I can lift the pure operators into the monad with the appropriate 'modify' function.  So I wound up with (using fclabels infix TypeOperators thing, which I don't like, but oh well):

-- | Reverse compose lenses.
(#) :: (a :-> b) -> (b :-> c) -> (a :-> c)
infixr 9 #

-- | Get: @bval = a#b $# record@
($#) :: (f :-> a) -> f -> a
infixr 1 $#

-- | Set: @a#b =# 42 record@
(=#) :: (f :-> a) -> a -> f -> f
infix 1 =#

-- | Modify: @a#b =% (+1) record@
(=%) :: (f :-> a) -> (a -> a) -> f -> f
infix 1 =%

-- | Use like @a#b #> State.get@.
(#>) :: (Functor f) => (a :-> b) -> f a -> f b
infixl 4 #> -- same as <$>

So that all makes nice looking code, but it turns out TH is a pain.

First, it would fail to compile even though it worked fine from ghci.  I use .hs.o suffixes for .o files, and eventually I figured out that without -osuf, TH can't find the modules it needs to link in at compile time.  With -osuf, ghc emits _stub.c files with weird names, they don't get linked in, and link errors ensue.  7.4.1 would fix that, but last time I tried that I spent a day trying to get it to stop recompiling everything (and that's a showstopper because then hint wants to reinterpret everything all the time).  So I added a special hack to omit -osuf for the FFI-wrapper-using modules that generate stubs.

TH also imposes limitations on the order of declarations.  All of the types in a record have to be declared before the TH splice, and you have to put uses of template-generated variables below the splice.  This is quite inconvenient for my layout convention, which is to group each type and its functions together.  To make TH happy, I would have to rearrange all my modules to put all the type declarations at the top, the TH lens splices in the middle, and all the functions at the bottom.  Not only is it awkward to rearrange everything, I don't like that layout, since it means you have to do a lot of jumping back and forth from the data declarations to the functions on them.

Then of course there are the name clashes, which have been the subject of much ballyhoo on the ghc mailing list lately.  My record naming convention is that a record 'Record' will have fields like 'record_field1', 'record_field2', etc., so the obvious thing to do is strip the 'record_' part off.  I had to tweak the derivers for three kinds of clashes.  Out of 330 modules, of which 74 are tests, containing 107 record declarations, there was one clash with a Haskell keyword ("default", that's the most awkward Haskell keyword, especially annoying because it's almost never used).  Then there were 4 clashes with "constructors".  Given a "Synth" I tend to create a function called "synth" to insulate callers from the record's implementation.  When I add a field or change a type I can change the lowercase "synth" constructor and all is well.  But if a record in the same module has that type as a field, say "inst_synth" then there will be a clash.  Then I had 2 clashes which really were two records in the same module that wanted to have the same field name.  One was "name" (not surprising, lots of things have names), and the other was "control_map".

And TH slows down compilation, and slows down interpretation significantly.  I recompile a lot, and re-interpret (with ghci) just about constantly, so it's a bit disconcerting to lose ground on this front.

Once I finally got this all working, I ran into another problem.  Compiling normally worked, but compiling with optimization and profiling crashed, in different ways on different platforms.

On OS X:

ghc: internal error: PAP object entered!
    (GHC version 7.0.3 for x86_64_apple_darwin)
    Please report this as a GHC bug:

On linux:

ghc: build/profile/obj/Util/Lens.hs.o: unknown symbol `CCCS'

So at this point, I gave up.  TH had already used up much more time than it would have saved, and I wasn't looking forward to having to rearrange all my modules, so I just wrote out the lens declarations by hand.  With vim's '.' command it took a few minutes to crank out about 50.  I don't need lenses for many fields, only the ones that are likely to be modified in a nested way.

The result?  This:

set_keymap :: [(Score.Attributes, Midi.Key)] -> Patch -> Patch
set_keymap kmap patch = patch { patch_instrument = (patch_instrument patch)
    { inst_keymap = Map.fromList kmap } }

Looks nicer like this:

set_keymap :: [(Score.Attributes, Midi.Key)] -> Patch -> Patch
set_keymap kmap = instrument_#keymap =# Map.fromList kmap

One of the original motivations was this:

add_wdev :: String -> String -> Cmd.CmdL ()
add_wdev from to = modify_wdev_map $
    Map.insert (Midi.WriteDevice from) (Midi.WriteDevice to)

remove_wdev :: String -> Cmd.CmdL ()
remove_wdev from = modify_wdev_map $ Map.delete (Midi.WriteDevice from)

modify_wdev_map :: (Map.Map Midi.WriteDevice Midi.WriteDevice ->
        Map.Map Midi.WriteDevice Midi.WriteDevice)
    -> Cmd.CmdL ()
modify_wdev_map f = State.modify_config $ \config -> config
    { State.config_midi = modify (State.config_midi config) }
    where
    modify midi = midi
        { Instrument.config_wdev_map = f (Instrument.config_wdev_map midi) }

I have replaced that with:

add_wdev :: String -> String -> Cmd.CmdL ()
add_wdev from to = State.modify $ State.config#State.midi#Instrument.wdev_map
    =% Map.insert (Midi.WriteDevice from) (Midi.WriteDevice to)

remove_wdev :: String -> Cmd.CmdL ()
remove_wdev from = State.modify $ State.config#State.midi#Instrument.wdev_map
    =% Map.delete (Midi.WriteDevice from)

So I save a few lines of code.  More importantly, I save having to think about whether to write a 'modify_*' and where to put it.

Was it worth it?  Estimating for time saved in the future against the time to set all this up, I'm very unlikely to break even.  I'm not motivated to change existing code because the gains are only significant when I have nested updates, and most of those are already encapsulated in a 'modify_xyz' function, and I like to keep those around anyway for the insulation they provide.

So my conclusions about haskell's records and lenses:

Yes, Haskell records are annoying.  But it's a small annoyance, spread across a lot of code.  Lenses seem promising, but so far they've been underwhelming because to solve a diffuse annoyance you need something automatic, unobtrusive, and effortless.  I believe lenses could do that but they would have to be built-in, so no TH awkwardness and a guarantee they'll be consistently defined for every record from the very beginning.

About name clashes: two records with the same name in the same module is rare for me.  In fact all clashes are rare, but more common are clashes with other variables.  For instance, the constructor convention above.  It's also common to name a variable based on a record field, e.g. 'alloc = config_alloc config'.

The lens libraries are just getting started.  They're still fiddling around with various flavors of operators and theoretically interesting but (to me) not so essential gee-gaws, without a lot of attention to practical matters (like built-in Map or List lenses, or control over lens name derivation).  But the set of core operations is just get, set, and modify, so I have no doubt that they will mature rapidly if people start using them a lot.  But my impression is that not many people use them yet.

I'm not even sure I'll merge the lens branch.  It adds a certain amount of complexity (especially since the old record syntax is not going to go away) and the benefits are not unambiguously compelling.  Since you still can't use them unqualified for field access they're not that much more concise than standard haskell.  So they pretty much win only for record update and especially for nested record update.  Probably the next time I have to write some annoying nested record update I'll get motivated to go merge in the branch.

On the subject of the record discussion on the ghc mailing list lately, I still think my own idea is the only one that would be an improvement over the existing situation (for me, of course).  All the focus on sharing record field names is of little use to me because most of my clashes are not between record field names.

If we had SDNR then the most important result is that I'd have lenses everywhere without having to think about them.  And they'd be worth using even for field access.

remove_wdev :: String -> Cmd.CmdL ()
remove_wdev from = State.modify $
    #wdev_map.#midi.#config =% Map.delete (Midi.WriteDevice from)
    -- See, I can't decide whether composition should go right or left...

lookup_wdev :: Midi.WriteDevice -> Cmd.CmdL (Maybe Midi.WriteDevice)
lookup_wdev wdev = Map.lookup wdev . #wdev_map.#midi.#config #> State.get

It would look even nicer if we could somehow get rid of the qualification and for fun throw in []s for a map lens like in other languages:

lookup_wdev wdev = [wdev]wdev_map.midi.config #> State.get

But I'd settle for the # version since I can't see how the bottom one could happen without designing a new incompatible language.

Saturday, January 28, 2012

It works!

Here's the result of the giant previous post in 5 lines of a test:

    -- Trill transitions from 2 semitones to 1 semitone.
    equal (run
        [ ("*twelve", [(0, 0, "4a"), (4, 0, "i (4b)")])
        , ("t-diatonic", [(0, 0, "tr _ 1")])
        ])
        ([[(0, 69), (1, 71.25), (2, 70), (3, 71.75), (4, 71), (5, 72)]], [])

It's actually a little bit surprising when I try out the original motivation and what comes out is exactly what should come out.

Thursday, January 19, 2012

Pitch and transposition

I got stuck for a long time trying to implement a feature to support diatonic transposition (very relevant for e.g. trills) and enharmonics (not too relevant in 12TET, but just intonation scales can make use of them).  After getting halfway through an implementation which didn't work out, I decided I needed to step back a bit and analyze the problem, and only start implementation again once I knew what I was doing.  Unfortunately it wasn't all that successful, but I did wind up writing a lot:

A scale is just a mapping from a symbol to a certain frequency, but it gets complicated once transposition is involved.

Integral transposition is not very hard.  (4c) transposed chromatically becomes (4c#).  Once you introduce enharmonics it becomes more complicated because it could also be (4db), so you need to know the key.  In any scale other than 12TET these are likely different frequencies, so the distinction is important.

Now once you have a key of course diatonic transposition comes to mind.  It's important to support that because many ornaments (e.g. trill) are defined diatonically.  But if you have what you need to properly support chromatic transposition then you also have what you need for diatonic transposition.

This isn't so bad, but it means that transposition must be applied to symbolic pitches.  If a note is mapped to a frequency then it's too late to transpose it, so the obvious tactic of reducing a pitch to a frequency signal and then adding a transposition signal to it is not going to work.  It doesn't even work if the frequency signal is instead measured in scale steps, because the information about the key has been lost.

Things become more complicated when I add the requirement that pitches are not integral.  For instance, a perfectly square trill will alternate between [+0, +1] diatonic steps, but many kinds of trills are rounded, and will thus be adding a fractional number of diatonic steps.  This means that the added interval should be scaled appropriately.  Adding .5 to (4a) should add a half-step, while adding .5 to (4b) should add a quarter-step.  Adding 1.5 should hence figure out the integral transposition, and then scale the remaining fraction appropriately.
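A minimal sketch of that scaling rule, for C major in 12TET.  The table and the name diatonicAdd are my own, just for illustration; the real implementation has to work for any scale:

```haskell
-- C major from 4c to 5c, as equal tempered note numbers (4c = 60).
cmajor :: [Double]
cmajor = [60, 62, 64, 65, 67, 69, 71, 72]

-- Transpose a diatonic scale index by a possibly fractional number of
-- steps: apply the integral part directly, then scale the leftover
-- fraction by the size of the following diatonic step.
diatonicAdd :: Int -> Double -> Double
diatonicAdd ix steps =
    at (ix + whole) + frac * (at (ix + whole + 1) - at (ix + whole))
    where
    (whole, frac) = properFraction steps
    at i = cmajor !! i

-- diatonicAdd 5 0.5 -- adding .5 to (4a) is half of a whole step: 70
-- diatonicAdd 6 0.5 -- adding .5 to (4b) is half of a half step: 71.5
-- diatonicAdd 5 1.5 -- whole step to (4b), then half of its step: 71.5
```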

But then of course the underlying pitch itself may not be integral.  What does it mean to add a diatonic step to a note halfway between (4a) and (4b)?  Well, the interval it adds must be scaled between the whole step of (4a) and the half step of (4b), so it would be 0.75 of a step.  The example I like to use to visualize this is a trill on a note that is sliding from (4a) to (4b) in C major.  It should gradually change from a whole step alternation to a half step alternation.

Of course, since trills are not necessarily integral, now I have to combine both a non-integral pitch and a non-integral transposition.  It's a bit head-hurting at this point, but if adding 1 to a pitch halfway between (4a) and (4b) is 0.75 of a step, then adding half of that should be 0.375 of a step.  A half-diatonic trill from (4a) to (4b) should then change from a half step alternation to a quarter step.

Naturally, when I say "adding 0.375" it's not as simple as just adding two numbers.  The actual frequency change it represents depends, once again, on the note and scale.  The only place it doesn't is, by definition, any equal tempered scale.  This, I think, is the same problem as diatonic transposition, namely converting a scale of irregularly spaced intervals to a scale of regular ones.  So I think the same process can be applied twice: In the case of diatonic transposition, transposition is applied to notes of a certain mode and reduced to chromatic transposition in a certain scale.  Then the process is repeated to reduce chromatic transposition to NoteNumbers.
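The "same process applied twice" idea can be sketched as one interpolation function applied through two tables: diatonic position to chromatic position (irregular in any key), then chromatic position to NoteNumber (regular in 12TET, irregular otherwise).  The names and tables here are illustrative assumptions, not the real code:

```haskell
-- Map a possibly fractional position in one scale to a position in the
-- next scale down, interpolating between irregularly spaced entries.
-- (Partial: it falls off the end of the table for out of range input.)
through :: [Double] -> Double -> Double
through table pos = lo + (hi - lo) * frac
    where
    (i, frac) = properFraction pos
    lo = table !! i
    hi = table !! (i + 1)

-- Diatonic (C major) to chromatic, then chromatic to NN (12TET, 4c = 60),
-- applying the same reduction twice:
diatonicToNn :: Double -> Double
diatonicToNn = through [60..72] . through [0, 2, 4, 5, 7, 9, 11, 12]
```

So diatonicToNn 5 is (4a) at 69, and diatonicToNn 5.5 lands a half step up at 70, since the leftover fraction gets scaled by the whole step from (4a) to (4b).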

The question then is how to represent pitches so this can happen, so that each scale can have its own rules for how transposition works.  A simple scale is really just a mapping from symbols to frequencies, with no separate concept of diatonic and chromatic and no enharmonics.  A complex scale has all of those and can even be retuned over time so the pitches and intervals depend on the position in the score.  In addition, pitch is a continuously variable signal, not a scalar value.  Each pitch call produces a chunk of signal which may be a single sample or a section of a curve, and they have to be pieced together into a complete curve.  There is also a distinction between absolute and relative pitches.  A relative pitch signal is independent of scale, and has zero and negative values.


    Approach #1

The simplest option is a sequence of reals representing scale degrees for both absolute and relative pitches.  This breaks down pretty quickly though.  A note halfway between A (0) and B (2) would sound at A# (1).  But if you transpose that by one diatonic step you need to know the 1 is halfway between A and B and should therefore be halfway between B and C, so 2.5.  But if the 1 was just an A#, the result should be B#, sounding at 3.  The same problem exists for non equal tempered scales at the chromatic level.

This implies that I need to keep the "from" and "to" pitches around, so pitch can be a sequence of triples: (from, to, distance).  A note halfway between A and B is (A, B, .5).  Now if I want to transpose I can apply the transposition to the from and to notes, and end up with (B, C, .5).  Then reduce to a frequency by interpolating between the frequencies of B and C.  But this only works when the transposition is integral.  How would I transpose by half a step?  The transposition would be halfway between the transposition for A->B and B->C, but there's no way to represent that in a triple if 'from' and 'to' are scale degrees.  In the same way that collapsing (A, B, .5) into A# loses information important for transposition, collapsing ((A, B, .5), (B, C, .5), .5) into (A+.5, B+.5, .5) also loses information.

The key thing is that only scale degrees can be transposed, so I can't have something like A+.5.  So I should be able to apply the (from, to, distance) pattern recursively:

scale_to_frequency = [(A, 0), (B, 2), (C, 3), (D, 5)]

(A, B, .5) + 1d --> 1 + 1.5 = 2.5
    ((A, B, 1), (B, C, 1), .5) -> (2, 3, .5) -> 2.5
    ((A, B, .5), (B, C, .5), 1) -> (1, 2.5, 1) -> 2.5

(A, B, .5) + .75d --> 1.5 * .75 = 1.125, so 1 + 1.125 = 2.125
    ((A, B, .5), (B, C, .5), .75) -> (1, 2.5, .75) -> 2.125
    ((A, B, .75), (B, C, .75), .5) -> (1.5, 2.75, .5) -> 2.125

(A, C, .5) + .75d --> 1.5 + .75d --> 1.5 + .75 * 2 = 3
    ((A, B, .75), (C, D, .75), .5) -> ((0, 2, .75), (3, 5, .75), .5) -> 3
    ((A, C, .5), (B, D, .5), .75) -> ((0, 3, .5), (2, 5, .5), .75) -> 3

As reflected above, there are two ways to apply a transposition.  Either apply it to each of the branches, splitting each into a (from, to, distance) transposition, or apply it at the top, leaving the left branch the same and transposing the right branch, and expressing 'distance' as the distance between the two expressions.  Experimentally they seem to give the same answers, though I don't understand the math well enough yet to understand why this is the case.  Distributive law, I expect.
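Here is a small executable version of that check, using the toy scale above.  Degree and the function names are mine; it just confirms that applying a fractional transposition at the top, or distributing it into the branches, reduces to the same frequency:

```haskell
-- A degree is a plain scale index or an interpolation between degrees,
-- mirroring the (from, to, distance) triples above.
data Degree = I Int | F Degree Degree Double

-- The toy scale: A, B, C, D at 0, 2, 3, 5.
freqOf :: Int -> Double
freqOf i = [0, 2, 3, 5] !! i

-- Reduce to a frequency by interpolating recursively.
reduce :: Degree -> Double
reduce (I i) = freqOf i
reduce (F from to dist) = lo + (reduce to - lo) * dist
    where lo = reduce from

-- Integral transposition just shifts every scale index.
shift :: Int -> Degree -> Degree
shift n (I i) = I (i + n)
shift n (F from to dist) = F (shift n from) (shift n to) dist

-- Fractional transposition "at the top": interpolate between the two
-- neighboring integral transpositions of the whole expression.
transposeTop :: Double -> Degree -> Degree
transposeTop steps d = F (shift whole d) (shift (whole + 1) d) frac
    where (whole, frac) = properFraction steps

-- The same transposition distributed into each branch.
transposeDist :: Double -> Degree -> Degree
transposeDist steps (I i) = transposeTop steps (I i)
transposeDist steps (F from to dist) =
    F (transposeDist steps from) (transposeDist steps to) dist
```

Both reduce (transposeTop 0.75 (F (I 0) (I 1) 0.5)) and the transposeDist version come out to 2.125, matching the (A, B, .5) + .75d worked example.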

Of course, this is given a simple scale where each degree has a direct mapping to a frequency.  In practice a degree would have to be an abstract object that can be transposed chromatically and diatonically:

class Scale d where
    chromatic :: Int -> d -> d
    diatonic :: Int -> d -> d
    reduce :: d -> Frequency


data Degree d = I d | F (Degree d) (Degree d) Double
data Signal = forall d. Scale d => Signal [(RealTime, Degree d)]

A more complicated scale, say just intonation tuned to a particular base frequency, would have degrees as follows:

data JustDegree = JustDegree { base :: Frequency, degree :: LetterPitch }
data LetterPitch = A | B | ... -- no accidentals

-- Or if accidentals are desired, with resolution to an arbitrary number
-- of flats or sharps:
data LetterPitch = Letter AtoG Accidental
data AtoG = A | B | ...
data Accidental = Neutral | Flat Int | Sharp Int
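A sketch of what 'reduce' might look like for such a just degree, assuming the common 5-limit major scale ratios (the ratio table is my assumption for illustration, not anything from actual code, and accidentals are omitted):

```haskell
data AtoG = C | D | E | F | G | A | B deriving (Show, Eq, Enum)
data JustDegree = JustDegree { base :: Double, degree :: AtoG }

-- 5-limit just major scale ratios relative to the base frequency.
ratios :: [Rational]
ratios = [1, 9/8, 5/4, 4/3, 3/2, 5/3, 15/8]

-- Reduce a degree to a frequency: the base is fixed at signal
-- evaluation time, so retuning over time means varying the base.
reduce :: JustDegree -> Double
reduce (JustDegree hz deg) = hz * fromRational (ratios !! fromEnum deg)
```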

There is also the problem that the scale type variable will have to be reified into the data so the dynamic environment can remain monomorphic.  Presumably it could be sealed inside an existential type.

This appears to work, but it seems complicated.  A variable size structure like this doesn't lend itself to an efficient representation, which is desirable since I'll have arrays of these.  I am especially likely to have arrays where only the distance fraction is changing, and it seems like I should be able to avoid redundantly transposing all the note fields when they are mostly the same.

On the other hand, if I try to preserve sharing then each scale degree is just an extra pointer, and transposition is cheap if it's just consulting a table.  However, the recursive definition and type variable of 'Degree' ensure that those must be pointers, so I won't be able to get much space savings from unpacking.  It seems like I should be able to do some hackery like splitting the numeric values into separate vectors, but it's hard to think of how right now.

    Approach #2

Another approach is to represent pitch signals entirely as code, rather than data.

A pitch signal would just be 'type AbstractPitch = Deriver Signal.Pitch', i.e.  a function from the Dynamic state to a frequency signal.  Chromatic and diatonic transposition would be represented as signals in the dynamic environment.  This has some appeal because it means that "chromatic" and "diatonic" are simply conventions.  Scales can support or not support arbitrary forms of transposition at their choosing.  Of course, I suppose #1 can do this perfectly well too, just provide a 'transpose :: Transposition -> Int -> d' method.

This effectively delays the application of the transposition to the note in the same way as the (from, to, distance) representation does.  However, it's not nested in the same way, so all diatonic transpositions and chromatic transpositions would be collapsed.  So +1c +1d +1c would be collapsed to +2c +1d.  Are the results always the same?  I suppose whether or not they are depends on the definition of diatonic and chromatic for the particular scale, but I think for sensible scales they should be.

This is appealing because it's the same solution I use for the score in general.  Trying to create a data structure to represent the score in a way which can be later manipulated in the way I want is probably impossibly complicated given the range of manipulations.  In the same way, creating a pitch signal data type that can be later manipulated by transposition in the way I want winds up being complicated but still somewhat inflexible.  Delaying evaluation and presenting the manipulations as parameters to an abstract bit of code is a similar idea to data-directed extension as espoused by OO.

It also means that there is no concept of a "scale degree" and scales need not even have a concept of degrees, but simply names that will generate a given frequency in a given context that may include transposition.  This is apropos because I already have a "scale", Ratio, that is relative to another pitch and has no notion of steps or degrees or any of the usual structure that goes along with a scale.  In this world, pitches come from something a little more abstract and general than a scale.

However, this approach brings its own complications and difficulties.

Firstly, a pitch signal is actually generated by calls that themselves take the actual notes as arguments.  So a note call produces a single frequency, while a pitch call may create a constant segment, or interpolate from the previous note, or something more complicated.  Since only notes may be transposed, the pitch (a signal of frequencies in time) must also be delayed with the notes (a single frequency) delayed inside.  So there are two types: 'Note = Derive NoteNumber', and 'Pitch = Derive Signal.Pitch'.

As mentioned above, many pitch calls use the previous value, e.g. "interpolate linearly from the previous value".  However, the previous value can't be an NN.  If the transpose signal is changing, I need to reapply the interpolation to the source note on each sample; effectively the interpolation is a distance between the previous note and the next one.  So the source has to be a Note, not an NN, and specifically a Note representing the last sample emitted.  So if the interpolator is working at the NN level, say (b)-3nn, it has to construct that as a Note and pass it to the next call.

I already have this same mechanism for control calls, since they do exactly the same thing, but unfortunately I apparently can't reuse it for pitch calls, because the previous value is threaded as part of the CallInfo by the track evaluator.  In this new system the call simply returns an unevaluated Pitch, and the actual sample values are not produced until later, when all the CallInfos are long gone.  So instead of 'Pitch = Derive Signal.Pitch' I add an arg to thread the previous value explicitly: 'Pitch = (RealTime, Note) -> Derive (Signal.Pitch, (RealTime, Note))'.  Of course the Note actually has to be 'Maybe Note', since the first pitch call will not have a preceding note.  A Monoid instance for Pitch can thread them together, but each pitch call still has to explicitly construct and return its "last note", which is a burden, though one that can be hidden inside higher level utilities.
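The threading can be sketched without the Deriver part.  This is a simplification of my own (Note is a stand-in number here, and the real signal chunks come out of derivation), but it shows how a Monoid instance passes each call's last sample on to the next call:

```haskell
type RealTime = Double
type Note = Double  -- stand-in: real Notes are abstract, not numbers
type PitchSignal = [(RealTime, Note)]

-- A pitch fragment takes the previous (time, note), if any, and
-- returns its chunk of signal plus the last sample it emitted.
newtype Pitch = Pitch
    { runPitch :: Maybe (RealTime, Note)
        -> (PitchSignal, Maybe (RealTime, Note)) }

instance Semigroup Pitch where
    Pitch f <> Pitch g = Pitch $ \prev ->
        let (sig1, last1) = f prev
            (sig2, last2) = g last1
        in (sig1 ++ sig2, last2)

instance Monoid Pitch where
    mempty = Pitch $ \prev -> ([], prev)

-- A trivial call that emits a single constant sample.
constant :: RealTime -> Note -> Pitch
constant t nn = Pitch $ \_prev -> ([(t, nn)], Just (t, nn))
```

Appending fragments with (<>) then hides the explicit threading, which is the "higher level utility" part.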

A further complication is that since pitch calls are now responsible for emitting the final pitch signal, they are responsible for taking the transpose signals into account.  This means that even a call that wants to emit a constant pitch must combine that with the transpose signals in scope, and may not be very constant at all!  On the flip side, it means pitch calls have control over whether transposition happens or not.  Unfortunately this doesn't mean much, since pitch calls are not note calls and not part of a scale, and are thus unlikely to have any interesting opinions on the matter.  Of course some note calls may very well ignore transpositions, but unless the pitch calls are going to be able to inspect the scale in effect for certain bits of hardcoded magic, the pitch call doesn't know that the note doesn't want to transpose, and is thus obliged to call it at every transposition point even if the answer will always be the same.  Actually, there may well be some inspection of the scale, since the pitch call still needs to know which signals the note considers transposition and should hence trigger a re-evaluation of the Note.  A scale that is not prepared to admit any transposition can simply give an empty set.

To eliminate this complexity I could think about trying to merge pitch and note calls and put everything into the scale, but the division is rather essential to factoring out scale independent functionality, like simple interpolation.

Efficiency is an additional concern.  Previously, a pitch signal would be evaluated once and various events underneath it could chop out the bits they are interested in.  However, if the pitch signal is now a deriver that is sensitive to the environment, every event that wants the signal will have to evaluate the whole thing again and there is no way for the deriver to know the event is only interested in a portion.  When the pitch track is below the event track (it's usually called the note track, but that's confusing in this context) it gets chopped up into one section per note, but if an event wants to know the pitch of a *following* event, it needs the pitch track already evaluated.  This is a particularly nettlesome aspect of score evaluation in general and I hope for a better solution, but if there isn't one then the redundant evaluation of a whole signal for every event may be too inefficient.  In that case, I may be able to hack on a cache to reuse the previous signal given that the transposition signals haven't changed since the last evaluation.

And of course this hints at a general problem when moving from data to code: I lose the ability to inspect the data.  I can evaluate a Pitch and look for a particular frequency, but if I want a "scale degree" out of it then there's no way.  In fact there's no real concept of a scale degree at all, so I can't even ask simple questions like "what was the pitch class of the previous note" or, even harder due to the above evaluation order problem, "what is the pitch class of the next note".  But these are perfectly normal questions that I *will* need to ask.  Of course, given note calls in their full glory there may very well be no simple answer to that question.  But for scales which *do* have well defined degrees, which is after all likely to be almost all of them, I will need a function 'NoteNumber -> Expr' where Expr is the textual representation of the call that will generate that NoteNumber, or the nearest scale degree.  And then I could transpose a given Expr at the score level via the roundabout route of evaluating it to a NoteNumber in the proper transposing context and then converting that NoteNumber back to an Expr.  Such is the price of generality.

But such a price!  There is additional complexity in that tracklang Vals can now have Derivers inside them, courtesy of Note, which introduces a circular dependency between TrackLang and Derive.  Much of the similarity between control and pitch signals is lost; I can no longer deal with a pitch signal as easily as a control signal.  The simple concept of a signal of numbers or scale degree numbers is replaced by abstract pitches and notes which need to be evaluated in the correct context or there's a bug.  Also, a lot of code needs to be changed.

On the other hand, some code can be removed as well.  All of the PitchSignal data structure can be dropped, which was the price paid for trying (though failing) to support scales and transposition.

Another approach is to simply say transposition is not worth all the trouble and abandon it entirely.  But just about all pitch oriented ornaments, e.g. trills, are defined in terms of relative pitches, so that would be abandoning most of what I want to do.  Once I've accepted that I want a trill, wanting a diatonic trill is the next obvious step, since after all most trills are diatonic.  Then, given that supporting continuously changing pitches is also central, I need to solve a diatonic trill on a changing base note.  Following that path seems to lead me inevitably to this conclusion.

Unfortunately, there's another feature missing: I need to be able to look at the pitch of other notes.  For example, a mordent needs to create new notes based on the initial pitch of a note.  But an abstract pitch signal doesn't give access to the individual notes inside.

Even more unfortunately I got most of the way through approach 2 before figuring this out.  And it was quite a bit of work because it changes quite a lot.

    Approach #3

Based on this, I modified approach 2 to be somewhat more conservative.  A PitchSignal remains a signal, that is, a vector of (RealTime, Pitch) pairs.  A pitch is a function Controls -> NoteNumber.  It can be transposed by composing a function that modifies the controls in front of it.
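A minimal executable version of that representation follows.  The control names and the fake 12TET-ish degree are my own illustration; the real Pitch also has to deal with errors, scale ids, and so on:

```haskell
import qualified Data.Map as Map

type Controls = Map.Map String Double
type NoteNumber = Double

-- A Pitch is a function from the controls in scope to a frequency.
newtype Pitch = Pitch { eval :: Controls -> NoteNumber }

-- A fake C major degree: it knows how to interpret the conventional
-- transpose controls.  (Real diatonic steps depend on the key and the
-- position; this one pretends they are always 2 semitones.)
degree :: NoteNumber -> Pitch
degree nn = Pitch $ \controls ->
    nn + get "chromatic" controls + 2 * get "diatonic" controls
    where get name = Map.findWithDefault 0 name

-- Transposition composes a control modification in front of the pitch,
-- so it stays symbolic until the controls are finally supplied.
transpose :: String -> Double -> Pitch -> Pitch
transpose control steps (Pitch f) =
    Pitch (f . Map.insertWith (+) control steps)
```

Since a PitchSignal is still a vector of these, the old signal machinery mostly still applies, but reduction to NoteNumbers is delayed until the transpose controls are known.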

I get to reuse some of the old PitchSignal stuff (which treated a pitch as a (from, to, distance) triple) since this is a signal too.  But I keep the part from approach #2 where transposition is normal control signals with conventional names.  Though I could modify the pitch signals directly as before, if I want to support various different kinds of transpositions (so far I have chromatic, diatonic, and hz) I need something flexible that I can write directly into the score, and control signal names are just that.

It's a little frustrating to have spent so much time with ultimately rejected approaches before coming up with one that worked.  The whole idea behind writing down the requirements was to think of all the possibilities before spending time implementing something.  But even so I missed some requirements until it came time to actually convert existing code.  After thinking about the problem for such a long time I do have a kind of better handle on the problem and requirements, but of course that's only after the fact.  It would have been nice to have that perspective when it actually could have been useful.

It's interesting to observe that so much of the difficulty in interpreting the music score is controlling the order of evaluation.  With pitches, the problem is that the reduction of scale degrees to their frequencies must be delayed until all transpositions have been applied.  The purpose of the Deriver after all is to delay the evaluation of score elements until the full context, including pitch, is available, so they can make decisions based on that context.

This is somewhat related to the other key theme I've noticed, which is the tension between code and data.  Code is opaque but flexible, data is transparent but rigid.  To increase one is to diminish the other.  Since my main goal is flexibility in notation, I wind up favoring code over data.  The result is seemingly unavoidable complexity on the editing side, since this is where the lack of transparency hurts the most.