Saturday, December 29, 2012

Lag, drag, and memory corruption.


I'm definitely not loving laziness at the moment.  That's because I've spent the last two days trying to track down excessive amounts of DRAG in score derivation.  Not only are they excessive, but they grow more and more on each derivation.  That shouldn't happen because each derivation is independent and clears out the old one, but clearly something is being retained.

Part of the problem is that since I wasn't sure where the problem was, I hesitated to spend time creating an automatic reproduction, since I've done that sort of thing in the past only to find out the problem is related to the specific way that the GUI interaction handles the data.  This is a symptom of laziness space leaks being non-local.  And since debugging is basically trial and error, it took a long time to change something, recompile, run again, poke a bit, quit, and inspect the heap dump (though the ekg library helps).  Once I figured out what I had to force to get the DRAG to go away, I spent a lot of time on a kind of binary search either forcing bits of data or clearing it, to narrow down which bits did the trick.  It's complicated because frequently it's more than one thing, i.e. more than one bit of output is retaining the data.

The results aren't simple either.  For example making the cache strict would take about 5 MB off the growth... but I had to touch the UI for that to happen, so it's clearly being hidden behind another thunk.  And then 5 more MB would be cleared up when I ran another derive, but forcing everything would clear it all up immediately.
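The binary-search forcing described above boils down to deep-forcing one suspect field at a time and watching whether the DRAG disappears. A minimal sketch of the technique (the `State` record and its fields are hypothetical stand-ins, not the real types):

```haskell
import Control.DeepSeq (deepseq)

-- Hypothetical app state; the field names are illustrative only.
data State = State
    { stateCache :: [Int]
    , statePerf  :: [Int]
    }

-- Deep-force just one suspect field and return the state unchanged.
-- If the DRAG disappears after this runs, that field was retaining
-- the data; if not, move the deepseq to the next candidate.
forceCache :: State -> State
forceCache st = stateCache st `deepseq` st
```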

There are numerous clues, if only I knew how to interpret them. I took to writing them down because I can't remember them all at once:
  • Clearing both TrackDynamic and Cache gets rid of it, but only after one derive.  It's retained elsewhere, but I don't know where.
  • It's got something to do with perf vs. current perf, since clearing current perf doesn't do anything, but clearing perf does.  They should both point to the same data, so it should be both or nothing, right?
  • Forcing the cache gets rid of just one derive's worth of drag.
  • Drag is mostly Map and [].
  • If it grows by 15 MB each time, forcing gets rid of 10 MB.  5 MB of permanent growth is hiding somewhere.
  • Clearing the performance entirely gets rid of the last 5 MB, but it still requires a derive or force to get it to happen.  Where is that reference hiding?
  • The first run has no DRAG, it only appears on the second run.  Clearly this is related to caching.
  • The VOID grows slowly, and it's mostly [].  Actually it looks like the [] is constant, and the growing part is misc other things like Double.
  • There's almost no LAG, except right at the beginning.  So that also points to how the cache is retained across derivations.
Of course, forcing is just to figure out who is dragging.  It's much better to not drag in the first place, by making the relevant data strict.  Just about all of my data structures are strict, but of course built-in ones like lists, Maybes, and Maps aren't.
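The built-in laziness bites exactly as described: a lazy Map happily stores an unevaluated thunk as a value, and that thunk retains whatever the expression references. A sketch of the difference between the two container APIs:

```haskell
import qualified Data.Map.Lazy as Lazy
import qualified Data.Map.Strict as Strict

-- With the lazy API, the value (1 + 2) is stored as a thunk; until
-- something demands it, it keeps its free variables alive.
lazyMap :: Lazy.Map String Int
lazyMap = Lazy.insert "k" (1 + 2) Lazy.empty

-- The strict API evaluates the value to WHNF on insert, so a flat
-- value like Int is fully forced before it's stored.
strictMap :: Strict.Map String Int
strictMap = Strict.insert "k" (1 + 2) Strict.empty
```

Note that even the strict API only forces to WHNF: if the value is itself a list or nested structure, its insides stay lazy, which is consistent with strict insertion alone not fixing the environ.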

I eventually figured out that the continually growing part was actually buried in PitchSignals, which are the only bit of the Score.Event which is code, not data.  That means, of course, rnf doesn't work on it.  It turns out they capture the environ in a closure.  While the environ itself is small, it's clearly not entirely evaluated, because rnf'ing the environ before creating the closure wiped out the DRAG.  The environ is basically a Map, but despite my attempts to make insertion strict, the only thing that's worked is rnf'ing the environ before each closure is created.  Maybe I can use ghc-heap-view to see which parts of the environ are unevaluated.  But anyway, that explains the retains-forever thing, since the still-valid parts of the cache are copied to the new cache.
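The workaround amounts to deep-forcing the environ right before the closure captures it (the `Environ` type and `mkPitchSignal` here are simplified stand-ins for the real ones):

```haskell
import Control.DeepSeq (deepseq)
import qualified Data.Map.Strict as Map

-- Stand-in for the real Environ; the real one maps keys to dynamically
-- typed values, but the retention mechanics are the same.
type Environ = Map.Map String Int

-- rnf the environ before the lambda captures it, so the closure
-- retains only fully evaluated data instead of a chain of thunks
-- reaching back into the previous derivation.
mkPitchSignal :: Environ -> (Int -> Int)
mkPitchSignal environ = environ `deepseq` \x -> x + Map.size environ
```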

I've given up on the idea of lazy derivation, since it's not working out in practice, and I want to parallelize derivation.  But I feel like the work making derivation lazy wasn't totally a waste, because it was basically eliminating data dependencies, and that's a prerequisite for parallelization as well.

If my usual practice is to strictify every data structure I use, and I have all sorts of problems due to built-in data structures being lazy, wouldn't it be easier if everything was strict to begin with?  I still use laziness for Perform, and I think it's useful there, but it's just a stream.  In a default-strict language maybe I could just mark that one thing lazy and not have to deal with all this mess in the first place.
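For what it's worth, GHC did eventually gain an extension along these lines: StrictData (GHC 8.0, well after this post) makes all fields of declared types strict by default, with `~` to opt one field back into laziness, which is roughly the default I'm wishing for here:

```haskell
{-# LANGUAGE StrictData #-}

-- Under StrictData every field is strict unless marked lazy with ~.
data Performance = Performance
    { perfEvents :: [Int]   -- strict by default
    , perfStream :: ~[Int]  -- opt back into laziness, like the Perform stream
    }
```

Strict fields still only force to WHNF, so the elements of perfEvents can still be thunks; it helps with the spine, not with deeply nested laziness.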

Currently I think I have three major leaks:
  • Too-lazy environ captured in PitchSignal closures.  The workaround is rnf'ing the environ, but hopefully it's fixable by making the environ stricter.
  • Events saved in the cache are too lazy.  The cache filters out log msgs about caching, since getting cached log msgs about caching from the previous cache is just really confusing.  The filter, being lazy, probably retains all the events and their referents.  The mysterious part is that the next run should use or toss the cache, which should clear the references either way, but doesn't.  In any case, I can work around by rnf'ing the cached events, but I don't know what the fix would be.  Making the entire derive output be a strict list?  Or maybe it's the events themselves that need to be stricter?
  • Something else unknown in the performance.  I infer its presence because given about 15 MB of growth after one derivation, forcing the cache clears about 5 MB, forcing the environ clears about 5 MB, but forcing the entire performance clears it all.
Of course, capturing too much in a closure can happen in strict languages too, but it's so much easier to do so when any data structure could actually be a thunk that retains who knows how much data from who knows how far away.
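A tiny illustration of how a closure can drag along far more than it appears to, and how a single seq fixes it:

```haskell
-- The closure only needs the length, but the unevaluated thunk
-- (length xs) keeps the entire list alive until somebody demands it.
leaky :: [Int] -> (() -> Int)
leaky xs = let n = length xs in \() -> n

-- Forcing n before building the closure means xs can be collected as
-- soon as the length has been computed.
nonLeaky :: [Int] -> (() -> Int)
nonLeaky xs = let n = length xs in n `seq` \() -> n
```

In a strict language only the explicit capture of `xs` itself would retain the list; in Haskell the innocuous-looking `n` does it too.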


The other major problem is nastier, and I don't even have a good avenue of attack.  Ever since upgrading to ghc 7.6.1, I get internal errors from the GC, which probably means memory corruption.  It happens most frequently in the optimized build, and seemingly more frequently when it's using the libgit2 save format.  My first suspicion is FFI code.

It also happens in the debug build.  It happens when I'm not using libgit2.  It happens when I'm using the stub MIDI driver.  The one time it doesn't happen is during tests.  I guessed it's in the GUI binding, but it also doesn't happen when I set up a script to drive the GUI.  Well, it's driving it via a socket, not with keyboard and mouse, so maybe it's in keyboard / mouse event stuff.  The fact that it only happens after an unpredictable amount of interactive use really makes it hard to take the trial-and-error approach that has been so slow on the memory leak problem.

As far as I know, this bug has been present for years, since long ago I had the same symptoms and "fixed" them by changing an alignment=1 Storable to alignment=4, which made no sense, since alignment=1 was correct.  But increasing the alignment could have just masked the overrun.

Valgrind shows nothing.  I have no way to reproduce it that's faster than 15 minutes of normal usage.  But I can't use a program that crashes every 15 minutes.
