Saturday, December 29, 2012

Lag, drag, and memory corruption.


I'm definitely not loving laziness at the moment.  That's because I've spent the last two days trying to track down excessive amounts of DRAG in score derivation.  Not only are they excessive, but they grow more and more on each derivation.  That shouldn't happen because each derivation is independent and clears out the old one, but clearly something is being retained.

Part of the problem is that since I wasn't sure where the problem was, I hesitated to spend time creating an automatic reproduction, since I've done that sort of thing in the past only to find out the problem is related to the specific way that the GUI interaction handles the data.  This is a symptom of laziness space leaks being non-local.  And since debugging is basically trial and error, it took a long time to change something, recompile, run again, poke a bit, quit, and inspect the heap dump (though the ekg library helps).  Once I figured out what I had to force to get the DRAG to go away, I spent a lot of time on a kind of binary search either forcing bits of data or clearing it, to narrow down which bits did the trick.  It's complicated because frequently it's more than one thing, i.e. more than one bit of output is retaining the data.

The results aren't simple either.  For example making the cache strict would take about 5 MB off the growth... but I had to touch the UI for that to happen, so it's clearly being hidden behind another thunk.  And then 5 more MB would be cleared up when I ran another derive, but forcing everything would clear it all up immediately.

There are numerous clues, if only I knew how to interpret them.  I took to writing them down because I can't remember them all at once:
  • Clearing both TrackDynamic and Cache gets rid of it, but only after one derive.  It's retained elsewhere, but I don't know where.
  • It's got something to do with perf vs. current perf, since clearing current perf doesn't do anything, but clearing perf does.  They should both point to the same data, so it should be both or nothing, right?
  • Forcing the cache gets rid of just one derive's worth of drag.
  • Drag is mostly Map and [].
  • If it grows by 15mb each time, force gets rid of 10mb.  5mb of permanent growth is hiding somewhere.
  • Clearing the performance entirely gets rid of the last 5mb, but it still requires a derive or force to get it to happen.  Where is that reference hiding?
  • The first run has no DRAG, it only appears on the second run.  Clearly this is related to caching.
  • The VOID grows slowly, and it's mostly [].  Actually it looks like the [] is constant, and the growing part is misc other things like Double.
  • There's almost no LAG, except right at the beginning.  So that also points to how the cache is retained across derivations.
Of course, forcing is just to figure out who is dragging.  It's much better to not drag in the first place, by making the relevant data strict.  Just about all of my data structures are strict, but of course built-in ones like lists, Maybes, and Maps aren't.
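
To make the difference between lazy and strict built-in containers concrete, here is a minimal sketch.  The names `expensive`, `lazyEnviron`, `strictEnviron`, and `throwsWhenForced` are invented for illustration and are not from karya.  It uses an `error` call as a stand-in for an unevaluated thunk, so that we can observe whether an insert evaluated its value:

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import qualified Data.Map.Lazy as Lazy
import qualified Data.Map.Strict as Strict

-- | Force a value to weak head normal form and report whether that hit
-- an 'error' call, i.e. whether the insert had evaluated the stored value.
throwsWhenForced :: a -> IO Bool
throwsWhenForced x = do
    result <- try (evaluate x)
    return $ case result of
        Left e -> const True (e :: ErrorCall)
        Right _ -> False

-- | Stand-in for an expensive value that would otherwise stay a thunk.
expensive :: Int
expensive = error "forced"

-- | The lazy insert stores the thunk untouched.
lazyEnviron :: Lazy.Map String Int
lazyEnviron = Lazy.insert "tempo" expensive Lazy.empty

-- | The strict insert evaluates the value before storing it.
strictEnviron :: Strict.Map String Int
strictEnviron = Strict.insert "tempo" expensive Strict.empty

main :: IO ()
main = do
    throwsWhenForced lazyEnviron >>= print
    throwsWhenForced strictEnviron >>= print
```

Note that Data.Map.Strict only evaluates values to weak head normal form.  If the value is itself a lazy structure such as a list or a Maybe, its contents can still be thunks, which may be one reason strict insertion alone isn't always enough.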

I eventually figured out that the continually growing part was actually buried in PitchSignals, which are the only bit of the Score.Event which is code, not data.  That means, of course, rnf doesn't work on it.  It turns out they capture the environ in a closure.  While the environ itself is small, it's clearly not entirely evaluated, because rnf'ing the environ before creating the closure wiped out the DRAG.  The environ is basically a Map, but despite my attempts to make insertion strict, the only thing that's worked is rnf'ing the environ before each closure is created.  Maybe I can use ghc-heap-view to see which parts of the environ are unevaluated.  But anyway, that explains the retains-forever thing, since the still-valid parts of the cache are copied to the new cache.
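
The shape of that workaround can be sketched as follows.  This is not the actual karya code: `Environ`, `PitchSignal`, `transposition`, and `leakyEnviron` are simplified, hypothetical stand-ins, and the sketch assumes the deepseq package for `deepseq`:

```haskell
import Control.DeepSeq (deepseq)
import qualified Data.Map as Map

-- | Hypothetical stand-in for the environ: a lazy Map of named values.
type Environ = Map.Map String Double

-- | Hypothetical stand-in for a PitchSignal: code rather than data, so
-- rnf has nothing to traverse once the closure exists.
newtype PitchSignal = PitchSignal (Double -> Double)

applySignal :: PitchSignal -> Double -> Double
applySignal (PitchSignal f) = f

transposition :: Environ -> Double
transposition = Map.findWithDefault 0 "transpose"

-- | Captures the environ as-is, including any thunks inside it and
-- whatever those thunks reference.
lazySignal :: Environ -> PitchSignal
lazySignal env = PitchSignal (\nn -> nn + transposition env)

-- | Fully evaluates the environ first, so the closure only retains
-- evaluated values.  The forcing happens when the signal is demanded.
forcedSignal :: Environ -> PitchSignal
forcedSignal env = env `deepseq` PitchSignal (\nn -> nn + transposition env)

-- | An environ whose one value is a thunk holding on to a large list.
leakyEnviron :: Environ
leakyEnviron = Map.insert "transpose" (sum bigList / 1e5) Map.empty
  where bigList = [1 .. 1e5] :: [Double]

main :: IO ()
main = do
    print (applySignal (lazySignal leakyEnviron) 60)
    print (applySignal (forcedSignal leakyEnviron) 60)
```

Both signals compute the same result.  The difference is only in what the closure keeps alive in the meantime: the lazy version holds the unevaluated sum and the list behind it until someone applies the signal, while the forced version holds a plain Double.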

I've given up on the idea of lazy derivation, since it's not working out in practice, and I want to parallelize derivation.  But I feel like the work making derivation lazy wasn't totally a waste, because it was basically eliminating data dependencies, and that's a prerequisite for parallelization as well.

If my usual practice is to strictify every data structure I use, and I have all sorts of problems due to built-in data structures being lazy, wouldn't it be easier if everything was strict to begin with?  I still use laziness for Perform, and I think it's useful there, but it's just a stream.  In a default-strict language maybe I could just mark that one thing lazy and not have to deal with all this mess in the first place.

Currently I think I have three major leaks:
  • Too lazy environ captured in PitchSignal closures.  Workaround is rnf'ing the environ, but hopefully is fixable by making environ more strict.
  • Events saved in the cache are too lazy.  The cache filters out log msgs about caching, since getting cached log msgs about caching from the previous cache is just really confusing.  The filter, being lazy, probably retains all the events and their referents.  The mysterious part is that the next run should use or toss the cache, which should clear the references either way, but doesn't.  In any case, I can work around by rnf'ing the cached events, but I don't know what the fix would be.  Making the entire derive output be a strict list?  Or maybe it's the events themselves that need to be stricter?
  • Something else unknown in the performance.  I infer its presence because given about 15mb of growth after one derivation, forcing the cache clears about 5mb, forcing the environ clears about 5mb, but forcing the entire performance clears it all.
Of course, capturing too much in a closure can happen in strict languages too, but it's so much easier to do so when any data structure could actually be a thunk that retains who knows how much data from who knows how far away.
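
The second leak, the lazy filter over cached events, can be illustrated with a small sketch.  Everything here is hypothetical: `LEvent`, `isCacheLog`, the "cache:" prefix convention, and `cacheEntry` are invented for the example, and the real event and log types are much richer.  It assumes the deepseq package for `force`:

```haskell
import Control.DeepSeq (NFData (..), force)
import Control.Exception (evaluate)
import Data.List (isPrefixOf)

-- | Hypothetical stand-in for derive output: events interleaved with logs.
data LEvent = Event !Double | Log !String
    deriving (Eq, Show)

instance NFData LEvent where
    rnf (Event start) = rnf start
    rnf (Log msg) = rnf msg

-- | Assumed convention for this sketch: cache logs start with "cache:".
isCacheLog :: LEvent -> Bool
isCacheLog (Log msg) = "cache:" `isPrefixOf` msg
isCacheLog _ = False

-- | Lazy: the result is a thunk that keeps the whole input alive until
-- someone walks the filtered list.
stripCacheLogs :: [LEvent] -> [LEvent]
stripCacheLogs = filter (not . isCacheLog)

-- | Run the filter to completion before the result goes into the cache,
-- so the cache entry no longer references the unfiltered stream.
cacheEntry :: [LEvent] -> IO [LEvent]
cacheEntry = evaluate . force . stripCacheLogs

main :: IO ()
main = do
    cached <- cacheEntry [Event 0, Log "cache: rederived", Event 1, Log "warn"]
    print cached
```

`force` by itself only does its work when its result is demanded, which is why the sketch pairs it with `evaluate`.  Storing `force xs` in a lazy field just moves the thunk somewhere else.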


The other major problem is nastier, and I don't even have a good avenue of attack.  Ever since upgrading to ghc 7.6.1, I get internal errors from the GC, which probably means memory corruption.  It happens most frequently in the optimized build, and seemingly more frequently when it's using the libgit2 save format.  My first suspicion is FFI code.

It also happens in the debug build.  It happens when I'm not using libgit2.  It happens when I'm using the stub MIDI driver.  The one time it doesn't happen is during tests.  I guessed it's in the GUI binding, but it also doesn't happen when I set up a script to drive the GUI.  Well, it's driving it via a socket, not with keyboard and mouse, so maybe it's in keyboard / mouse event stuff.  The fact that it only happens after an unpredictable amount of interactive use really makes it hard to take the trial-and-error approach that has been so slow on the memory leak problem.

As far as I know, this bug has been present for years, since long ago I had the same symptoms and "fixed" them by changing an alignment=1 Storable to alignment=4, which made no sense, since alignment=1 was correct.  But increasing the alignment could have just masked the overrun.
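
For reference, a Storable instance with alignment=1 might look something like the sketch below.  `Color` and its 4-byte layout are hypothetical and not the actual type involved.  The point is only that `sizeOf` and `alignment` have to agree with what `peek` and `poke` really touch.  If a `poke` wrote past `sizeOf`, a larger alignment could plausibly change where values land and hide the stray bytes, which would be consistent with the masking theory above:

```haskell
import Foreign (Storable (..), Word8, with)

-- | Hypothetical 4-byte struct, e.g. an RGBA color passed to C.
data Color = Color !Word8 !Word8 !Word8 !Word8
    deriving (Eq, Show)

instance Storable Color where
    sizeOf _ = 4
    -- Every field is a single byte, so byte alignment is correct.
    alignment _ = 1
    peek p = do
        r <- peekByteOff p 0
        g <- peekByteOff p 1
        b <- peekByteOff p 2
        a <- peekByteOff p 3
        return (Color r g b a)
    -- Writes exactly sizeOf bytes.  An offset of 4 or more here would be
    -- the kind of overrun that corrupts neighboring memory.
    poke p (Color r g b a) = do
        pokeByteOff p 0 r
        pokeByteOff p 1 g
        pokeByteOff p 2 b
        pokeByteOff p 3 a

-- | Marshal a value out to temporary memory and read it back.
roundTrip :: Color -> IO Color
roundTrip color = with color peek

main :: IO ()
main = roundTrip (Color 255 128 0 64) >>= print
```

A round trip like this only checks that `peek` and `poke` agree with each other.  It says nothing about whether the C side uses the same layout, which is where an overrun would actually come from.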

Valgrind shows nothing.  I have no way to reproduce it that's faster than 15 minutes of normal usage.  But I can't use a program that crashes every 15 minutes.

Saturday, December 1, 2012


Not related to karya, but I wrote all this stuff a long time ago when Prometheus first came out.  Bad Prometheus reviews have since become a kind of genre, so here's my entry into the genre.  Naturally this will spoil the whole movie if you haven't already seen it.

The expedition was ridiculously disorganized and haphazard considering they're walking into some ancient alien complex.

Biologist guys decide "we're going back all alone" because they're scared about weird bodies and stuff.  Yes, let's go back alone so we can get lost.  Even on a normal expedition on Earth I don't think you get two guys throwing a fit and deciding to wander back alone through a maze.  I guess they forgot about the fancy automappers, because they got lost anyway.  I also think you've kind of failed as a storyteller if you need the characters to tell us what implausible thing happened as their first line: "oh, looks like we're still here because we got lost".  I guess they're fulfilling their Guys Who Get Killed First duty.

Meanwhile, our haphazard other heroes are bumping into things, knocking things over, shouting at each other, and not paying any attention to the android who can mysteriously operate alien machinery.

And someone drops her bag, and decides to dive back into the raging knife storm after it.  Even if it had something they haphazardly ripped off from the complex, and there's like a million cylinders and alien bodies and whatnot in there.  A dropped knapsack is totally worth a knife bath, right?

Then there's this thing where they're all mopey like "oh we didn't find anything, it's all empty, let's give up and go home" when they spent 15 minutes inside a tiny part of a giant complex and saw huge amounts of crazy stuff and alien writing and working machinery and had to run back early because the weather got bad.

There was a strange scene I forgot about with the head.  If you retrieve the head of an ancient alien corpse I guess the first thing you want to do is pump it full of adrenaline, which I guess makes alien heads relive their last moments.  So it will tell you... what it was running from maybe?  Who knows, because they accidentally put in the chemical that makes alien heads explode.  And then they're like shrug, so much for that alien head, and forgot about it entirely, which is what I did too.

And then the biologists (geologists? hard to tell since they didn't seem to do any of either) are stuck in a deeply creepy alien complex and the captain on the ship is just laughing at them.  "Yeah, there's something moving down there, along with all the horribly murdered alien corpses.  Sweet dreams!"  It's like they all knew they were in a movie and these guys were the Guys Who Get Killed First, so the captain is just winking at the audience.

But then the strangest thing is that these guys, after doing stupid things because they're scared, and then getting terrorized more by the captain, and running in the opposite direction from the "something moving", then when they actually see a scary alien they decide it's cute and they want to pet it.  Even after it hisses and makes all kinds of threatening gestures they just want to pick it up.  Aw, the alien snakebeast is hissing, I think it wants to be friends.  Kane wasn't that stupid.

I've never seen a C-section, but I don't think that's how it works.  Vickers' surgery table is mysteriously incompatible with women (I guess it's for the robot then?  But he can amputate his entire body and still be just fine.), but it does have a convenient little organ removal claw.  It also has a convenient button on the outside that floods the inside with poisonous gas.  Forget the critters, when you think about it the operating table is a horror movie all by itself.

Oh and before that she hits one of the doctors / orderlies (who I guess are trying to help her get exploded by an alien? when did they turn into psychopaths?) and both of them apparently decide not to chase her.  Or raise an alarm.  Or do anything else for the rest of the movie.  It's not so much that they forgot to chase her, it's that the scriptwriter forgot they existed.  Then she finishes up with her ordeal and it's like there's a tacit agreement to not mention the attempted murder-by-alien.  When Burke did that in Aliens it took a power failure and alien attack to distract them from killing him on the spot.

Anyway, they get to the Engineer and they're like "let me talk", "no let *me* talk", "me talk me talk me talk!" and hitting each other.  And the Engineer decides the first thing he should do after sleeping through his entire installation being wiped out, is to beat up the critters that woke him up (I understand, they *are* acting like 5 year-olds), and then go find their planet and presumably rain horrible black biological death on it.

Then the captain says "Hey guys, I think we'll have to crash into that ship and kill ourselves.  Because it's taking off, and we can't let alien ships take off.  And I have to be standing here while it happens."  Other guys: "Oh.  Ok.  Sounds like we should stand here too."

The crew in Alien got wiped out, but at least they were trying.