Sunday, March 11, 2012

records and lenses

So I've been experimenting with lenses lately.  Haskell's records and especially record update are always a low-grade annoyance.

Lenses haven't been the unqualified success that I'd hoped they would be.

None of the lens libraries really name operators how I want, but they all have a small set of primitives, so I made a little wrapper module with my own names.  As a bonus it means I can switch between lens libraries easily.

I decided on fclabels to start with, simply because it has some Template Haskell to define the lenses automatically.  However, it's not quite right: they expect you to use their own hardcoded naming scheme, which is never going to work for an existing project where you can't convert the entire thing in one go.  So I sent a patch to give control over the naming scheme.

fclabels has a bunch of operators for operating on StateMonad, but it turns out I can't use them because in a couple places I have layered state monads.  I can't use the overloaded 'get' and 'put' because they wouldn't know which state to look in.  I fussed around for a bit trying to do some typeclass magic but eventually decided it was too complicated since I can lift the pure operators into the monad with the appropriate 'modify' function.  So I wound up with (using fclabels infix TypeOperators thing, which I don't like, but oh well):

-- | Reverse compose lenses.
(#) :: (a :-> b) -> (b :-> c) -> (a :-> c)
infixr 9 #


-- | Get: @bval = a#b $# record@
($#) :: (f :-> a) -> f -> a
infixr 1 $#


-- | Set: @a#b =# 42 record@
(=#) :: (f :-> a) -> a -> f -> f
infix 1 =#


-- | Modify: @a#b =% (+1) record@
(=%) :: (f :-> a) -> (a -> a) -> f -> f
infix 1 =%


-- | Use like @a#b #> State.get@.
(#>) :: (Functor f) => (a :-> b) -> f a -> f b
infixl 4 #> -- same as <$>

So that all makes nice looking code, but it turns out TH is a pain.

First, it would fail to compile even though it worked fine from ghci.  I use .hs.o suffixes for .o files, and eventually I figured out that without -osuf, TH can't find the modules it needs to link in at compile time.  With -osuf, ghc emits _stub.c files with weird names, they don't get linked in, and link errors ensue.  7.4.1 would fix that, but last time I tried that I spent a day trying to get it to stop recompiling everything (and that's a showstopper because then hint wants to reinterpret everything all the time).  So I added a special hack to omit -osuf for the FFI-wrapper-using modules that generate stubs.

TH also imposes limitations on the order of declarations.  All of the types in a record have to be declared before the TH splice, and you have to put uses of template-generated variables below the splice.  This is quite inconvenient for my layout convention, which is to group each type and its functions together.  To make TH happy, I would have to rearrange all my modules to put all the type declarations at the top, the TH lens splices in the middle, and all the functions at the bottom.  Not only is it awkward to rearrange everything, I don't like that layout, since it means you have to do a lot of jumping back and forth from the data declarations to the functions on them.

Then of course there are the name clashes, which have been the subject of much ballyhoo on the ghc mailing list lately.  My record naming convention is that a record 'Record' will have fields like 'record_field1', 'record_field2', etc., so the obvious thing to do is strip the 'record_' part off.  I had to tweak the derivers for three kinds of clashes.  Out of 330 modules, of which 74 are tests, containing 107 record declarations, there was one clash with a Haskell keyword ("default", that's the most awkward Haskell keyword, especially annoying because it's almost never used).  Then there were 4 clashes with "constructors".  Given a "Synth" I tend to create a function called "synth" to insulate callers from the record's implementation.  When I add a field or change a type I can change the lowercase "synth" constructor and all is well.  But if a record in the same module has that type as a field, say "inst_synth" then there will be a clash.  Then I had 2 clashes which really were two records in the same module that wanted to have the same field name.  One was "name" (not surprising, lots of things have names), and the other was "control_map".

And TH slows down compilation, and slows down interpretation significantly.  I recompile a lot, and re-interpret (with ghci) just about constantly, so it's a bit disconcerting to lose ground on this front.

Once I finally got this all working, I ran into another problem.  Compiling normally worked, but compiling with optimization and profiling crashed, in different ways on different platforms.

On OS X:

ghc: internal error: PAP object entered!
    (GHC version 7.0.3 for x86_64_apple_darwin)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

On linux:

ghc: build/profile/obj/Util/Lens.hs.o: unknown symbol `CCCS'

So at this point, I gave up.  TH had already used up much more time that it would have saved, I wasn't looking forward to having to rearrange all my modules, so I just wrote out the lens declarations by hand.  With vim's '.' command it took a few minutes to crank out about 50.  I don't need lenses for many fields, only the ones that are likely to by modified in a nested way.

The result?  This:

set_keymap :: [(Score.Attributes, Midi.Key)] -> Patch -> Patch
set_keymap kmap patch = patch { patch_instrument = (patch_instrument patch)
    { inst_keymap = Map.fromList kmap } }

Looks nicer like this:

set_keymap :: [(Score.Attributes, Midi.Key)] -> Patch -> Patch
set_keymap kmap = instrument_#keymap =# Map.fromList kmap

One of the original motivations was this:

add_wdev :: String -> String -> Cmd.CmdL ()
add_wdev from to = modify_wdev_map $
    Map.insert (Midi.WriteDevice from) (Midi.WriteDevice to)


remove_wdev :: String -> Cmd.CmdL ()
remove_wdev from = modify_wdev_map $ Map.delete (Midi.WriteDevice from)


modify_wdev_map :: (Map.Map Midi.WriteDevice Midi.WriteDevice ->
        Map.Map Midi.WriteDevice Midi.WriteDevice)
    -> Cmd.CmdL ()
modify_wdev_map f = State.modify_config $ \config -> config
    { State.config_midi = modify (State.config_midi config) }
    where
    modify midi = midi
        { Instrument.config_wdev_map = f (Instrument.config_wdev_map midi) }

I have replaced that with:

add_wdev :: String -> String -> Cmd.CmdL ()
add_wdev from to = State.modify $ State.config#State.midi#Instrument.wdev_map
    =% Map.insert (Midi.WriteDevice from) (Midi.WriteDevice to)


remove_wdev :: String -> Cmd.CmdL ()
remove_wdev from = State.modify $ State.config#State.midi#Instrument.wdev_map
    =% Map.delete (Midi.WriteDevice from)


So I save a few lines of code.  More importantly, I save having to think about whether to write a 'modify_*' and where to put it.

Was it worth it?  Estimating for time saved in the future against the time to set all this up, I'm very unlikely to break even.  I'm not motivated to change existing code because the gains are only significant when I have nested updates, and most of those are already encapsulated in a 'modify_xyz' function, and I like to keep those around anyway for the insulation they provide.

So my conclusions about haskell's records and lenses:

Yes, Haskell records are annoying.  But it's a small annoyance, spread across a lot of code.  Lenses seem promising, but so far they've been underwhelming because to solve a diffuse annoyance you need something automatic, unobtrusive, and effortless.  I believe lenses could do that but they would have to be built-in, so no TH awkwardness and a guarantee they'll be consistently defined for every record from the very beginning.

About name clashes: two records with the same name in the same module is rare for me.  In fact all clashes are rare, but more common are clashes with other variables.  For instance, the constructor convention above.  It's also common to name a variable based on a record field, e.g. 'alloc = config_alloc config'.

The lens libraries are just getting started.  They're still fiddling around with various flavors operators and theoretically interesting but (to me) not so essential gee-gaws without a lot of attention to practical matters (like built-in Map or List lenses, or control over lens name derivation).  But the set of core operations is just get, set, and modify, so I have no doubt that they will mature rapidly if people start using them a lot.  But my impression is that not many people use them yet.

I'm not even sure I'll merge the lens branch.  It adds a certain amount of complexity (especially since the old record syntax is not going to go away) and the benefits are not unambiguously compelling.  Since you still can't use them unqualified for field access they're not that much more concise than standard haskell.  So they pretty much win only for record update and especially for nested record update.  Probably the next time I have to write some annoying nested record update I'll get motivated to go merge in the branch.

On the subject of the record discussion on the ghc mailing lately, I still think my own idea (http://hackage.haskell.org/trac/ghc/wiki/Records/SyntaxDirectedNameResolution) is the only one that would be an improvement over the existing situation (for me, of course).  All the focus over sharing record field names is of little use to me because most of my clashes are rare, and most of them are not between record field names.

If we had SDNR then the most important result is that I'd have lenses everywhere without having to think about them.  And they'd be worth using even for field access.

remove_wdev :: String -> Cmd.CmdL ()
remove_wdev from = State.modify $
    #wdev_map.#midi.#config =% Map.delete (Midi.WriteDevice from)
    -- See, I can't decide whether composition should go right or left...


lookup_wdev :: Midi.WriteDevice -> Cmd.CmdL (Maybe Midi.WriteDevice)
lookup_wdev wdev = Lens.map wdev . #wdev_map.#midi.#config wdev #> State.get

It would look even nicer if we could somehow get rid of the qualification and
for fun throw in []s for a map lens like in other languages:

lookup_wdev wdev = [wdev]wdev_map.midi.config #> State.get

But I'd settle for the # version since I can't see how the bottom one could happen without designing a new incompatible language.