The Gtk2Hs code generator
Part 1: the original version
Originally Gtk2Hs was all written by hand. This was a pain as the Gtk+ API is really quite big and there’s a great deal of repetition. So I wrote a Haskell program to generate the bindings.
The C API
Of course it’s a good deal more complicated than that. I started with a bunch of scripts written by the Gtk# people. They have Perl scripts that scan the C header and implementation files to extract a fairly high-level API for GObject-based C libs (like Gtk+ and many other GNOME modules). The scripts produce an XML file that describes the namespaces, objects, signals, methods, parameters, types and so on. It’s quite detailed. Crucially, it describes the API at the GObject level, not the C level. This means we can translate it into the way we bind the GObject features in Gtk2Hs.
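To make the shape of that description concrete, here is a rough sketch of how the information in the API XML might be modelled once it has been read into Haskell. All the type and field names are made up for illustration; they are not the ones the real code generator uses, and the real XML carries much more detail.

```haskell
-- A hypothetical data model for a GObject-level API description.
-- Every type and field name below is an illustrative assumption.

data Parameter = Parameter
  { paramName  :: String  -- e.g. "label"
  , paramCType :: String  -- e.g. "const gchar*"
  } deriving (Show, Eq)

data Method = Method
  { methodCName  :: String       -- C symbol, e.g. "gtk_button_set_label"
  , methodReturn :: String       -- C return type, e.g. "void"
  , methodParams :: [Parameter]  -- parameters other than the object itself
  } deriving (Show, Eq)

data Signal = Signal
  { signalName   :: String       -- e.g. "clicked"
  , signalParams :: [Parameter]
  } deriving (Show, Eq)

data Object = Object
  { objectName    :: String      -- e.g. "Button"
  , objectParent  :: String      -- e.g. "Bin"
  , objectMethods :: [Method]
  , objectSignals :: [Signal]
  } deriving (Show, Eq)

data Namespace = Namespace
  { namespaceName    :: String   -- e.g. "Gtk"
  , namespaceObjects :: [Object]
  } deriving (Show, Eq)

-- A small hand-built example value.
buttonObject :: Object
buttonObject = Object
  { objectName    = "Button"
  , objectParent  = "Bin"
  , objectMethods =
      [ Method "gtk_button_set_label" "void"
          [Parameter "label" "const gchar*"]
      , Method "gtk_button_get_label" "const gchar*" []
      ]
  , objectSignals = [Signal "clicked" []]
  }

-- | All C symbols an object exposes as methods.
objectCSymbols :: Object -> [String]
objectCSymbols = map methodCName . objectMethods
```

The point of working at this level is that the generator sees objects, methods and signals rather than raw C declarations.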
Lots of XML
So I started by extracting info from the API XML file (using HaXml) and generating Haskell modules, one for each object/widget type. I tried fairly hard to match the existing hand-written style. Of course the output couldn’t be that close to the hand-written modules because the order of declarations, extra imports, and many other things were different. So we needed more information. The next step was to scan the existing .chs Haskell modules and glean information from them, like the order of declarations and exports, the authors, creation date, copyrights and the file kind (.chs or .chs.pp). Applying this info meant I could generate code much closer to the original hand-written stuff.
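As a rough illustration of the "match the existing order" idea, the sketch below reorders a list of generated declaration names so that anything already present in the hand-written module keeps its existing position and anything new is appended afterwards. The function name `orderLike` and the use of plain name lists are assumptions made for the example; the real generator works on richer data.

```haskell
import Data.List (elemIndex, sortOn)

-- | Order generated declaration names like an existing module.
-- Names found in the existing module come first, in their existing order;
-- names that are new keep their generated order and go at the end.
orderLike :: [String] -> [String] -> [String]
orderLike existing generated =
  map snd (sortOn fst (zipWith key [0 ..] generated))
  where
    key :: Int -> String -> ((Int, Int), String)
    key i name = case elemIndex name existing of
      Just pos -> ((0, pos), name)  -- known: sort by existing position
      Nothing  -> ((1, i), name)    -- new: sort by generated position
```

For example, `orderLike ["buttonNew", "buttonSetLabel"] ["buttonSetLabel", "buttonSetRelief", "buttonNew"]` gives `["buttonNew", "buttonSetLabel", "buttonSetRelief"]`.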
Still not enough
Many types were not known because they are defined not in Gtk+ but in lower-level libraries like Gdk and glib/gobject. That wasn’t too hard to fix: we just generate the API XML files for those libraries too, read them in, and go through them to find the object types they define.
There are loads of deprecated or semi-internal functions that we do not want to bind. We already had a system for excluding them, with a whole bunch of api.exclude files listing the functions we were not interested in. So I made the code generator read these in too and use them to filter those functions out.
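The exclusion step is simple enough to sketch. The format of the api.exclude files isn’t described here, so the example below assumes something deliberately simple: one exact C function name per line, with blank lines and lines starting with `#` ignored. The function names `parseExcludes` and `applyExcludes` are made up for the example.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- | Parse an exclude file, ASSUMING one exact C function name per line.
-- Blank lines and lines starting with '#' are skipped.
parseExcludes :: String -> [String]
parseExcludes = filter keep . map strip . lines
  where
    keep l = not (null l) && take 1 l /= "#"
    strip  = dropWhileEnd isSpace . dropWhile isSpace

-- | Drop every function whose C name appears in the exclude list.
applyExcludes :: [String] -> [String] -> [String]
applyExcludes excludes = filter (`notElem` excludes)
```

With an exclude file containing `gtk_button_pressed`, the list `["gtk_button_new", "gtk_button_pressed"]` would be cut down to `["gtk_button_new"]`.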
And after lots of hacking…
With lots of tweaking, this was enough to generate reasonably sane code for most modules. There were places where the C API doesn’t provide enough information to write a Haskell binding, so I had the code generator emit FIXMEs and those bits needed to be edited by hand. Then Axel and I started merging changes from the generated code into the hand-written version. This made our code much more regular, rather than different modules having slightly different code styles. It also uncovered a large number of bugs where we’d made mistakes when doing it by hand, just because as humans we aren’t good at doing repetitive stuff without making mistakes (common examples included getting arguments in the wrong order).
Of course one of the main points of the code generator was to make it easier to bind new Gtk+ modules. Each major release of Gtk+ brings a slew of new APIs but with the code generator it became much quicker to bind the new stuff and to work on the backlog of stuff we’d never bound in the first place.
Documentation is pretty important, especially for a big API like Gtk+. We didn’t want to make everyone use the C API documentation because it refers to lots of C concepts and you’d keep having to translate C names to Haskell names in your head. Having to use the C docs would also prevent us from deviating much from the C API. On the other hand, since the API is large and well documented it would be a massive task to translate it all by hand. In fact translating the C docs would probably have been more work than binding the C code. So the idea was to automagically translate the docs from the C API to our Haskell API. This was pretty tricky but turns out to work well.
Yet more XML
Again, I started with someone else’s scripts. The Gtk+ devs already have an inline doc markup format and a gtk-doc program that scans the C API and implementation and generates DocBook XML from the markup. Again, it documents things at the level of GObjects rather than generic C. The DocBook XML that it produces isn’t very nice, though it is at least fairly regular and only uses a subset of features. However, it was still necessary to reverse engineer the GObject-level documentation from the generated DocBook to some extent, by looking for specially named sections and so on. I did this using a complicated XSLT script. The code generator then reads the XML file produced by the XSLT script and tries to match up the documentation to the modules, methods, parameters, signals, properties etc.
C -> Haskell
Then it also has to try to ‘Haskellise’ the documentation. So references to C types and C functions have to be translated into the equivalent Haskell types and functions. Constants like TRUE, FALSE etc have to be converted too. NULL is an interesting case, because if the documentation talks about NULL then it suggests that one of the function’s pointer-type parameters is allowed to be NULL (or the result might be NULL). So I had NULL translated into a FIXME message indicating that the function should be looked at carefully. Then I added a list of functions where we knew a specific parameter or result might be NULL. I used this list to generate code that wrapped that parameter or result in the Maybe type and also to suppress the FIXME message in the documentation. So as we went through and addressed each of the FIXMEs we could just add the function-parameter pair to the list and have the code generator give us the right code and docs.
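Here is a much-simplified sketch of that translation, working word by word on plain text rather than on the XML. It assumes that C functions appear in the docs as `name()`, that Haskell names are formed by dropping a `gtk_` prefix and camel-casing the rest, and that the list of known NULL cases is a list of (C function, parameter) pairs. All function names here are invented for the example, and it only copes with punctuation that follows a word, not punctuation in front of it.

```haskell
import Data.Char (isAlphaNum, toUpper)
import Data.List (stripPrefix)

-- | Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

capitalise :: String -> String
capitalise []       = []
capitalise (c : cs) = toUpper c : cs

-- | "gtk_button_set_label" becomes "buttonSetLabel" (assumed naming rule).
-- Names without a "gtk_" prefix are returned unchanged.
cToHaskellName :: String -> String
cToHaskellName cname = case splitOn '_' cname of
  ("gtk" : first : rest) -> first ++ concatMap capitalise rest
  _                      -> cname

-- | Hypothetical list of (C function, parameter) pairs that may be NULL.
type MaybeNullList = [(String, String)]

-- | Wrap a Haskell type in Maybe when the pair is on the list.
wrapType :: MaybeNullList -> String -> String -> String -> String
wrapType known func param ty
  | (func, param) `elem` known = "Maybe " ++ ty
  | otherwise                  = ty

-- | Translate one word of documentation. The Bool says whether the NULL
-- cases of the function being documented are already on the list.
haskelliseWord :: Bool -> String -> String
haskelliseWord nullKnown word
  | Just rest <- stripPrefix "()" trailing =
      "'" ++ cToHaskellName core ++ "'" ++ rest  -- a C function reference
  | otherwise = translate core ++ trailing
  where
    (core, trailing) = span isCoreChar word
    isCoreChar c = isAlphaNum c || c == '_'
    translate "TRUE"  = "True"
    translate "FALSE" = "False"
    translate "NULL"
      | nullKnown = "Nothing"
      | otherwise = "{FIXME: NULL}"
    translate w = w

-- | Translate a whole documentation string.
haskelliseDoc :: Bool -> String -> String
haskelliseDoc nullKnown = unwords . map (haskelliseWord nullKnown) . words
```

The Bool passed to `haskelliseDoc` would come from checking whether the function appears in the list, for example `any ((== "gtk_button_get_label") . fst) known`, so adding a pair to the list fixes both the generated type and the docs in one go.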
And in conclusion…
So in the end the docs work out pretty well. Some bits still need editing by hand because they refer to C things like who is responsible for freeing what, but much of the time hardly any editing is necessary. So we ended up with really pretty good reference documentation. The docs are still huge of course because the API is so big. How we plan to deal with that is another story.
In theory the code generator should work for other GObject-based libraries, like other things in the GNOME stack. In practice when I used it for some other GNOME libs it needed a little tweaking, but it’s still much easier than binding everything from scratch, if only because it provides a great template. So I guess the moral of the story is, if the code is too big to write yourself then write a program to write your program, and of course Haskell is a great language for writing code generators. But then you already knew both of those things.
So as you’ve gathered, the code generator is pretty complicated and full of tweaks to deal with various weirdnesses. It also grew over time to take in more and more sources of information, so it became pretty unwieldy. When I next find some time I’ll talk about restructuring it to be more maintainable so we can add yet more features!
You can see the current version of the code in the Gtk2Hs darcs repo here: