Legacy Code

I’m currently working on making CPN Tools, a tool for editing and simulating coloured Petri nets (CP-nets or CPNs), work on 64-bit versions of Windows 7.  Well, actually, I should be revising a paper and finishing my 3 hours and 45 minutes of a 4 hours and 30 minutes long tutorial for the CPN Workshop, but neither revising papers nor making slides is very fun, so there you go.

Given Microsoft’s record for keeping backwards compatibility, this should be a breeze, and indeed it has been for Windows NT4, Windows 2000, Windows XP, Windows Vista 32-bit, and Windows 7 32-bit.

Unfortunately, on 64-bit Windows (Vista and 7), things are not entirely as nice.  CPN Tools uses two not too common (and one common) languages for implementation, namely Beta and Standard ML of New Jersey (SML/NJ) (with a teensy weensy bit of C mixed in).  Both are nice languages and much more influential than people think (Beta, for example, inspired inner classes and the implementation of fix-points in Java, and Microsoft’s F# language is basically SML for .Net).  The problem is that both languages are developed in academia and haven’t seen a lot of development lately.  This, in particular, means that neither has a 64-bit compiler that is stable enough for production use on modern CPUs.  In fact, we use a nearly 10-year-old SML compiler and an approximately 5-year-old Beta compiler.  That’s a bit before Windows 7, mainstream 64-bit CPUs and more than 4 GiB RAM on standard desktop machines.  As Dawkins would have it, this means that CPN Tools crashes on 64-bit Windows 7 (and according to user reports, also on 64-bit Vista).

Using Microsoft’s new XP mode technology (basically a virtual machine with Windows XP), it is possible to make CPN Tools run on 64-bit Windows 7.  This comes at quite a cost, though, because of the virtualization overhead.  CPN Tools uses OpenGL for drawing its user interface, so decent performance of that is essential for a decent user experience.  Virtual PC does not give decent OpenGL performance.  Furthermore, simulation of coloured Petri nets, the main task of CPN Tools, is fairly computationally expensive.  All in all, the virtualization route, while possible, is just not the best route to go.

I’m normally working on a project called ASCoVeCo (Advanced State space Methods and Computer Tools for Verification of Communication Protocols; yeah, I had to look that up).  It’s basically about doing state space analysis of coloured Petri nets (and other formalisms).  State space analysis, at its core, is just fast systematic simulation, so we also use a CPN simulator in that project.  As I’m using a Mac and SML/NJ version 110.7, the version used in CPN Tools, is not available for Mac OS X, we use a newer version of SML/NJ in the ASCoVeCo project, namely 110.68.  Simulators created using this version of SML/NJ do indeed run on Windows 7 64-bit (albeit in 32-bit mode).

Building a New Version of CPN Tools

As the GUI of CPN Tools actually seems to behave ok under 64-bit Windows 7, the next step is obvious: we just take the simulator from the ASCoVeCo project and combine it with the GUI from CPN Tools, and, bam, we have a version that runs natively(ish) under 64-bit Windows 7.  In principle that is true.  Were it not for reality.

Problem the first: Nobody has compiled CPN Tools in 2-3 years.  In order to compile, we need a specially patched version of the Beta compiler and the linker from Microsoft Visual Studio 6.  Ok, I can get a hold of said version of VS.  I can also get a hold of the patched compiler and some nearly-true instructions for setting it all up.  The obvious solution is to install Windows XP in a virtual machine, set it all up and freeze this machine to allow compilation for as long as I can get the VM to run.  Which is probably for the foreseeable future.

Problem the second: Nobody has released a version of CPN Tools for 2-3 years.  CPN Tools uses a bunch of scripts I wrote 6 or 7 years ago.  They grew and grew, like scripts do, and became an unmaintainable mess with heaps of dependencies on computers that were scrapped years ago, passwords that have long been forgotten and things like that.  After I stopped as a programmer on CPN Tools, other people took over the scripts (and did a much-needed cleanup, as only I was really able to make any sense of them in their old form).  Short story long, I didn’t dare try getting the scripts up and running.

Also, the build process depends on an installer generator program, in a version that was new 6-7 years ago and which I cannot find anywhere anymore.  I tried that internet thing, and found a new installer generator and got it mostly set up to make distributables based on the installation of one of the old versions and a bit of looking thru ugly script code.

Problem the third: The new simulator is named vaguely differently and required slightly different settings in order to start up correctly.  In other words, I had to take a look at code untouched by human hands for 2-3 years in a language I had not coded in for 5-6 years.  Ok, I can do that.  I just made a search for the old names, replaced them with the new ones, and, cocky as I was, fixed a couple of other problems (bumped up the version number and replaced the old, crappy, vintage, never-understood scripts that make an off-line copy of the documentation normally hosted on a wiki-server).  I compiled, saw that it seemed to run, built a distributable, restarted my computer to get into Windows 7 (damn, does that ever have some nauseating effects that will make you barf up most of your intestines if hung over), and installed the new package.  Bam,

Problem the fourth: The crap didn’t run.  I even tested the file before making the build image.  All I got was a “Beta Execution Aborted, Reference is NONE” (basically a null-pointer error or segmentation fault).  Beta dumps a nice(ish) file with a bit of information that in theory allows you to find the bug.

Before we start finding the bug, here is a brief syntax overview of Beta:

:@   - Static object reference declaration
:^   - Dynamic object reference declaration
:^|  - Dynamic component declaration
:<   - Virtual declaration
&|   - Dynamic creation of component
@@   - I've forgotten, but I think it's getting the physical address of
       an object (that may be garbage collected and moved, by the way)

Honestly, I kid you not; that’s the syntax.

See, now the real fun starts.  You have to find the error.  The dump file doesn’t contain line numbers; it contains fragment names and pattern names.  Fragments are basically files, so that is sort of helpful.  Patterns are an abstraction that encompasses the classes, fields and methods of modern object oriented languages.  Naturally, these can be nested, so the error I got was basically

  item <setdefaults#>setDefaults# in C:/cpn2000/cpntools/instruments/creationinstruments
    -- CreateAuxLabelInstrument#PageElementInstrument#GenericCursorScalingInstrument#CPNGenericInstrument#GenericInstrument#EditableInstrument#Instrument# in C:/cpn2000/cpntools/instruments/creationinstruments

Then multiply that by a stack depth of 30 and try finding the error.  Obviously it is in “setDefaults” in “CreateAuxLabelInstrument”.  No line number.  Also, Beta allows inheritance of patterns, even when they represent methods, so this method is distributed around 5-7 files and contains quite a few lines of code.
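To make reading a 30-deep stack a bit less painful, an entry like the one above can be pulled apart mechanically.  The following is only a sketch in Python: the layout it assumes (an `item` line naming the pattern and fragment, then a `--` line with a `#`-separated chain of the enclosing pattern and its super-patterns) is inferred from the single entry quoted above, not from any Beta documentation, and the names `DumpEntry` and `parse_entry` are made up for illustration.

```python
import re
from typing import List, NamedTuple


class DumpEntry(NamedTuple):
    item: str          # pattern where execution stopped, e.g. "setDefaults"
    fragment: str      # fragment (file) named on the "item" line
    chain: List[str]   # enclosing pattern followed by its super-patterns


# Assumed layout, inferred from one quoted entry only:
#   item <name#>Name# in FRAGMENT
#     -- Pattern#SuperPattern#...# in FRAGMENT
ITEM_RE = re.compile(r"item\s+(?:<[^>]*>)?(?P<item>[^#\s]+)#\s+in\s+(?P<frag>\S+)")
CHAIN_RE = re.compile(r"--\s+(?P<chain>\S+)\s+in\s+\S+")


def parse_entry(text: str) -> DumpEntry:
    """Split one two-line dump entry into item, fragment and pattern chain."""
    item = ITEM_RE.search(text)
    chain = CHAIN_RE.search(text)
    if item is None or chain is None:
        raise ValueError("not a dump entry in the assumed layout")
    # The chain ends with a trailing '#', so drop the empty last piece.
    patterns = [p for p in chain.group("chain").split("#") if p]
    return DumpEntry(item.group("item"), item.group("frag"), patterns)


SAMPLE = (
    "  item <setdefaults#>setDefaults# in C:/cpn2000/cpntools/instruments/creationinstruments\n"
    "    -- CreateAuxLabelInstrument#PageElementInstrument#GenericCursorScalingInstrument"
    "#CPNGenericInstrument#GenericInstrument#EditableInstrument#Instrument#"
    " in C:/cpn2000/cpntools/instruments/creationinstruments\n"
)

entry = parse_entry(SAMPLE)
print(entry.item, "in", entry.chain[0], "inherits through", len(entry.chain) - 1, "patterns")
```

This still gives no line number, of course; it only tells you which pattern chain to go and stare at.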

As I’m building on a virtual machine and the problem only manifests in an installed version, I need to compile the program, build an installation image, copy it to another machine, uninstall old versions, install the new version and test it.  All in all, not a cycle that is fun to repeat too much, meaning that the dump-file is more or less useless.

You then go to the next step, trying to establish a version that works and one that doesn’t.  My first test was to install on the build image (and screw that the day after, I needed that image for a course).  Luckily, I got the same error, so I could then cut down the cycle to compile, build image, uninstall, install.  I went a step further: I knew the program ran in the development track but not in the installed location.  I tried starting it in the installed location from the command line, and got the same error.  Ok, so something about the installation location seemed wrong.  I could just compile, copy to the install location, and then run.  Much better.

Now comes the next phase of Beta debugging: try looking at the code very sternly in the hope that it will magically reveal why it fails.  Most likely it won’t.  In my case I discovered I had forgotten to copy an XML-file containing some default preferences into the installation location.  The code assumed that it existed and made no check to the contrary.  Also, it had probably failed previously, so somebody had inserted some error-checking code in completely inappropriate locations, causing it to fail in a completely different location.  Why does the code fail in CreateAuxLabelInstrument.setDefaults, you may ask?  Well, the AuxLabelInstrument contains a default text that is used for the auxiliary label.  Said text is stored in the aforementioned XML file.  When the file is not loaded, the default value of a clean install is just unset (set to NONE).  This causes the code that calculates the width of the default label using the current font to fail.  And why do we calculate said width, you may even ask?  Well, to prepare an icon for the tool used to create auxiliary labels, of course.  We are not even close to using said tool, but we crash at this position for this very logical reason.  I found this bug by basically doing a diff between my new installation and the old one from 2-3 years ago.
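For the record, that kind of diff does not need anything fancy.  Here is a rough Python sketch of the idea, using the standard `filecmp` module to list what exists in a known-good installation but is missing from a new one.  The function name `missing_files` is invented for this example, and I make no claim that this is the exact procedure I followed.

```python
import filecmp
from pathlib import Path
from typing import Iterator


def missing_files(reference: Path, candidate: Path) -> Iterator[Path]:
    """Yield relative paths present under `reference` but absent under `candidate`.

    A directory that is missing entirely is reported once, as the directory
    itself, not file by file.
    """
    def walk(cmp: filecmp.dircmp, prefix: Path) -> Iterator[Path]:
        # Entries that only the reference (left) side has at this level.
        for name in sorted(cmp.left_only):
            yield prefix / name
        # Recurse into directories that exist on both sides.
        for name, sub in sorted(cmp.subdirs.items()):
            yield from walk(sub, prefix / name)

    yield from walk(filecmp.dircmp(str(reference), str(candidate)), Path())
```

Pointing `missing_files` at the old install directory and the new one would have listed the forgotten preferences file straight away, provided you already suspect the installation contents rather than the code.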

Normally, you’d get another step that would just be acceptance of the bug, but in my case I was lucky.

Why not just run the code in a debugger, you insist, surely Beta must have a debugger?  It does.  It is built into the Mjølner tool, which is a tool intended to crash (and send e-mails to no longer existing e-mail addresses).  See, the Beta compiler is an old one.  It contains a fixed upper limit on the number of AST nodes you can have in a fragment.  This typically kicks in when the file is around 60-70 KiB in size.  That is, the compiler has one limit, and the Mjølner tool, with the debugger, has another.  This means that one of the files is in the magical interval where it compiles and doesn’t open in the debugger (not so much doesn’t open in the debugger, as the debugger crashes when you try opening the file).  You’d then ask why we did not just split up the file; can’t be too hard, now can it?  Well, yes.  See, the aforementioned fragment system constitutes Beta’s include mechanism.  Rather than being a dumb textual include mechanism (like in C) or a linking mechanism (like C, Java, etc.), it is actually an AST include with hiding.  Mostly, this is very nice and clever, but sometimes it sucks.  Such as if you have a huge object model that everybody wishes to access.  The entire object model basically has to be in a single file if the different parts should be able to know about each other.  As must all methods, as they will otherwise be hidden from other parts of the program.  All in all, this means that the entire object model must reside in a single file.  When it becomes complex enough, there is just not enough room for that.  So CPN Tools cannot be debugged.

Fun With the Simulator

Ok, after a day of fun doing the last 3 of the above (I spent a joyful day doing the first one some time ago), I got a version that even seemed to run during my extensive and thorough testing (really, I started it, created a model and ran 2 or 3 steps).  Today, during a live demo (which I didn’t really prepare because I’ve done it dozens of times), I discovered that the new simulator didn’t really completely work.

The reason is that the CPN simulator depends very heavily on the SML/NJ compiler.  We basically use the compiler to check which symbols are defined in code, which lets us build a dependency graph for our CPN models, which in turn allows us to incrementally syntax check CPN models without having to write code that recognizes SML code ourselves (which is highly non-trivial as SML allows the definition of infix operators, making the language highly context-sensitive).  This is all very nice until you switch to a new compiler with a different structure.  First we had to patch the compiler to make it reveal its secrets to us (not too difficult, as SML/NJ is open source).  Then we needed to change all the code gathering symbols from the old AST data structure to the new one.
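To give a feel for the trick without dragging in SML/NJ internals, here is the same idea as a Python analogy: let the language's own parser (Python's `ast` module here) report which names each declaration defines and uses, build a dependency graph from that, and recheck only what depends on a change.  Everything in it (the function names, the declaration ids such as `colset` and `guard`) is made up for illustration, and it ignores scoping entirely, so it may report more dependencies than really exist.

```python
import ast
from typing import Dict, Set, Tuple


def defined_and_used(source: str) -> Tuple[Set[str], Set[str]]:
    """Ask the parser which names a snippet binds and which it reads.

    Crude on purpose: scoping is ignored, so function parameters show up
    as used names.  That can only add dependency edges, never lose them.
    """
    defined: Set[str] = set()
    used: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.Name):
            (defined if isinstance(node.ctx, ast.Store) else used).add(node.id)
    return defined, used - defined


def dependents(decls: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each declaration id to the ids of declarations using its symbols."""
    info = {key: defined_and_used(src) for key, src in decls.items()}
    graph: Dict[str, Set[str]] = {key: set() for key in decls}
    for key, (defined, _) in info.items():
        for other, (_, used) in info.items():
            if other != key and defined & used:
                graph[key].add(other)
    return graph


def needs_recheck(changed: str, graph: Dict[str, Set[str]]) -> Set[str]:
    """The changed declaration plus everything that transitively depends on it."""
    pending, seen = [changed], {changed}
    while pending:
        for nxt in graph[pending.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                pending.append(nxt)
    return seen


# Hypothetical model declarations standing in for the pieces of a CPN model.
DECLS = {
    "colset": "MAX = 3",
    "guard": "def ok(n):\n    return n < MAX",
    "arc": "result = ok(2)",
    "other": "label = 'unrelated'",
}

print(sorted(needs_recheck("colset", dependents(DECLS))))
```

The catch is visible even in this toy: `defined_and_used` is welded to the shape of one particular AST, so a compiler upgrade that changes the AST means rewriting that function case by case.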

Here it is important to know that the old code was written on a need-to-write basis.  Basically, the code was extended with new cases of the AST whenever we got an error.  There are still unhandled cases; we just haven’t encountered them yet.  The translation was done in much the same way: all the easy cases were translated.  The difficult ones were just commented out until they proved necessary for the tool to work.  As we in the ASCoVeCo project don’t really focus on incremental syntax check (we just want to load a previously created model), cases needed for incremental syntax check are naturally not well-tested.  And don’t work well.

So, now I have to try and fix this.  That is, I need to fix code running on a platform I can only run in virtualized environments.  Oh, and the simulator takes around 5-10 minutes to compile, because it is even older code (from the 1980s and 1990s), which uses neither SML’s concept of encapsulation (as that was developed after the first parts of the code), nor its dependency analysis for partial compilation (as that was written after the last code and depends on the encapsulation concept).

All in all, this is not really an inviting prospect, so now I’m sort of happy that I have a paper to revise and a tutorial to finish.

2 thoughts on “Legacy Code”

  1. Ugh… My first impression about CPN Tools was that it is ‘legacy’. Now I understand why.

    But what I am actually thinking of: does a tool that attracts so many requests for live demos not deserve some kind of ground-up rewrite in a reasonable technology? I mean C/C++/Ada or even Lisp; they all seem to be used in projects with very long life cycles and are based on backward-compatible standards. Clear separation of the GUI part would be essential, of course.

    1. A rewrite would be futile. CPN Tools is so popular because a LOT of effort has been put into not only fixing most obvious bugs, but also putting it thru ten years of user testing. CPN Tools itself was a rewrite of the older Design/CPN and only after ten years caught up, and was still considered inferior by some. I believe that matching CPN Tools as an editor would take ten or even fifteen years. CPN Tools, the editor, is being maintained in legacy state, and we’re about to release a completely new version as soon as formalities are in place. I, personally, am working on integrating the core of CPN Tools in many new places. You’ll be surprised by what is possible already with publicly available tools.
