How Britney Makes Grading CPN Assignments More Bearable

I already mentioned my tool for automatic grading of CPN assignments (still in the market for a name, by the way…).  Since my last explanation, the first alpha has gone live locally, and we have just used it to grade 2 assignments from 60-70 students.  This is a screen shot from the grading process:

Compared with the last version I showed, this is a bit more polished and has more tests and a few new features.

Reporting

I have added a simple check that ensures that the environment has not been modified, as well as a PDF export feature.  The exported report includes a screen-shot of the model with the new parts highlighted, like this:

I have blurred the actual student solution, but the trick is that the environment is automatically dimmed so the student solution stands out.  The report also executes a number of tests and shows the results to the student:

Some of the tests give additional diagnostics useful in grading and as help for the student.  For example, the successful “3 Rejects lead to Return” test shows the possible final marking(s) of the test:

This makes it easy to see whether the result is as expected.  More interestingly, tests also yield comprehensive diagnostics when they fail.  For example, for the failed “Accept 10 Orders” test we get an error trace: we are given the marking (and the transitions enabled in that marking) along with a formula that is supposed to hold in that state but does not:

This information is immensely useful for error finding, and – best of all – it is generated completely automatically.  In this case, for some reason, no packet is ever received.

Configuration

The tests to run are now specified in a simple configuration file.  Instead of going with a complex XML format, I just wrote a quick and dirty format with two sections and a number of configuration options for each.  This is the configuration file used for a simple model:
[matcher]
-5: signature,threshold=50

[tests]
-5: declaration-preservation,globref
-100: interface-preservation,addpages=false,initmark=true,subset=system
-5: matchfilename
0.0666: btl,repeats=2,name="Accept 10 Orders",test=
+ (10 * (–> Order) -> (–> Order) => failure) &
+ ((–> Reject) => failure) &
+ (—> [|Received_Packages| = 10 & |Returns| = 0])
0.0666: btl,repeats=2,name="Reject 10 Orders",test=
+ (10 * (–> Order) -> (–> Order) => failure) &
+ ((–> Accept) => failure) &
+ [–> Finish_Order => false] &
+ —> [|Returns| = 10 & |Received_Packages| = 0]
0.0666: btl,repeats=10,name="30 Random Orders",test=
+ (30 * (–> Order) -> (–> Order) => failure) &
+ —> [|Received_Packages| + |Returns| = 30]

First, we see the matcher section, which describes how student ids are matched to models.  We use the signature-based matcher, which must match at least 50 nodes of the model; this value is given by the signer.  If the signature does not match correctly, we deduct 5 points.

Next up is the actual tests section.  Each test is described on one line, except that any line ending with a backslash is merged with the following line and any line starting with a plus is merged with the preceding one.  We first have two tests ensuring that the declarations and the net structure are preserved; not doing so is severely punished.  The next test ensures that the student id is contained in the filename.  This test can also be used as a matcher if signed models are not used.  The last three tests contain logical formulae that must hold.  Each of these is awarded a small portion of positive points if satisfied.
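
To make the format concrete, here is a minimal sketch in Python of how such a file could be parsed.  It assumes exactly the conventions described above (bracketed section headers, "points: test,option=value,…" lines, backslash and plus continuations); the class and function names are mine and purely illustrative, not the tool's actual API.

    from dataclasses import dataclass, field

    @dataclass
    class TestSpec:
        points: float              # negative = penalty, positive = reward
        name: str                  # e.g. "signature", "interface-preservation", "btl"
        options: dict = field(default_factory=dict)

    def parse_config(text):
        # Merge continuation lines: a trailing backslash pulls in the next line,
        # and a leading '+' appends the line to the preceding one.
        merged = []
        for raw in text.splitlines():
            line = raw.strip()
            if not line:
                continue
            if merged and (merged[-1].endswith("\\") or line.startswith("+")):
                merged[-1] = merged[-1].rstrip("\\") + " " + line.lstrip("+").strip()
            else:
                merged.append(line)

        sections = {}
        current = None
        for line in merged:
            if line.startswith("[") and line.endswith("]"):
                current = line[1:-1]           # "[matcher]" or "[tests]"
                sections[current] = []
                continue
            points, rest = line.split(":", 1)
            name, *opts = [part.strip() for part in rest.split(",")]
            options = dict(opt.split("=", 1) for opt in opts if "=" in opt)
            sections[current].append(TestSpec(float(points), name, options))
        return sections

Option values containing commas or colons would need a less naive split, but this captures the shape of the format.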

Britney Temporal Logic

The logical formulae are described using BTL, or Britney Temporal Logic.  It is a very simple temporal logic designed for this tool.  The logic is intended both to test that properties hold and to guide the model into certain scenarios.

As the logic is intended to prove the existence of errors using simulation, we need a logic where all counter-examples are existential.  The negation of an existential formula is a global formula, so it is natural to choose LTL as a basis.

We want the logic to be able to guide the model.  This requires that the logic does not have to guess.  Intuitionistic logic is a particular kind of logic where a proof of a formula is constructed in such a way that it can be translated directly into a witness.  For example, this means that a proof of a formula “A | B” requires that we specify which one of A and B is true as well as a proof of that.  It also means that the law of the excluded middle (“A | !A”, A or not A, also known as tertium non datur if you like Latin) does not hold, as there is a third possibility: the formula is not false, but neither is it provably true (a possibility thanks to Gödel’s incompleteness theorem).  If we take a cue from intuitionistic logic, we can make a logic which is very predictable.  In essence, this means that we disallow disjunction (or) and negation.

We want a syntax that is very similar to the usual way of describing transition sequences (so none of the F/G/U/X/◇/◻/○ symbol magic that tends to make regular LTL unreadable).  We build everything on top of a simple step: A -> B, meaning that we first execute A and then execute B.  We add two more kinds of steps, A –> B and A —> B: the former means that we first execute A and then, after some number of steps, execute B; the latter means that we first execute A and, when no more transitions are enabled, B must hold.

As we have no negative formulae (disjunctions or negations, including implications), we need another way of expressing conditionals.  For this we introduce guarded versions of the three kinds of steps: A => B, A ==> B, and A ===> B, meaning 1) if we execute A, then we must execute B next, 2) if we execute A, then we must execute B at some point, and 3) if we execute A, then B must hold when no more transitions are enabled.  For the purists, I can mention that A -> B is the same as A & (A => B), A –> B is the same as A & (A ==> B), and A —> B is the same as A & (A ===> B).  Guarded expressions make it possible to say that if something happens, something else must happen as well.

As we deal with repetitive scenarios, we also introduce a syntax for repetition: *A means execute A zero, one, or more times, and 7*A means execute A exactly 7 times.  The finite version can be defined inductively as 1 * A = A and n * A = A -> (n-1) * A.  The unbounded repetition is a new primitive and can be used to define ==> in terms of =>, as A ==> B is the same as A => (*true -> B).
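
Just to illustrate the equivalences from the last two paragraphs, here is a small Python sketch that rewrites the derived step forms into the core connectives.  The nested-tuple representation and the ASCII arrow names ('->', '-->', '=>', '==>', '*') are stand-ins for the BTL syntax; this is a simplified illustration, not the tool's internal representation.

    def desugar(f):
        """Rewrite derived BTL forms into core connectives (illustrative only)."""
        if not isinstance(f, tuple):
            return f                            # atom: transition name, 'true', 'failure', or a bracketed test
        op, *args = f
        if op == '->':                          # A -> B   ==  A & (A => B)
            a, b = map(desugar, args)
            return ('&', a, ('=>', a, b))
        if op == '-->':                         # A --> B  ==  A & (A ==> B)
            a, b = map(desugar, args)
            return ('&', a, desugar(('==>', a, b)))
        if op == '==>':                         # A ==> B  ==  A => (*true -> B)
            a, b = map(desugar, args)
            return ('=>', a, desugar(('->', ('*', 'true'), b)))
        if op == 'repeat':                      # n * A:  1 * A == A,  n * A == A -> (n-1) * A
            n, a = args
            return desugar(a) if n == 1 else desugar(('->', a, ('repeat', n - 1, a)))
        return (op, *map(desugar, args))        # '&', '=>', '*', '--->', '===>' stay as core here

    # 3 * Order unfolds into Order -> Order -> Order, expressed with '&' and '=>':
    print(desugar(('repeat', 3, 'Order')))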

We allow conjunctions to make it possible to give compositional specifications.  As we disallow negative formulae, we have no way of specifying negative scenarios.  To give at least a taste of this (to be able to specify that you execute exactly 10 transitions instead of just at least 10), we introduce the truth value failure.  It has the same meaning as false, except that we allow using it in guides.  This means that “A => failure” makes it impossible to execute A now; if the model allows A, that is not a mistake, it just never happens.  Similarly, “(–> A) => failure” disallows A now and in the future.  This is used in the configuration above in line 9, where we specify that Order must be executed 10 times but that any subsequent execution of Order is a failure, i.e., that Order must be executed exactly 10 times during a simulation of this scenario.  We use compositionality to additionally specify, in line 10, that Reject is never executed.
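
In spirit, a guiding step then looks something like the following simplified sketch (not the actual implementation): before each step, the transitions forbidden by the current formula are removed from the enabled set, and the simulator picks among the rest.

    import random

    def pick_transition(enabled, forbidden_now):
        """One guiding step, in spirit: drop the transitions the formula
        currently forbids (e.g. any A appearing in an active 'A => failure')
        and let the simulator pick among the rest."""
        candidates = [t for t in enabled if t not in forbidden_now]
        if not candidates:
            return None   # the guide is stuck; report the marking and the violated formula
        return random.choice(candidates)

    # With "(–> Reject) => failure" active, Reject stays forbidden for the
    # rest of the scenario:
    print(pick_transition({'Order', 'Accept', 'Reject'}, {'Reject'}))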

We also need to be able to specify non-guiding tests.  We specify these separately, as we allow more expressiveness here: such formulae do not have to guide, only test.  Tests are specified inside square brackets and can consist of any propositional combination of a number of atomic propositions.  We have already seen the simplest atomic proposition, namely a transition name, indicating that a transition is executed.  We also allow atomic propositions for testing the number of tokens on a place, |P|, and any comparisons of such values.  These can be combined using the standard & (and), | (or), and ! (not), which gives us all propositional combinations.  In the example above, in line 11, we test when the simulation has terminated that the place Received Packages contains 10 tokens and the place Returns contains none.  This means that if we order 10 packages (l. 9) and never reject any (l. 10), at the end we must have received 10 packages and returned none (l. 11).  The test in lines 13-16 is similar: make 10 orders and never accept any; it also ensures that we never execute Finish_Order (l. 15; note this is a test, not a guide, as it is enclosed in brackets) and that in the end all packets are returned.  The final test, lines 18-19, makes 30 orders and in the end tests that the number of received packages plus the number of returns equals 30.
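
Evaluating a bracketed test is straightforward; the following simplified sketch shows the idea, with the state modelled as the last fired transition plus a marking given as a map from place names to token counts.  The tuple encoding of test formulae is again purely illustrative.

    import operator

    COMPARE = {'=': operator.eq, '<': operator.lt, '<=': operator.le,
               '>': operator.gt, '>=': operator.ge}

    def eval_test(expr, fired, marking):
        """Evaluate a bracketed test against the last fired transition and a
        marking given as {place name: token count} (simplified sketch)."""
        kind = expr[0]
        if kind == 'fired':                   # atomic: the named transition was executed
            return expr[1] == fired
        if kind == 'cmp':                     # atomic: |P1| + |P2| + ... <op> value
            _, places, op, value = expr
            return COMPARE[op](sum(marking.get(p, 0) for p in places), value)
        if kind == '&':
            return eval_test(expr[1], fired, marking) and eval_test(expr[2], fired, marking)
        if kind == '|':                       # tests, unlike guides, allow disjunction ...
            return eval_test(expr[1], fired, marking) or eval_test(expr[2], fired, marking)
        if kind == '!':                       # ... and negation
            return not eval_test(expr[1], fired, marking)
        raise ValueError(f'unknown test: {expr!r}')

    # The end-of-simulation test from line 11 of the configuration:
    line11 = ('&', ('cmp', ['Received_Packages'], '=', 10),
                   ('cmp', ['Returns'], '=', 0))
    print(eval_test(line11, None, {'Received_Packages': 10, 'Returns': 0}))  # True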

We allow mixing guides and tests, so it is possible to guide a simulation and then perform tests, as well as to guide a simulation and use a test to branch into new tests.  Using conjunctions at the top level, we can also specify guides and tests independently.

We allow a few more handy atomic propositions: time can be used to make comparisons with the global clock of the system, and the syntax Place := "<marking>" sets the marking of a place, which is useful for specifying scenarios without having to set them up using transitions.

Thanks to Dirk and Christian for feedback and suggestions during and after the definition of BTL.

Conclusion

The automatic grader (still needing a name, hint hint) implements BTL using a simple term-rewriting engine.  The current implementation is fast enough that I could grade 60-70 assignments in less than 10 minutes without any interaction.  The final grading then just consists of looking at the reports and assigning a manual grade based on the tests.  Actually writing the tests does take some effort, especially the first couple of times, when the distinction between guiding and testing is not completely understood and the urgency of the (unconditional) steps, like ->, is not grasped, but it is well worth it, as only a handful of exercises need to be manually opened in CPN Tools after using this tool.
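
The overall batch flow is simple; in simplified Python (using the TestSpec class sketched in the configuration section) it amounts to something like the following.  Here run_test is a stand-in for running one configured check or BTL-guided simulation, and the point handling follows the convention above: negative weights are penalties applied on failure, positive weights are rewards for satisfied tests.

    def run_test(spec, model):
        """Stand-in for running one configured check (or BTL-guided simulation)
        on a model; returns (passed, diagnostics)."""
        return True, ""

    def grade_model(model, tests):
        total, rows = 0.0, []
        for spec in tests:
            passed, diagnostics = run_test(spec, model)
            if passed and spec.points > 0:
                total += spec.points        # reward for a satisfied BTL test
            elif not passed and spec.points < 0:
                total += spec.points        # penalty for a violated preservation/matching check
            rows.append((spec.name, passed, diagnostics))
        return total, rows                  # the rows end up in the generated report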

Along with the automatic report generation, this has proven an immensely powerful tool which makes grading much simpler.  In future versions, I’ll look into making the interface preservation more intelligent, so that changing a declaration from “var a: INT” to “var a, b: INT” is allowed, while adding arcs to places/transitions on interface pages that are not intended to be modified is not.  I’ll also check that information flows correctly, i.e., that an interface place is not used for temporary storage.  I’ll also add more sanity checks, such as token preservation, i.e., that no tokens are produced/consumed except when allowed, so no resources are consumed, produced, or leaked.  I’ll also add a new grader for doing performance analysis, including running simulation repetitions and automatically including the performance report in the final report.

I’ll also put some effort into making two new user interfaces for the tool: one for testing BTL formulae and one for students.  The current workflow for testing formulae is downright annoying: start the grader, set up a grading process (at least 3 fields), load at least one model, run the simulations, export the report, and finally locate the error, fix the script, and start over from the beginning.  The new tool will allow you to specify a base model, a location of models, and a location of a script, and to repeatedly run the generation.  The report will be viewable from within the tool, models will not have to be reloaded, and maybe it will even be possible to edit the formula from within the tool.  The student interface will allow students to run (restricted versions of) test scripts and remove the gravest errors before handing in.  Further in the future, I’ll look into making a web service for performing tests, so students just upload their model and get immediate feedback.

While the tool is already very useful, I plan to actively extend it, and I’m very open to suggestions!
