Still in the market for a name, but the work on my tool for automatic grading is coming along nicely. The idea is that we have a course twice a year with altogether more than 100 participants. They do an individual assignment where they build several versions of CPN models. All in all, that is a number of CPN models you really don’t want to look at manually.
Furthermore, there have been problems with students copying exercises from each other, so we have introduced some measures to detect that.
Basically, the tool has a number of tests, and each test can either award positive points (if it succeeds, there is a good chance the student has solved part of the assignment) or negative points (if it fails, there is a good chance the student is not adhering to requirements that are clearly listed in the exercise and super-annoying for us when ignored).
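Such a test could be sketched in Java roughly like this. The interface, class, and record names below are my own assumptions for illustration, not the tool's actual API:

```java
// Hypothetical sketch of a test: positive points on success, a deduction on failure.
// All names here are assumptions, not the tool's real code.
interface ModelTest {
    String name();
    // Returns the points this test contributes to the student's score.
    double run(StudentModel model);
}

// Example: deduct points when the file name does not contain the student id.
class FileNameTest implements ModelTest {
    public String name() { return "file name contains student id"; }
    public double run(StudentModel model) {
        return model.fileName().contains(model.studentId()) ? 0.0 : -5.0;
    }
}

// Minimal stand-in for the data a test would receive.
record StudentModel(String fileName, String studentId) {}
```

A grader would then simply run every registered `ModelTest` against every hand-in and sum the results.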
Students get a base model and have to add a missing part to the model. We provide interfaces and an environment. To set up a test of student models, we use the Setup screen here:
We need to say where the base model is, where to find all hand-ins, and give a list of the student ids of students supposed to hand in. Furthermore, we give a secret or password, which is known only to the instructors. This, along with the student id, is used to cryptographically fingerprint each student’s model so students cannot exchange models undetected.
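One standard way to build such a fingerprint is a keyed hash over the student id; here is a minimal sketch using HMAC-SHA256. The class and method names, and the exact scheme, are assumptions on my part, not necessarily what the tool does:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: derive a fingerprint from the instructor secret and the
// student id. Only someone who knows the secret can produce a valid fingerprint,
// so a model handed from one student to another is detectable.
public final class Fingerprint {
    public static String compute(String studentId, String secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] digest = mac.doFinal(studentId.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString(); // embedded in the base model given to that student
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The checker recomputes the fingerprint from the claimed student id and the secret and compares it with the one stored in the model.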
The tool then starts the process of running tests. In the future, I want to describe the tests using a simple script (that may even be built using a GUI), but for now the tests are described as Java code. Here we see the tool as it is running:
At all times, I have my results at the top (we get back to those) and a log with the current status at the bottom. I have tested one out of 5 models (at 20%) and can cancel the execution. Not much exciting here. More exciting is the result screen. It is actually the same screen, except the focus switches to the top part:
We now see results for all files and students. The first two columns are self-explanatory: which file was used and which user was recognized as the owner. Note that user “333” is not recognized as the owner of any file, and user “123” is recognized as the owner of “ass1_333.cpn” even though it presumably comes from user “333”.
The next column gives the score, which is the sum over all tests. As I haven’t implemented the tests yielding positive results yet, the highest score is 0.0. If we hover over a field, we see a breakdown of which tests failed and why. Here, we have deducted points for removing declarations from the original model. As the environment is fixed (and the exercise is very clear that students are not allowed to change it), this is an error. We can also deduct points if the file has the wrong name; it is so easy for the student to get right and so annoying for us when it is wrong. Again, the exercise is very explicit about this. In this case we have not deducted anything for that (the name is ok as it includes the student id). Finally, we see that the fingerprint is ok. If it is wrong, a lot of points are deducted, as this indicates cheating. Positive test scores will also be added here as soon as I implement them.
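The score column and the hover breakdown could be backed by a structure like the following. This is a sketch under my own naming assumptions, not the tool's actual data model:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the total score is the sum of each test's points,
// and the per-test entries form the breakdown shown when hovering over a field.
class ScoreCard {
    private final Map<String, Double> perTest = new LinkedHashMap<>();

    void record(String testName, double points) {
        perTest.put(testName, points);
    }

    double total() {
        return perTest.values().stream().mapToDouble(Double::doubleValue).sum();
    }

    String breakdown() {
        StringBuilder sb = new StringBuilder();
        perTest.forEach((name, pts) -> sb.append(name).append(": ").append(pts).append('\n'));
        return sb.toString();
    }
}
```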
The last column indicates errors. This can be that the model cannot be loaded at all or, as here, that one student seemingly submits multiple models. Errors are typically fatal and mean some manual action must be taken. They also contain diagnostics, and the tests are still run if at all possible. For example, the 4 models recognized as belonging to “123” are all checked. If it seems like an honest mistake that “333” uses the base model of “123”, he may get off with a stern warning. It is also possible to add extra checks for copying from one model to another.
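Flagging a student who is recognized as the owner of multiple hand-ins amounts to grouping files by owner; a minimal sketch, with names assumed for illustration:

```java
import java.util.*;

// Hypothetical sketch: group hand-in files by their recognized owner and
// report owners that appear on more than one file (an error needing manual action).
// Names are assumptions, not the tool's real code.
class DuplicateCheck {
    static Map<String, List<String>> byOwner(Map<String, String> fileToOwner) {
        Map<String, List<String>> owners = new HashMap<>();
        fileToOwner.forEach((file, owner) ->
            owners.computeIfAbsent(owner, k -> new ArrayList<>()).add(file));
        return owners;
    }

    static Set<String> duplicates(Map<String, String> fileToOwner) {
        Set<String> result = new TreeSet<>();
        byOwner(fileToOwner).forEach((owner, files) -> {
            if (files.size() > 1) result.add(owner);
        });
        return result;
    }
}
```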
Note: The points awarded in the screen-shots are just test values and not indicative of the grading of the final exercises.
Currently, I plan to add a test verifying that the environment has not been modified (that only certain pages have been modified, and even then only some nodes on those pages), a test providing guided simulation and testing of invariants, and a test performing performance analysis and simulation repetitions. This will all be tied together using a simple configuration script, and be extensible so it is possible to add new kinds of tests very easily (I just implemented 2 kinds of tests today between 8 pm and 8.30 pm).
In the future, I’ll make the tool available to all. I’ll also make a student version, which can help in getting quick help. Running all the non-cheat-detection tests may be useful for students to get diagnostics. The final grading would of course be done using a more comprehensive test suite (i.e., students get one for finding honest errors, while we have a tougher one for finding models that only adhere to the test suite and do not solve the more general case). The student version will probably be implemented as a web service. I’m also planning on looking into making a script-builder for easily creating test scripts.