Improved Paradigm for Analysis, Verification and Data Retention of CPNs using CPNaaS

This post has 2896 words. Reading it will take approximately 14 minutes.

CPN Tools Logo Cloud AnalysisCPN Tools Logo Cloud VerificationDesign/CPN was the first tool supporting computer-based editing and simulation of colored Petri nets. It was later extended with a for the time powerful state-space analysis tool. CPN Tools improved on Design/CPN by merging the editing and simulation phases, vastly speeding up the modeling phase because simple errors could be found and fixed very quickly.  CPN Tools also added simulation-based analysis.

The analysis phase was largely detached from the model-construction phase, however.  That meant that you would construct a model, simulate it until you believed it was free of errors and then do analysis.  This had 3 major disadvantages:

  1. It was hard or impossible to determine which version of a model was used for testing,
  2. it was cumbersome to perform decent regression testing, and
  3. it was hard to reproduce results and extend analysis into the past.

CPN Tools provided no means for versioning models, so this had to be done manually, either by renaming models, by using a code-versioning tool like CVS, Subversion or git, or by using a versioned file-synchronization service like Dropbox.  While this would mostly work most of the time, it was either a cumbersome manual operation (ensuring it would happen rarely) or an uncontrollable and unpredictable process (ensuring that it would happen a lot when not needed, but never between two crucial edits).  Even if a suitable version was created, there would be no obvious between a simulation or state-space report and the corresponding model version.

The state-space tool could generate a standard report, which would contain the current time and date, and the simulation based performance analysis could generate simulation reports containing the same information, but going from this information to the correct version of the model would be cumbersome at best.

For user-defined properties, the tool didn’t provide any means at all – such properties were analyzed manually, and results tracked or discarded manually.  It would often be very hard to track the original model producing a particular result.

As properties were tested manually, that also means that all properties were not necessarily tested in each iteration.  This makes it very hard to pinpoint exactly when a regression happens and fix it.  It also means that modelers are very unlikely to add properties with simple sanity checks or go all the way and do test-driven modeling (the modeling equivalent of test-driven programming) or just documenting each flaw found at random in the model using a test.

If a new test was added or a new performance metric was added, it was very unlikely that this would also be tested on earlier iterations of a model, even if applicable.  The same is the case if an error is found in the property checked.

CPN Tools Logo Cloud SnapshotsCPNaaS instead handles all of this automatically and intelligently.  Any change to a model is tracable and reversible.  Furthermore, the Snapshots API makes it easy to refer to any previous version of a model.  Not only is it easy to refer to earlier versions, it is a basic design principle of the API that any tool working with live models should be able to work with snapshots.  That a.o. means tools for displaying models, tools for doing performance analysis, and tools for doing state-space analysis.

Another problem of state-space analysis in CPN Tools was that is was based around the computation of a state-space.  This means that it was very hard for the tool to do optimizations behind the back of the user, such as using a reduction technique or performing checking in the background.

ASAP was a tool for doing state-space verification.  It was separate from CPN Tools to provide a more flexible user interface and because it was not tied to CPN models.  It introduced the notion of a “verification project,” which would tie models, properties and methods together.  This makes properties independent of the method and model.

CPN Tools Logo Cloud VerificationCPNaaS’ Verification API takes this idea even further and fundamentally integrates it with the Snapshots feature.  Instead of verification projects, we use “test suites,” which are tied to a single model.  In addition, a test-suite does not specify a method.  This allows the tool to pick the most efficient method based on experience or heuristics.  For example, a test suite with just safety properties can be checked very fast using an on-the-fly technique, saving memory and time.  Furthermore, properties are tracked outside of test suites.  That means that properties can be used in multiple test suites, and versioned independently of models and test suites.

The basic API for test suites in CPNaaS looks as follows:

Screenshot 2015-05-24 17.07.37

The API of course adheres to the basic design principle of CPNaaS with the 5 calls for each resource type.  Test suites consist of runs of the test suite.  A test suite is ties to a particular model and can consist of any number of properties.  A run of a test suite is tied to a snapshot of the model.  This means it is possible to run a test suite on an older model, and also that the exact model used for a run is part of the properties.  As run cannot be modified once performed.

Properties are one of three kinds (currently – subject to refinement and change):

Screenshot 2015-05-24 17.17.21

The type of property just indicates which kind of information the property can assume to have available and what assumptions the tool does.  Safety properties just has the current state and enabled transitions available and allow early termination by default, making them suitable for on-the-fly verification.  That also means that the tool can decide to test the properties during regular simulation.  Liveness properties have more history available (details to be decided), and statistics properties perform computations on what is being computed anyway.

A strength of Design/CPN and CPN Tools is that they come with a standard report, which computes a number of model-independent statistics.  Unfortunately, it is very hard to extend this report with customer properties because it is “special.”  Furthermore, most of the properties used in the standard report are special too, making it hard to use them as a starting point for similar but different properties.

CPNaaS tries to retain the immediacy and simplicity of the standard report while avoiding its deficiencies.  It does that by taking the stance that the standard report is just a particular (note particular, not special) test suite, and the properties contained therein are just particular properties.  It therefore provides special constructor calls for the standard report test suite and the properties stored therein:

Screenshot 2015-05-24 17.26.44

As soon as a standard report test suite is created, it is just like any other test suite.  It can be found, modified and deleted like any other test suite.  This makes it possible to remove parts you don’t care about and add new parts and create your own non-standard standard report.  The properties contained in the standard report are also provided as special constructors.  We have even added certain specialized properties that are very common but model-specific, such as a safety property checking if a particular place is unmarked or if a particular transition is enabled.  This makes it very easy for less experienced modelers to check non-standard properties as they can be formulated simply as enabling of a transition.

The API does no reporting; it simply stores the result of the verification and how it arrived at the answer.  Results can be extracted to follow how iterations fix modeling problems.  The API is free to pick any reduction method that provides the guarantees requested by the user.  This means it can pick a suitable reduction technique based on what performed well on earlier iterations of the model or heuristics, and when re-running an updated test-suite it does not need to re-check properties already verified earlier.  Of course, user hints can affect the decision (ranging from hinting that bit-state hashing might be a good idea, over providing a progress measure for the sweep-line method, al the way to insisting on rechecking all properties using a given method).

CPN Tools monitors can be switched on and off, and it is possible to use the simulation replications mechanism to generate a number of runs with the same settings.  CPN Tools can in principle do repeatable experiments, but this depends on a rather large set of settings that are not stored in the simulation report.  Furthermore, monitors are versioned with models so it is not easy to go back and re-do an experiment with more monitors or to add more runs later.

CPN Tools Logo Cloud AnalysisCPNaaS Analysis API takes the concepts of the CPN Tools monitoring facility one step further, and uses a concept very similar to test suites for analysis, an experiment.  List a test suite, an experiment consists of multiple runs.  An experiment uses a number of monitors and optionally breakpoints (in CPN Tools, a breakpoint is called a breakpoint monitor).  The API follows the exact same lines as the Verification API:

Screenshot 2015-05-24 22.52.39

Like for properties, we have an API for accessing breakpoints and monitors in general using the familiar 5 galls each, and extra constructor POST calls for creating specialized versions:

Screenshot 2015-05-24 22.55.02

As for properties, the specific constructors tie to the net element they belong to: place, transition or other.

Along with the experiment we store the snapshot version of the model and the random number seed used to generate the simulation, ensuring that the experiment is repeatable without storing the entire execution trace (as long as no monitor or breakpoint invoking the random number generator is added).  This allows us to run old experiments with new monitors included to gather more data.

I hope this illustrates that the new CPNaaS APIs shift the focus from the method to analyze to the actual analysis.  The focus on the state-space is gone and replaced by the properties to check.  All data is just stored and versioned correctly do it is always possible to retrieve at a later date.  All experiments are easily repeatable.  Additionally, the interface to simulation-based analysis and state-space analysis is nearly identical.

While I am happy with the progression of the API, there are a couple rough edges I’d like to work out:

  1. Versioning at different levels for verification and analysis,
  2. versioning of properties/monitors/breakpoints, test suites/experiments and models,
  3. making properties/monitors/breakpoints on nodes vs node instances,
  4. difference between properties and monitors/breakpoints,
  5. creation of monitor template functions,
  6. clarity of liveness properties,
  7. addition of more pre-defined properties, and
  8. handling of reporting.

The first issue is not visible from the examples above, but only if we zoom in a bit.  If we zoom in on the “create run” call of the verification API, we see:

Screenshot 2015-05-24 23.09.57

We note that this call takes a (n optional) snapshot as parameter, indicating that runs are tied to a particular version of the model.  Similarly, if we zoom in on the create experiment call of the analysis API, we similarly see:

Screenshot 2015-05-24 23.10.49

Thus, the versioning is taking place at different places: at the level of the run for the verification API and at the level of an experiment of the analysis API.  Multiple runs of a test suite happens on different versions of a model, while multiple runs of an experiment happens on the same version of the model.  One test suite captures a notion of change of a model, while an experiment can have multiple runs for the same model.  This is because analysis should provide the same result regardless of how many times it is invoked for the same model version (disregarding probabilistic verification methods), whereas simulation fundamentally is probabilistic.

This all makes sense technically, but I see that as a source of confusion in the future.  It therefore might be better to introduce an extra layer in the analysis API, one just grouping a number of monitors and breakpoints.  That would also make it easier to re-run a new collection of monitors on historic data.

The second issue deals with versioning of properties/breakpoints/monitors vs test suites/experiments vs models.  Currently, test suites are versioned with models and properties.  A run can be done on any version of a model with the latest test suite and properties.  But it might be useful to use older versions of properties or test suites.  Similarly, monitors/breakpoints and experiments are versioned with models, but experiments can be run with older models.  I have been thinking of replacing the snapshotId with a model URL, and do the same for propertyIds, monitorIds, and breakpointIds, allowing to intermix any version of each with any other version.  I’m hesitant to do that, however, for simplicity.  I like the structure for test suites; it is simple and flexible.  Of course, it would have to retain the version of the test suite used for performing a single run, but that’s simpler than maintaining that for each property.  Whenever a test suite or property is changed, old results are no longer comparable, though, until tests have been rerun.  That is acceptable and detectable, though.

The third point is a minor issue, which should probably just be fixed; currently, many of the constructors for properties/breakpoints/monitors refer to places or transitions, where they should really refer to place or transition instances.  The Simulation API already has a syntax or referring to instances, and this should probably be used here instead.

Screenshot 2015-05-24 23.33.58

The fourth point is a bigger one, and one I thought of when writing the piece: why are properties for state-space verification different from monitors used for simulation-based analysis?

Aren’t safety properties, evaluating a current state to a boolean property indicating a failure, really the same as a break-point evaluating a simulation state to a boolean property indicating whether to stop simulation?  They very easily could be and should have the same interface for construction (it is at the level of abstraction we’re dealing with so far, but not at the level of code that has to be run in the simulator currently).

Aren’t statistics properties and generic data collector really the same as well?  Just measuring some data values from the current state and a bit of memory?  Isn’t a state counter and a node counter the same if only used on e single trace?

Making them the same would also make it easier to perform continuous checking of state-space properties based on simulation and even dimming the difference between the two (a common probabilistic verification technique is just random simulation to see if a property is violated, which could easily be done using the analysis API, and a common verification technique, state caching, where parts of the state-space is stored to avoid revisits, might be useful to make analysis prefer more different execution traces, A* search strategies from the verification domain could be used to guide analysis in a desired direction, or a sweep-line progress measure could be used to delay analysis until a notion of progress indicating the model has left a start-up phase has been left behind).

Currently, monitors try to be clever about when they are activated, but if they stop writing in files in inefficient manners, that might not be needed.  Alternatively, a bit of static analysis could do that automatically instead of now where users have to specify which nodes a monitor pertains to.

Doing this would also make issue 5, how to elegantly generate monitoring templates.  Currently, CPN Tools generates template functions for monitors based on model structure and user selection.  CPNaaS does not have any calls specified for that.  I would very much prefer if it could be avoided altogether or at least done in a very simple manner.  And the same manner for generating template safety properties.

Issue 6 is about specifying what liveness properties have available.  I want all properties to be special cases of a generic class of properties with a simple theoretical explanation.  Safety properties are very simple: they are a predicate on states.  Liveness properties are harder, because I want a characterization that is simple enough to be efficiently checkable, yet powerful enough to determine fairness properties of transitions.

Issue 7 is just adding more standard properties.  I’ll have to go thru the CPN Tools state-space manual and make sure everything in there is present.  See if I can think of other useful properties.  It would be useful if I could do simulation, discover an error state and select 3 nodes indicating the error and have the tool mostly automatically generate a safety property identifying the error, so I can easily add it to my test harness for next iteration.  It does not have to do everything – extension APIs can easily help here, for example adding specifying properties using sequence diagrams – but it needs a good basis so extension APIs can to their magic.

The final issue has to do with reporting.  The API should not to reporting itself.  Nor should it force tools to reinvent the wheel for reporting.  There are fantastic reporting tools out there, and it would be dumb to not use them.  I’ve already indicated I would like to store analysis data in a time-series database like InfluxDB and then “somehow” use a generic tool like Grafana to search thru the data.  It needs to be possible to do this in a secure way, fast, and also to provide simple statistics in an appealing manner without much user interaction.  The standard reports and simulation summary report of CPN Tools are still the most used analysis tools for most users after all.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.