Quick and Dirty Declare Mining

We needed a bit more information from the Declare miner, so I proposed making a quick and dirty one based on my improved automaton library.  This is the result of one afternoon, so it’s not really user-friendly ((It’s just very particular about which users it is friendly to…)) or smart.  The full miner is around 330 lines of code and relies on 550 lines for the GUI.

It simply instantiates builds an automaton for each constraint, traverses that with all possible mappings from events to parameters.  This naïve approach makes it easy to extend the miner with new constraints (simply input the LTL formula and the miner automatically knows all about it).

The miner can compute support, but does not use that for mining; it just does the dumbest thing possible and instates all constraints without exploiting information like implication of constraints, symmetry of parameters, or reduction using the Aprori algorithm.  Support is derived automatically from the initial formula, so no manual intervention is needed.

This is the main screen of the miner:

Screen Shot 2013-05-21 at 20.55.09

I can pick and choose among all constraints and select whether I want to compute support.  The tool can filter constraints according to how many traces match and according to support.  If support is not computed, this is ignored.  The tool remembers all preferences for convenience.

Mining looks as such:

Screen Shot 2013-05-21 at 20.55.28

The main screen gives “useful” ((Clever-looking.)) logging information. The top progress bar counts how many constraint templates have been investigated and the bottom the progress of the current template.

When mining is done, we get this:

Screen Shot 2013-05-21 at 20.55.34

The progress bars have gone and we see the end result.  The miner mines 138532 constraints using this configuration.  It checked 32 constraint templates in 13.5 seconds.  This is done for a log with 59 traces, an average of 23 events a trace, and a total of 35 event classes.  This is thus a small but relatively complex log.  Adding more traces extends time linearly (so we get around 264 traces/minute for a log of this complexity.  Here, we checked 4 templates with 3 parameters, 17 templates with 2 parameters, and 11 templates with one parameter, for a total of 177695 template instantiations.  Increasing the complexity to 50 event classes approximately quadruples this (and the time necessary), and going to 100 event classes adds another factor 8 for approximately 4 million template instances.  Going to 1000 event classes adds a factor 1000, for a total 400 million template instances; still, with a log of this size but the immense increase in complexity, we can do mining in reasonable time (approximately 3.5 days with the current implementation).

Using basic profiling I’m sure, I can improve this by a factor 2-3 or more.  Using that some constraints imply others may be useful in conjunction with colored automata.  This algorithm is also so dumb, it trivially scales linearly with the number of CPUs.  Using symmetry will have the biggest impact on the complex constraints (taking the most time), yielding another factor 3-5.  I also have a simple notion of support which further reduces processing time by 25%-50%.  All in all, I believe I can end up mining approximately 150.000 traces/minute on each of our grid nodes (16 CPUs) or around a quarter of a billion traces/hour on our entire grid (32 machines or 512 CPUs in total).  The algorithm uses very little memory (in fact the most memory is spent keeping track of the results), but using external storage that can be alleviated.  And then I can start being smart.  Mining constraints with 4 parameters takes approximately 65 seconds for this log, but the only constraints with many parameters have a high degree of symmetry, and this could easily be reduced with a factor 4! = 24.

Of course, we do not intend to present the user with hundreds of thousands (or even hundreds) of constraints, but this proof-of-concept proves it is possible to mine all constraints with no prejudges, making it feasible to do so and subsequently filter them instead of making assumptions beforehand.

So, can you get your hand on this?  Of course: Just download MegaMiner.jar, double click (assuming you have Java installed), and mine away.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.