Math and Tools for Running Experiments

I’m running experiments for a paper I’m currently working on.  We’re testing a new tool we’re developing, and of course we compare it to our old tools, so I’m currently generating state spaces for 6 different CPN models.  Each model is parametrized, so in total I have 38 instances (at the moment).  To get reasonably accurate results, I run each test three times, and each test has to be run with two or three tools (only 12 of the instances can also run on an old version of our tool).

When you just need to run tests, you can of course set up a script (supposing your tool runs on the command line or is otherwise scriptable), but sometimes you also need to get an idea of which tests to run: it is not obvious which instances are worth running.  We want to go to the limit to show that the new tool pushes the limit, but we don’t want to overshoot it.
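For the plain “run everything” part, a small driver script is enough.  Here is a minimal sketch in Python; the tool names, command lines and instance files are purely made-up stand-ins for whatever your own tools expect.

    import csv
    import subprocess
    import time

    TOOLS = {
        "new-tool": ["newtool", "--model"],   # hypothetical command lines
        "old-tool": ["oldtool", "-m"],
    }
    INSTANCES = ["model_a_5.cpn", "model_b_10.cpn"]   # placeholder instance files
    REPETITIONS = 3                                   # run every test three times

    with open("results.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["tool", "instance", "run", "seconds", "returncode"])
        for tool, cmd in TOOLS.items():
            for instance in INSTANCES:
                for run in range(1, REPETITIONS + 1):
                    start = time.monotonic()
                    proc = subprocess.run(cmd + [instance])
                    writer.writerow([tool, instance, run,
                                     round(time.monotonic() - start, 2),
                                     proc.returncode])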

Luckily, I know most of the models very well, so I have a good idea about where the limits are, but some of them are new, so I need to investigate.  This investigation can be done either manually or automatically.

Manual Investigation

For manual investigation, I use our tool ASAP, which allows me to graphically build my experiment like this:

To someone not familiar with the tool, this may look overwhelming, but it is a fairly simple graphical way to compose experiments and describe what to do with the results.  I set this up, execute it, and just let the computer handle the rest.  Results are automatically collected, grouped and archived for future reference.  Currently, I have quite a few results in my workspace:

All of the “Execution” directories contain the results of a single execution.  Below the Project Explorer, I can follow the state of the currently running jobs.  ASAP fortunately supports running experiments on computation servers, so I can just dispatch jobs to different computers and wait for the results to trickle into my overview.  Alas, I have to do the distribution manually at the moment.  I’m thinking of adding grid functionality to the tool, so experiments are automatically dispatched to an idle computer if one exists and otherwise kept in a queue, but for now this suffices.  In the window to the right, I can inspect details of each execution.

Automatic Investigation

Sometimes I also just want the computer to do the investigation.  This may require more computer power than manual investigation, but it requires less of my time, so all in all it’s a win.

The investigation is often of the form: for how large a parameter can I run this model before I run out of time or memory?  Unless I wish to help the computer, it has no idea of what the parameter means, so the simplest way is to just start from 0 and increment until it runs out of time/memory (we assume that the time/memory grows monotonically as a function of the parameter).  This works, but is very time-consuming, especially if the parameter needs to get large before I reach the limit.  Consider the below example:

Assume we want to see how large we can make the parameter (L2) before we run out of time (L1), i.e., find the largest L2 such that f(L2) ≤ L1 but f(L2 + 1) > L1.  The total amount of time we would need corresponds to the area below the function f, which is bounded by L1 * L2, so if L2 is large, we are going to spend a long time.
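As a concrete illustration, here is that naive search as a Python sketch.  The predicate runs_within_budget is a hypothetical stand-in for “run the instance with this parameter and report whether it finished within the time/memory limit”:

    def naive_search(runs_within_budget):
        """Largest n such that runs_within_budget(n) holds, assuming that
        feasibility is monotone: once a parameter is too large, all larger
        parameters are too large as well."""
        n = 0
        while runs_within_budget(n + 1):
            n += 1
        return n          # n is feasible, n + 1 is not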

What we want is to take short-cuts.  One way of doing that is to take larger steps than incrementing by 1.  If we increment by more, however, we risk overshooting the target, and then we need to search for the bound.  If we have two values, n1 and n2, and we know that parameter n1 undershoots the limit while n2 overshoots it, we can use binary search to find the limit: evaluate the function at the mid-point of the interval, m, and if we overshoot again, continue searching in [n1, m - 1], otherwise in [m, n2 - 1].  At worst we overshoot every time, so the time used for searching is bounded by L1 * log(size), where size is the length of the interval (in the example n2 - n1).  While much better than before, we still wish to keep the interval small: since we are searching near the bound, each probe costs close to L1, so there’s a good chance the time estimate is quite accurate.
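The bisection step itself is short; here is a sketch, continuing with the same hypothetical runs_within_budget predicate:

    def binary_search_bound(runs_within_budget, n1, n2):
        """Largest feasible parameter, given that n1 is known to undershoot the
        limit and n2 to overshoot it.  Assumes feasibility is monotone."""
        while n2 - n1 > 1:
            m = (n1 + n2) // 2
            if runs_within_budget(m):
                n1 = m            # m undershoots: the bound lies in [m, n2 - 1]
            else:
                n2 = m            # m overshoots: the bound lies in [n1, m - 1]
        return n1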

We cannot find one good constant step size: a large increment works well when L2 is large, but it forces us to search a large area above the bound if L2 is small compared to the increment.  We therefore want an automatic way to approach the bound.  One way to do that is to double the value at each step, i.e., first testing 0, 1, 2, 4, 8, 16, etc.  If the function is approximately linear, this actually works quite well, as we get at most 2 * L2 away from the origin, so the total time spent is bounded by log(2 * L2) * L1, which is much better than the L1 * L2 from before.
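In code, the doubling phase might look like the sketch below, handing the final interval over to binary_search_bound from the sketch above (again assuming the hypothetical runs_within_budget predicate):

    def doubling_search(runs_within_budget):
        """Probe 0, 1, 2, 4, 8, ... until the first overshoot, then narrow down
        the last interval with binary_search_bound from the previous sketch."""
        if not runs_within_budget(0):
            return None           # even the smallest instance exceeds the budget
        lo, hi = 0, 1
        while runs_within_budget(hi):
            lo, hi = hi, hi * 2   # the first overshoot is at most about 2 * L2
        return binary_search_bound(runs_within_budget, lo, hi)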

Unfortunately, the function is rarely linear.  It is often quadratic or even exponential.  The total time with doubling is still bounded by roughly 2 * log(L2) * L1, because every run is cut off at the time limit, but if f(x) = 2^x, i.e., if L1 = 2^L2, the calculation changes.  Every probe at or near the bound now costs essentially the whole budget L1, so doubling really does spend time on the order of log(L2) * L1.  Simple incrementation, on the other hand, costs only 1 + 2 + 4 + … + 2^L2 = 2 * 2^L2 - 1 ≈ 2 * L1.  So unless L1 is very small, it is actually faster to use simple incrementation, because the cost of overshooting is so much worse than that of undershooting several times.

This leads us to my current solution for automatic investigation.  The idea is to estimate how the function grows.  It is in fact enough to find out whether it is exponential (if so, just use stepping, otherwise use doubling), but with a little cleverness we can do better.  The idea is to use linear regression, a technique for estimating a function from a set of points.  We basically want to guess f, solve the equation f(x) = L1 ⇔ x = f^-1(L1), and try that value.  So the plan is to try a couple of low values, say 1, 2, 3, 4, 5, and use the pairs (1, f(1)), (2, f(2)), …, (5, f(5)) for regression.  Regression lets us test whether the function is linear, polynomial or exponential (if it’s worse than exponential, we have most likely already found the bound while testing the first few values) and use the fitted function to solve our equation.

We then try the value f^-1(L1) rounded down, because we need a pair (x, x + 1) with f(x) ≤ L1 and f(x + 1) > L1 to prove we have found the bound, and we’d much rather compute an extra value below the bound than an extra value above it: first, it is faster, and second, we get an extra data point.  If the estimated value turns out to be below the bound, we add it to the set of values used for regression and try again.  If the value is above the bound, we simply try a value one above the highest value we have tested until now.  Likewise, if solving f^-1(L1) yields a value no higher than the highest value already known to be below the bound, we just try a value one above that.  Of course, we also keep track of the lowest value we have found above the bound, and make sure never to test values at or above it.
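Below is a rough Python sketch of this strategy, assuming numpy is available.  It is not the actual implementation in our tool; run_time, fit_models and predict_bound are names I made up here, and picking the best least-squares fit among a linear, a quadratic and an exponential candidate is just one simple way to realise the “guess f” step described above:

    import numpy as np

    def fit_models(xs, ys):
        """Fit linear, quadratic and exponential candidates; return the best one."""
        xs, ys = np.array(xs, float), np.array(ys, float)
        candidates = []
        for degree in (1, 2):                           # linear and quadratic fits
            coeffs = np.polyfit(xs, ys, degree)
            residual = np.sum((np.polyval(coeffs, xs) - ys) ** 2)
            candidates.append((residual, ("poly", coeffs)))
        if np.all(ys > 0):                              # exponential: line through (x, log y)
            coeffs = np.polyfit(xs, np.log(ys), 1)
            residual = np.sum((np.exp(np.polyval(coeffs, xs)) - ys) ** 2)
            candidates.append((residual, ("exp", coeffs)))
        return min(candidates, key=lambda c: c[0])[1]

    def predict_bound(model, budget):
        """Invert the fitted model: estimate the largest x with f(x) <= budget."""
        kind, coeffs = model
        if kind == "exp":
            slope, intercept = coeffs
            return int((np.log(budget) - intercept) // slope) if slope > 0 else None
        p = coeffs.copy()
        p[-1] -= budget                                 # solve poly(x) = budget
        roots = [r.real for r in np.roots(p) if abs(r.imag) < 1e-9]
        return int(np.floor(max(roots))) if roots else None

    def regression_search(run_time, budget, seeds=5):
        """run_time(n) measures a run with parameter n (runs are cut off at budget).
        Returns the largest n whose run stays within the budget."""
        xs, ys, best_lo, best_hi, next_probe = [], [], 0, None, None
        for n in range(1, seeds + 1):                   # seed the regression with small values
            t = run_time(n)
            if t > budget:
                return n - 1                            # bound found among the seeds
            xs.append(n); ys.append(t); best_lo = n
        while best_hi is None or best_hi - best_lo > 1:
            if next_probe is not None:
                guess = next_probe
            else:
                guess = predict_bound(fit_models(xs, ys), budget)
            if guess is None or guess <= best_lo:
                guess = best_lo + 1                     # nothing new: take a single step up
            if best_hi is not None and guess >= best_hi:
                guess = best_hi - 1                     # never probe at or above a known overshoot
            t = run_time(guess)
            if t <= budget:
                xs.append(guess); ys.append(t)          # undershoot: an extra data point
                best_lo, next_probe = guess, None
            else:
                best_hi, next_probe = guess, best_lo + 1  # overshoot: step up next time
        return best_lo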

All in all, this algorithm tries to guess the function, and always decreases the span between the lowest value above the bound and the highest value below the bound.  It tries taking larger steps when possible, and if this does not lead to new information, it conservatively attempts a value that is sure to either provide more information or to solve the problem entirely.

So, that’s how I run experiments.  Right now, I’m running some experiments (the ones from the screen shot above), but they don’t seem to want to terminate today, so I guess I’ll just go to bed and let it all sort itself out.
