
How We Evaluated CGAAL: The Experiments That We Ran

Too Long; Didn't Read

To evaluate our tool, we run several experiments comparing our global algorithm, our local algorithm under each of our search strategies, and the established tool PRISM-games. We use several different concurrent game case studies. Some of these are PRISM-games case studies adapted to LCGS for CGAAL and determinised where needed; others are well-known algorithms and games constructed during the development of CGAAL.

This paper is available on Arxiv under CC 4.0 license.

Authors:

(1) Falke B. Ø. Carlsen, Department of Computer Science, Aalborg University, Denmark & [email protected];

(2) Lars Bo P. Frydenskov, Department of Computer Science, Aalborg University, Denmark & [email protected];

(3) Nicolaj Ø. Jensen, Department of Computer Science, Aalborg University, Denmark & [email protected];

(4) Jener Rasmussen, [email protected];

(5) Mathias M. Sørensen, [email protected];

(6) Asger G. Weirsøe, [email protected];

(7) Mathias C. Jensen, Department of Computer Science, Aalborg University, Denmark & [email protected];

(8) Kim G. Larsen, Department of Computer Science, Aalborg University, Denmark & [email protected].

Introduction

Definitions

Model Checking

Tool Overview

Evaluation

Conclusion and References

5 Evaluation

To evaluate our tool, we run several experiments comparing our global algorithm, our local algorithm under each of our search strategies, and the established tool PRISM-games. We use several different concurrent game case studies. Some of these are PRISM-games case studies adapted to LCGS for CGAAL and determinised where needed; others are well-known algorithms and games constructed during the development of CGAAL.


We make the PRISM-lang and LCGS implementations of the models as identical as possible so that the state spaces are comparable. Some of the concurrent games and associated ATL formulae used in the experiments are presented below. Queries marked with ⊺ (resp. ⊥) are satisfied (resp. not satisfied), and queries marked with † may terminate early, as we are not required to compute the entire fixed point:


• Mexican-Standoff: In this model N cowboys stand in a circle, each with a revolver. At each moment they can choose to shoot another cowboy or do nothing. If a cowboy is hit by B bullets, they are incapacitated. We run the following queries:



• Gossiping Girls: In this model, N nodes each know a piece of information. The nodes can exchange information by calling each other. Our experiments include both a fully connected network of nodes and a circular network of nodes. We run the following queries:



• Robot Coordination: In this model, four robots move orthogonally on an N by N grid. Each robot needs to reach the corner opposite to where it starts, but crashing into another robot is fatal. We run the following queries:
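To make the joint-move structure of such concurrent games concrete, the following is a minimal Python sketch of the Mexican-Standoff transition relation. The encoding, function names, and the choice of N = 3, B = 1 are our own illustration; the actual models are written in LCGS (or PRISM-lang), not Python.

```python
from itertools import product

# A state is a tuple of hit counts, one per cowboy; a cowboy is
# incapacitated once hit by B bullets. In each step every living
# cowboy simultaneously picks a target (or waits), and all shots
# land at once -- this joint-move structure is what makes the
# game concurrent rather than turn-based.

N, B = 3, 1  # 3 cowboys, incapacitated after 1 hit

def alive(state, i):
    return state[i] < B

def moves(state, i):
    """Moves of cowboy i: 'wait', or shoot another living cowboy."""
    if not alive(state, i):
        return ["wait"]
    return ["wait"] + [j for j in range(N) if j != i and alive(state, j)]

def step(state, joint):
    """Apply one joint move: every chosen target takes one bullet."""
    hits = list(state)
    for i, m in enumerate(joint):
        if m != "wait" and alive(state, i):
            hits[m] += 1
    return tuple(min(h, B) for h in hits)

def successors(state):
    """All successor states reachable by some joint move."""
    return {step(state, joint)
            for joint in product(*(moves(state, i) for i in range(N)))}

init = (0,) * N
print(sorted(successors(init)))
```

Because all shots land simultaneously, even the state where every cowboy is hit is reachable from the initial state in one step.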



Each experiment is run with a time limit of two hours, is allocated 128 GB of memory, and has 32 cores available regardless of how many worker threads CGAAL may spawn. All experiments are run on identical AMD EPYC 7642-based servers, with only one experiment per node at a time to reduce noise in the results.


Results. Here we provide a select set of experimental results that roughly exemplify our findings in general. For a more thorough set of results, along with all of our experimental data and the Python script processing it, see the git repository https://github.com/d702e20/evaluation-results/.


We find that the local on-the-fly algorithm is often an order of magnitude faster than the global algorithm, regardless of the search strategy. However, the global algorithm can compete with, and sometimes outperforms, the local algorithm in cases where the local algorithm cannot terminate early. This matches observations made by A. Dalsgaard et al. [4] and M. C. Jensen et al. [10].
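To illustrate why early termination gives the local algorithm its edge, here is a simplified sketch of a local fixed-point computation on a dependency graph, in the spirit of the on-the-fly algorithms cited above. The encoding, names, and example graph are our own illustration, not CGAAL's actual Rust implementation.

```python
# A dependency graph maps each configuration to a list of
# hyper-edges; each hyper-edge is a tuple of target
# configurations. A configuration is True (in the least fixed
# point) if some hyper-edge has all of its targets True; an
# empty hyper-edge is trivially True. The local algorithm only
# explores what the root depends on and stops as soon as the
# root is decided -- a global algorithm would evaluate every
# configuration first.

def local_solve(graph, root):
    proven = set()     # configurations proven True
    waiting = {}       # target -> hyper-edges waiting on it
    explored = set([root])
    work = [(root, edge) for edge in graph[root]]
    while work:
        source, edge = work.pop()
        if source in proven:
            continue
        unmet = [t for t in edge if t not in proven]
        if not unmet:
            proven.add(source)
            if source == root:
                return True, explored      # early termination
            work.extend(waiting.pop(source, []))
        else:
            t = unmet[0]
            waiting.setdefault(t, []).append((source, edge))
            if t not in explored:
                explored.add(t)
                work.extend((t, e) for e in graph[t])
    return root in proven, explored

graph = {
    "root": [("a", "b")],
    "a": [()],                # trivially True
    "b": [("a",)],
    "unreachable": [()],      # never visited by the local algorithm
}
result, explored = local_solve(graph, "root")
print(result, sorted(explored))
```

Note that the configuration named "unreachable" is never visited: the search touches only the part of the graph the root's value depends on.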



A general trend in our experiments is that the execution speed of our distributed implementation increases with the number of compute threads available. The only cases where we do not see a speed-up from additional compute threads are models small enough that the overhead of managing the additional threads becomes significant.


Relatedly, a single thread performs better than a few threads in many cases, since there is no communication overhead with only one compute thread. In general, depending on the search strategy employed, we see improvements of one to two orders of magnitude when increasing the number of compute threads, as exemplified in Figure 9d.
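One way to picture this trade-off is static partitioning of configurations among workers by hashing, so that edges crossing a partition boundary become messages. This is a sketch under our own assumptions (names and scheme are illustrative, not CGAAL's actual distribution code):

```python
# Illustrative sketch: configurations are statically assigned to
# workers by hashing. An edge whose endpoints live on different
# workers must be communicated; with one worker every edge is
# local, which is one way a single thread avoids the
# communication overhead that a few threads incur.

def owner(config, n_workers):
    """Worker that owns a configuration (hash partitioning)."""
    return hash(config) % n_workers

def classify_edges(edges, n_workers):
    """Split edges into local work and cross-worker messages."""
    local, remote = [], []
    for source, target in edges:
        if owner(source, n_workers) == owner(target, n_workers):
            local.append((source, target))
        else:
            remote.append((source, target))
    return local, remote

edges = [("s0", "s1"), ("s1", "s2"), ("s2", "s0")]
# With a single worker, every edge is local and no messages are sent.
print(classify_edges(edges, 1))
```

With more workers the remote fraction grows, so the parallel speed-up has to outweigh the messaging cost before additional threads pay off.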


(a) Mexican-Standoff: the horizontal axis shows (N, B) tuples, with N being the number of cowboys in the model and B the number of bullets a cowboy can be hit by.


(b) Robot Coordination: the horizontal axis is the size of the grid the robots manoeuvre on.


(c) Gossiping Girls in a circular topology: the horizontal axis is the number of girls in the circle.


(d) Gossiping Girls with six girls in a circular topology: the horizontal axis is the number of compute threads used in our distributed algorithm. Note that PRISM-games was always run with just one thread.


Figure 9: Selected experimental results. The CGAAL results in Figures 9a to 9c show the best result for each search strategy when varying between 1 and 32 threads.

