Blog Post 3 – Moving on to C++

In this blog post, we briefly discuss the progress of the project as well as the work we would like to accomplish before the deadline.

The first main point we would like to discuss is the conversion of the MATLAB scripts into a C++ application with a user-friendly interface to adjust the different parameters of the simulation, shown below.

User Interface

The choice of a more object-oriented language is justified by the increase in computation speed, the better definition of structures and classes, and a more flexible programming environment. Moreover, this would allow easier post-final-report changes if our research is used in the future. More importantly, we were still debating whether to implement an “Individual” class, in which specific mobility and epidemic aspects could be set individually, together with a notion of memory (path visited, time of infection, etc.). Since we estimated the remaining time to be insufficient for implementing such an intricate notion in our model, we decided to continue with our initial plan of developing a SIR model at the node level.
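To give an idea of what a node-level SIR model can look like in C++, here is a minimal sketch of a discrete-time update step; the struct layout, the rate names (beta, gamma, mu) and the explicit dead compartment are illustrative assumptions on our part, not our final implementation.

```cpp
#include <algorithm>

// One node of the network, holding aggregate SIR (+ dead) counts.
struct Node {
    double S, I, R, D;  // susceptible, infected, recovered, dead

    // Discrete-time SIR step; assumes dt is small enough that the
    // outflows do not exceed their compartments.
    void step(double beta, double gamma, double mu, double dt) {
        double N = S + I + R;  // living population of the node
        if (N <= 0.0 || I <= 0.0) return;
        double newInfections = std::min(beta * S * I / N * dt, S);
        double newRecoveries = gamma * I * dt;
        double newDeaths     = mu * I * dt;
        S -= newInfections;
        I += newInfections - newRecoveries - newDeaths;
        R += newRecoveries;
        D += newDeaths;
    }
};
```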

So far, our system contains nodes as its main units, but no edges. These nodes can be visualized during the simulation as circles on a 2D map, with the following colour code: green corresponds to a node with a majority of healthy individuals, red stands for a node where most people are infected, and black indicates a node where most people have died. An example of a simulation on a simplified Venice network is shown below.
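As a small illustration of this colour code, the display colour of a node can be derived from its dominant compartment. The snippet below is a hypothetical sketch; the Color enum and the tie-breaking order are arbitrary choices of ours.

```cpp
enum class Color { Green, Red, Black };

// Pick the colour of the compartment holding the majority of the
// node's individuals (ties broken arbitrarily).
Color nodeColor(double healthy, double infected, double dead) {
    if (dead >= healthy && dead >= infected) return Color::Black;
    if (infected >= healthy)                 return Color::Red;
    return Color::Green;
}
```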

cpp simulation


Another modification to our code proved necessary after a few simulation attempts using a “basic” random assignment method, which yielded a non-negligible computation time. Indeed, a random destination needs to be computed for each travelling individual of each of the 6000 nodes at each time step, depending on the inter-nodal distances. These results showed the key role played by the random destination assignment, whose algorithm needs to be optimal in terms of computation time and resources. We therefore implemented the alias method, which permits a much quicker computation of the probabilities of movement from one node to another. This method – summarized below – requires a relatively costly initialization, but yields near-instantaneous computations at runtime.

Suppose we are given a distribution of n weighted probabilities, each between 0 and 1. For the initialization part, we consider a 2D orthogonal space in which, along the first dimension – namely x – each probability (from 1 to n) occupies an interval of size 1 ([0,1[, [1,2[, …, [n-1,n[), while the second dimension – namely y – holds the actual probability value at each index (from 0 to 1). This 2D space is not continuous in the y dimension, as the probability values vary across the distribution. An example is given below with n = 6.

aliases1

Hence, we aim to transform it into a continuous 2D rectangle of width n (in the x dimension) and height 1/n (the average probability value, in the y dimension), for a total area of 1. Since the sum of all probabilities equals 1, all we have to do is fill the parts of the 2D space that lie below the 1/n line with those that lie above it, in order to obtain the rectangle of interest. An example of such a transformation is given below.

aliases2

During the process, a so-called “alias” is saved for each completed column (the ones that were below the threshold). This alias is nothing but the index of the column that was used to complete the current column (it is thus an integer). Each column is therefore composed of one or two sub-rectangles: the lower one corresponds to the probability at the current index, and the optional upper one corresponds to the “alias” probability of another index that was used to complete the column. Note that even if parts of some columns were moved to others, the total area occupied in the rectangle by each probability stays constant. This concludes the initialization part. Moving on to runtime, a given random assignment works as follows:

  • We generate a random integer between 1 and n. This number tells us which column we will look at.
  • For the column of interest, we generate a random real number between 0 and 1/n. If this number is smaller than the height of the lower sub-rectangle at this index, the current index is taken as the random assignment; otherwise the “alias” index is selected instead.

As one can see, this method only requires the generation of two random numbers and one comparison at runtime, corresponding to a very low computational cost. Furthermore, the initialization algorithm ensures that even though individual column heights are modified, the total weight of each sample in the distribution is preserved. Hence, the choice of a random sample remains properly weighted by the original probabilities.
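To make the procedure concrete, here is a sketch of the alias method in C++. We rescale all probabilities by n, so the 1/n threshold becomes 1 and the runtime draw is a real number between 0 and 1; the class and member names are our own, for illustration only.

```cpp
#include <random>
#include <vector>

class AliasTable {
    std::vector<double> prob_;   // height of the lower sub-rectangle per column
    std::vector<int>    alias_;  // index used to top up each completed column

public:
    // Costly one-off initialization: fill under-full columns with mass
    // taken from over-full ones, recording the donor as the alias.
    explicit AliasTable(const std::vector<double>& p)
        : prob_(p.size()), alias_(p.size(), 0) {
        int n = static_cast<int>(p.size());
        std::vector<double> scaled(n);
        std::vector<int> small, large;
        for (int i = 0; i < n; ++i) {
            scaled[i] = p[i] * n;  // average column height becomes 1
            (scaled[i] < 1.0 ? small : large).push_back(i);
        }
        while (!small.empty() && !large.empty()) {
            int s = small.back(); small.pop_back();
            int l = large.back(); large.pop_back();
            prob_[s]  = scaled[s];
            alias_[s] = l;
            scaled[l] -= 1.0 - scaled[s];  // remove the donated mass
            (scaled[l] < 1.0 ? small : large).push_back(l);
        }
        for (int i : large) prob_[i] = 1.0;  // leftovers are exactly full
        for (int i : small) prob_[i] = 1.0;  // (up to rounding error)
    }

    // Runtime sampling: one column pick, one comparison.
    int sample(std::mt19937& rng) const {
        std::uniform_int_distribution<int> col(0, static_cast<int>(prob_.size()) - 1);
        std::uniform_real_distribution<double> u(0.0, 1.0);
        int i = col(rng);
        return u(rng) < prob_[i] ? i : alias_[i];
    }
};
```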

In the context of our project, samples are represented by nodes, and the travelling probabilities are computed as a function of the contact kernel and the inter-nodal distances. Hence the distribution differs from node to node, and we need one n-element table per node. Consequently, we use two n × n matrices: one to store the probabilities (“double” numbers) and one to store the aliases (integers). Although the initialization of these matrices may take up to a minute for a population of 600 nodes, the runtime performance observed with this method is very satisfying in terms of computation time. Considering that a simulation may run for several thousand steps, runtime performance outweighs the initialization cost, so this method is a good choice.
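Building on the sketch above, the per-node tables could be assembled as follows; kernelWeight is a placeholder for the contact kernel, and dist a hypothetical pre-computed inter-nodal distance matrix.

```cpp
#include <cmath>
#include <vector>

// Placeholder contact kernel (an exponential decay, for illustration).
double kernelWeight(double d) { return std::exp(-d); }

// One alias table per origin node; together their probability and alias
// vectors form the two n-by-n matrices described above.
std::vector<AliasTable> buildTables(const std::vector<std::vector<double>>& dist) {
    int n = static_cast<int>(dist.size());
    std::vector<AliasTable> tables;
    tables.reserve(n);
    for (int i = 0; i < n; ++i) {
        std::vector<double> p(n);
        double sum = 0.0;
        for (int j = 0; j < n; ++j) {
            p[j] = (i == j) ? 0.0 : kernelWeight(dist[i][j]);
            sum += p[j];
        }
        if (sum > 0.0)
            for (double& v : p) v /= sum;  // normalize each node's distribution
        tables.emplace_back(p);
    }
    return tables;
}
```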

Regardless of the random assignment method, we are still struggling to correctly simulate different epidemic behaviours. In particular, we have not yet managed to simulate a wave-front disease propagating out from an initial centre of infection (due to a highly restrictive contact kernel, e.g. an exponential function), as opposed to a nonlocal spread of the disease, with many satellite outbreaks and no clear wave-front (due to a less restrictive contact kernel such as a power function). Thus, our next priority is to fine-tune our model to simulate these different behaviours of the pandemic. For this, we still have to take previous similar research into account and perform identical simulations to fine-tune the parameters of the SIR differential equations as well as the kernels. We would also like to implement graphical tools to help understand what is happening during the simulation: infected density graphs, numbers of deaths, heat maps of Venice.
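For reference, the two kernel shapes we are contrasting could be written as follows; the parameter names alpha and d0 (an offset avoiding a singularity at zero distance) are assumptions on our part.

```cpp
#include <cmath>

// Restrictive kernel: decays fast, so short-range contacts dominate,
// favouring a wave-front spread.
double exponentialKernel(double d, double alpha) {
    return std::exp(-alpha * d);
}

// Less restrictive kernel: its fat tail allows occasional long-range
// jumps, favouring satellite outbreaks.
double powerLawKernel(double d, double alpha, double d0) {
    return std::pow(d0 + d, -alpha);
}
```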

Another remaining issue is the way inter-nodal distances should be computed. Indeed, there is still a conflict between the edges’ GIS coordinates and the nodes’ GIS coordinates; therefore, edges have not yet been inserted into our application, and we are using radial distances so far. This choice might be realistic if we consider the plague as an airborne disease. However, discarding edges from the model means that the project becomes less specific to the Venice context. Hence, the choice of adding edges to our model still has to be assessed. Alternatively, we could implement a clustering algorithm which, for each node, considers the k closest neighbours as connected nodes and the other ones as disconnected. The number of nodes to connect could be varied by the user, and might give us different patterns of infection spread. Furthermore, this would enable us to implement a quarantine coefficient, since fewer inter-cluster connections would stop the spread of the epidemics.
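A brute-force version of this k-nearest-neighbour connection could look like the sketch below; Point and kNearest are hypothetical names of ours, and a quadratic pass is acceptable at this network size.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct Point { double x, y; };

// For each node, return the indices of its k closest neighbours
// (squared Euclidean distance, ties broken arbitrarily).
std::vector<std::vector<int>> kNearest(const std::vector<Point>& nodes, int k) {
    int n = static_cast<int>(nodes.size());
    std::vector<std::vector<int>> neighbours(n);
    for (int i = 0; i < n; ++i) {
        std::vector<std::pair<double, int>> byDist;
        for (int j = 0; j < n; ++j) {
            if (j == i) continue;
            double dx = nodes[i].x - nodes[j].x;
            double dy = nodes[i].y - nodes[j].y;
            byDist.emplace_back(dx * dx + dy * dy, j);
        }
        int kk = std::min(k, static_cast<int>(byDist.size()));
        std::partial_sort(byDist.begin(), byDist.begin() + kk, byDist.end());
        for (int m = 0; m < kk; ++m) neighbours[i].push_back(byDist[m].second);
    }
    return neighbours;
}
```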

Finally, and only if there is sufficient time remaining, we would like to search the literature for data that could increase the realism of our model. Data such as population densities per parish, attractor nodes, or even the number of deaths during a plague outbreak could help us run a simulation that is a plausible reproduction of what happened during plague outbreaks in Venice.