Differential Evolution Dynamics Analysis Using Aggregated Networks

In this paper, the dynamics of selected variants of differential evolution are modelled by aggregated networks that capture the relationships between individuals established during the evolution of the population. The motivation of this research is to better understand these relationships. Thanks to the analysis of the aggregated networks, the advantages as well as the bottlenecks of the selected algorithms can be specified more precisely, and the results of the analysis can be used to develop novel algo-


Introduction
Complex networks have become an effective tool to model real-world systems such as computer, social, epidemiological, telecommunication, biological, genetic regulatory, and metabolic networks, the Internet, the World Wide Web, airline routes, etc. All of these systems consist of a huge number of highly interconnected dynamic units, where the connection between two units can be considered a certain form of communication. For example, in social networks, a connection between two users might express that these users are in a relationship. In a food web, the connection between two organisms can be interpreted such that one organism is food for the other.
Evolutionary algorithms are population-based meta-heuristics that belong to the broad family of artificial intelligence methods. They are inspired by biological evolution and are based on principles such as reproduction, crossover, mutation and natural selection. The population consists of individuals: they are crossed, worse individuals die, better offspring take the place of their parents, etc. We can say that there are ties between individuals: each offspring has its parents, and to create a new offspring, several parents are selected (they are in a relationship). Some individuals are used to improve the population, while others are useless from the perspective of new offspring creation. The parameters of the individuals change, their fitness improves, etc. When we consider the complex systems mentioned above, we can see an analogy: the population of individuals in an evolutionary algorithm can be considered a dynamical system of interconnected units.
The differential evolution was introduced by Storn and Price [1] in 1995, and it has been chosen for this research for several reasons. One of the most important is its good performance in comparison with other evolutionary and swarm algorithms [2], [3], together with the fact that it has been successfully used in different areas of research, for example system design [4], cybernetics [5], electromagnetics [6], clustering of large unlabeled data sets [7], neural network training [8], etc.
Various novel schemes of the differential evolution (DE) algorithm have been described in recent years. These schemes differ from each other, among other things, in how they manipulate individuals or in the settings of the control parameters, which offers several alternatives for our research. In this paper, the differential evolution is considered to be a system consisting of highly interconnected dynamical units. Different variants of the DE algorithm might imply a different form of interconnection between individuals in the population. Studies comparing these variants from the perspective of solution quality, for example [9], indicate a connection between the principle used to select individuals for offspring creation and the algorithm performance.
The rest of the paper is organized as follows. In Sec. 2, the principle of the DE as well as the selected variants used for our experiments are described briefly. In Sec. 3, the idea of modelling the dynamics of evolutionary algorithms using complex networks is introduced together with related publications. In Sec. 4, the idea of treating the network generated from the DE algorithm dynamics as an artificial social network is clarified. Sec. 5 deals with the aggregated networks and their analysis. In Sec. 6, the experiment is presented, and in Sec. 7, the results are discussed.

Differential Evolution
The DE was described in 1995 [1], [10], and in 1999 it was summarized in the compendium New Ideas in Optimization [11], [12]. Price and Storn described various applications of the DE, for example [13], [14] and [15], and their research was soon followed by many authors from all over the world [12].
The differential evolution belongs to the family of evolutionary algorithms (EAs), whose principles, such as mutation, crossover and natural selection, are inspired by biological evolution. Like the other EAs, the DE works with a population of $NP$ individuals, which are represented by $D$-dimensional vectors $x_i^G$ of real-valued parameters. Each parameter is constrained by its search range, hence for each parameter $j$ of the vector $x_i^G$ the upper and lower bounds must be specified. These values can be collected into two $D$-dimensional vectors denoted as $b_U$ and $b_L$, where $U$ and $L$ indicate the upper and lower bound, respectively. The initial population $P_x^0$ is then composed of vectors generated randomly in the prescribed range such that

$$x_{i,j}^0 = b_{L,j} + \mathrm{rand}_j(0, 1) \cdot (b_{U,j} - b_{L,j}),$$

where $\mathrm{rand}_j(0, 1)$ is a uniformly distributed random number within the range [0, 1] [16].
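The initialization step can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `init_population` and the bounds of ±100 are assumptions made for the example, not settings taken from the paper.

```python
import random

def init_population(np_size, b_lower, b_upper, rng=random):
    """Generate NP vectors with x[i][j] = b_L[j] + rand_j(0,1) * (b_U[j] - b_L[j])."""
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(b_lower, b_upper)]
        for _ in range(np_size)
    ]

# Hypothetical usage: NP = 100 individuals in D = 30 dimensions.
population = init_population(100, [-100.0] * 30, [100.0] * 30)
```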
When a population is initialized, mutation and crossover are used to generate new vectors (trial vectors), and the selection determines which vector will be accepted to the next generation. In this part, the mutation step is discussed. Unlike the genetic algorithm, the DE performs the mutation step before the crossover operation such that for each target vector $x_i^G$ a mutation vector (also denoted as a donor vector) $v_i^G$ is generated according to the following equation

$$v_i^G = x_{r_1}^G + F \cdot (x_{r_2}^G - x_{r_3}^G),$$

where $x_{r_1}^G$, $x_{r_2}^G$, and $x_{r_3}^G$ ($r_1 \neq r_2 \neq r_3 \neq i$) are solution vectors randomly selected from the current population and $F$ denotes the scale factor. The randomly selected solution vectors must be different from each other and from the target vector $i$, so in this case the population must consist of at least four individuals. If the donor vector $v_i^G$ contains parameters outside the predefined range, these parameters are regenerated randomly in the space of possible solutions.
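A minimal sketch of the rand/1 mutation with the random regeneration of out-of-range parameters follows. The function name and the choice to return the selected indices (useful later for recording who activated whom) are assumptions of this example.

```python
import random

def mutate_rand_1(population, i, F, b_lower, b_upper, rng=random):
    """Return the donor vector for target i and the indices (r1, r2, r3) used."""
    # r1, r2, r3 are mutually different and different from the target index i.
    candidates = [k for k in range(len(population)) if k != i]
    r1, r2, r3 = rng.sample(candidates, 3)
    donor = []
    for j, (lo, hi) in enumerate(zip(b_lower, b_upper)):
        v = population[r1][j] + F * (population[r2][j] - population[r3][j])
        if not lo <= v <= hi:
            # Out-of-range parameter: regenerate randomly inside the bounds.
            v = lo + rng.random() * (hi - lo)
        donor.append(v)
    return donor, (r1, r2, r3)
```

Because three distinct indices other than `i` are sampled, the call fails for populations with fewer than four individuals, which mirrors the minimum population size stated above.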
The main role of the crossover operation is to construct a trial vector $u_i^G$ using a combination of the parameters of the donor vector $v_i^G$ and the target vector $x_i^G$. There are two kinds of crossover methods: binomial and exponential. In this paper, the binomial crossover is used.
At the beginning of the binomial crossover, a random integer $j_{rn}$, which ensures that at least one parameter is taken from the donor vector, is generated from the interval $[0, D-1]$. A trial vector is then constructed such that for each of the $D$ parameters a random real number $rn(j)$ is generated from the unit interval with the uniform distribution. If the value of $rn(j)$ is not greater than the crossover rate $CR$, or the index of the parameter $j$ is equal to $j_{rn}$, the element of the donor vector is taken as the parameter of the trial vector. Otherwise, the parameter of the target vector is accepted [17]. The binomial crossover can be outlined as follows:

$$u_{i,j}^G = \begin{cases} v_{i,j}^G & \text{if } rn(j) \leq CR \text{ or } j = j_{rn}, \\ x_{i,j}^G & \text{otherwise.} \end{cases}$$

When a trial vector is constructed, it is evaluated. Then the selection operation is performed such that the objective function value of the trial vector $f(u_i^G)$ is compared with the objective function value of the target vector $f(x_i^G)$. The selection operation is mathematically defined as follows:

$$x_i^{G+1} = \begin{cases} u_i^G & \text{if } f(u_i^G) \leq f(x_i^G), \\ x_i^G & \text{otherwise,} \end{cases}$$

where $f(\cdot)$ is the function to be minimized. In other words, if the objective function value of the trial vector is not greater than the objective function value of the target vector, the trial vector replaces the corresponding target vector in the next generation; otherwise, the target vector is preserved. The trial vector is accepted even when its objective function value is equal to that of the target vector, which enables vectors to move across flat objective function landscapes [18].
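The binomial crossover and the selection described above can be sketched as follows. The function names and the sphere function used in the usage lines are assumptions made for illustration only.

```python
import random

def binomial_crossover(target, donor, CR, rng=random):
    """Build a trial vector; index j_rn guarantees one donor parameter."""
    D = len(target)
    j_rn = rng.randrange(D)  # random integer from [0, D-1]
    return [
        donor[j] if (rng.random() <= CR or j == j_rn) else target[j]
        for j in range(D)
    ]

def select(target, trial, f):
    """Keep the trial vector if it is not worse than the target (minimization)."""
    return trial if f(trial) <= f(target) else target

# Hypothetical usage with a sphere objective function.
sphere = lambda x: sum(v * v for v in x)
trial = binomial_crossover([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], CR=0.9)
survivor = select([1.0, 1.0, 1.0], trial, sphere)
```

The `<=` in `select` is what lets a trial vector with an equal objective value replace its target.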

Selected Variants of Differential Evolution
Differential evolution is a powerful and efficient evolutionary algorithm; however, it has some bottlenecks, which are studied by many researchers from all over the world. In the following text, the prominent DE variants used to investigate the differences between the relationships established during the evolution process are described briefly.
We have chosen four different variants of the differential evolution. Our choice depended on the principles of the control parameters adaptation and on the mutation strategies used in these algorithms. Therefore, as the first algorithm, the original version of the DE (DE/rand/1/bin) has been selected. It uses no adaptation of the control parameters. The original DE algorithm uses only one mutation strategy, rand/1, which has been described in the previous section, and the binomial crossover.
The second DE algorithm investigated in this work is the self-adaptive DE denoted as jDE, introduced by Brest et al. [19]. In the jDE algorithm, each individual in the population has its own crossover rate $CR_i^G$ and scale factor $F_i^G$. At the beginning, for each individual, the scale factor $F_i^G$ is generated with the uniform distribution from the interval [0.1, 1.0] and $CR_i^G$ takes a value from [0, 1]. New control parameters $F_i^{G+1}$ and $CR_i^{G+1}$ for the individual $x_i^G$ are then recomputed as follows:

$$F_i^{G+1} = \begin{cases} F_l + rand_1 \cdot F_u & \text{if } rand_2 < \tau_1, \\ F_i^G & \text{otherwise,} \end{cases}$$

$$CR_i^{G+1} = \begin{cases} rand_3 & \text{if } rand_4 < \tau_2, \\ CR_i^G & \text{otherwise,} \end{cases}$$

where $rand_1, \ldots, rand_4$ are uniform random numbers from [0, 1], $F_l$ and $F_u$ are the lower and upper bounds of $F_i$, and $\tau_1$ and $\tau_2$ denote the probabilities to adjust $F_i$ and $CR_i$. The new control parameters $F_i^{G+1}$ and $CR_i^{G+1}$ are obtained before the mutation operation is performed, and thus they affect the mutation and crossover operations of the actual solution vector $x_i^G$. We have chosen this algorithm because of its control parameters adaptation. The jDE algorithm uses the same mutation and crossover strategy as the original DE variant. Therefore, we can investigate how the adaptation of the control parameters influences the relationships between individuals.
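The jDE self-adaptation rule can be sketched as below. The defaults $F_l = 0.1$ and $F_u = 0.9$ are an assumption following the usual formulation by Brest et al., in which $F_l + rand_1 \cdot F_u$ yields scale factors in [0.1, 1.0]. The function name is hypothetical.

```python
import random

def jde_update(F_i, CR_i, F_l=0.1, F_u=0.9, tau1=0.1, tau2=0.1, rng=random):
    """Return (F, CR) for the next generation of one individual."""
    rand1, rand2, rand3, rand4 = (rng.random() for _ in range(4))
    # With probability tau1 a new scale factor is drawn, otherwise it is inherited.
    F_new = F_l + rand1 * F_u if rand2 < tau1 else F_i
    # With probability tau2 a new crossover rate is drawn, otherwise it is inherited.
    CR_new = rand3 if rand4 < tau2 else CR_i
    return F_new, CR_new
```

The returned pair would be used for the mutation and crossover of the same individual in the current step, since the parameters are updated before the donor vector is generated.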
The third selected algorithm combines two mutation strategies on the basis of the exploration or exploitation requirement. Moreover, it uses a control parameters adaptation that differs from the principle used in the jDE algorithm. This algorithm has been proposed by Yi et al. [20] and is denoted as HSDE. At the beginning of the algorithm, the population is generated randomly in the space of possible solutions, and for each individual the control parameters $F_i$ and $CR_i$ are generated randomly within the feasible range. Then, for each target vector, a donor vector is generated such that five mutually different individuals are chosen randomly. If the objective function value of the target vector is worse than the objective function values of the first two randomly selected individuals, the mutation strategy rand/1 is used to generate the donor vector. Otherwise, the mutation strategy current-to-best/1 is used.
The last selected algorithm is JADE, described by Zhang et al. [21], which uses a different principle of the control parameters adaptation and a novel mutation strategy denoted as current-to-pbest/1. In this work, we use the JADE version with no external archive. In the strategy current-to-pbest/1 without the external archive, the donor vector $v_i^G$ is generated as follows:

$$v_i^G = x_i^G + F_i \cdot (x_{best}^{p,G} - x_i^G) + F_i \cdot (x_{r_1}^G - x_{r_2}^G),$$

where $x_{best}^{p,G}$ is randomly chosen as one of the top $100p\%$ individuals in the current population with $p \in (0, 1]$, and $F_i$ is the scale factor associated with $x_i^G$. The scale factor is dynamically updated at each generation.
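A sketch of the current-to-pbest/1 mutation without an archive is given below. The function name, the default `p=0.05`, and the rounding used to obtain the size of the top group are assumptions of this example. Bound handling is omitted for brevity.

```python
import random

def mutate_current_to_pbest_1(population, fitness, i, F_i, p=0.05, rng=random):
    """Return the donor vector and the indices (pbest, r1, r2) that produced it."""
    NP = len(population)
    n_top = max(1, round(p * NP))  # size of the top 100p% group (assumed rounding)
    ranked = sorted(range(NP), key=lambda k: fitness[k])  # minimization: best first
    pbest = rng.choice(ranked[:n_top])
    r1, r2 = rng.sample([k for k in range(NP) if k != i], 2)
    x_i, x_pb = population[i], population[pbest]
    donor = [
        x_i[j] + F_i * (x_pb[j] - x_i[j])
        + F_i * (population[r1][j] - population[r2][j])
        for j in range(len(x_i))
    ]
    return donor, (pbest, r1, r2)
```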
The adaptation of the control parameters in the JADE algorithm can be described as follows. At each generation, for each solution vector, the scale factor is recomputed as

$$F_i = \mathrm{randc}_i(\mu_F, 0.1),$$

meaning that the scale factor $F_i$ of each solution vector is generated according to the Cauchy distribution with location parameter $\mu_F$ and scale parameter 0.1. If $F_i \geq 1$, it is truncated to 1.
At the beginning of the algorithm, the location parameter $\mu_F$ is set to 0.5 and then updated at the end of each generation according to the following equation:

$$\mu_F = (1 - c) \cdot \mu_F + c \cdot \mathrm{mean}_L(S_F),$$

where $\mathrm{mean}_L(\cdot)$ is the Lehmer mean defined as

$$\mathrm{mean}_L(S_F) = \frac{\sum_{F \in S_F} F^2}{\sum_{F \in S_F} F},$$

where $S_F$ is the set of all successful scale factors $F_i$ at the actual generation.
The crossover probability $CR_i$ of each solution vector is generated according to the normal distribution with mean $\mu_{CR}$ and standard deviation 0.1, which can be outlined as

$$CR_i = \mathrm{randn}_i(\mu_{CR}, 0.1).$$

Then it is truncated to [0, 1]. The mean $\mu_{CR}$ is recomputed at each generation as follows:

$$\mu_{CR} = (1 - c) \cdot \mu_{CR} + c \cdot \mathrm{mean}_A(S_{CR}),$$

where $c$ is a positive constant between 0 and 1 and $\mathrm{mean}_A(\cdot)$ denotes the arithmetic mean. $S_{CR}$ is the set of the successful crossover probabilities at the actual generation [21].
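The JADE parameter adaptation can be sketched as follows. Two points go beyond the text above and are assumptions: non-positive Cauchy samples are regenerated, as in the original JADE description by Zhang et al., and the means are left unchanged when no trial vector succeeded in a generation. All function names are hypothetical.

```python
import math
import random

def sample_F(mu_F, rng=random):
    """Cauchy(mu_F, 0.1) sample by inverse transform, truncated to 1."""
    while True:
        F = mu_F + 0.1 * math.tan(math.pi * (rng.random() - 0.5))
        if F > 0:  # assumption: regenerate non-positive values
            return min(F, 1.0)

def sample_CR(mu_CR, rng=random):
    """Normal(mu_CR, 0.1) sample truncated to [0, 1]."""
    return min(1.0, max(0.0, rng.gauss(mu_CR, 0.1)))

def lehmer_mean(values):
    """Lehmer mean: sum of squares divided by the sum."""
    return sum(v * v for v in values) / sum(values)

def update_means(mu_F, mu_CR, S_F, S_CR, c=0.1):
    """Update mu_F and mu_CR from the successful F and CR values of a generation."""
    if S_F:
        mu_F = (1 - c) * mu_F + c * lehmer_mean(S_F)
    if S_CR:
        mu_CR = (1 - c) * mu_CR + c * sum(S_CR) / len(S_CR)
    return mu_F, mu_CR
```

The Lehmer mean weights larger successful scale factors more heavily than the arithmetic mean would, which pushes $\mu_F$ towards larger values.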

Evolutionary Algorithms Dynamics Represented by Complex Networks
The connection between evolutionary algorithms and graph theory is not new. In 1999, Ashlock et al. [22] described a graph-based genetic algorithm, where combinatorial graphs are used to limit the choice of the partner for the crossover operation with the goal of preserving the diversity of the population. In 2007, Mabu et al. [23] proposed a graph-based evolutionary algorithm denoted as Genetic Network Programming (GNP), where the programs of the GNP consist of nodes that are capable of executing simple judgment or processing.
On the other hand, evolutionary algorithms have been successfully used to solve problems related to graphs and networks, such as graph coloring [24], [25], community detection [26], [27], the maximum clique problem [28], [29], etc.
In 2010, Zelinka et al. [30] introduced a preliminary study investigating the relations between complex networks (CNs) and the dynamics of evolutionary algorithms. In this study, two versions of the DE algorithm and four versions of the Self-organizing migrating algorithm (SOMA) [31] were selected, and on the basis of their dynamics the complex networks were generated such that the individuals were modelled by nodes and the relationships between individuals by oriented lines (arcs) of the CN.
The main motivation of this research was to investigate whether it is possible to visualize the dynamics of the selected evolutionary algorithms by complex networks and how the analysis of such complex networks can be used to improve the performance of the selected EAs. On the basis of the experimental results, the authors concluded that the occurrence of the CN structure depends on the number of individuals as well as on the number of generations (migrations). The authors also emphasized that the formation of the CN structure has been observed in algorithms "based or partly associated with swarm philosophy rather than randomly remote algorithms". This study was extended in 2011, when Zelinka [32] described how the CN dynamics are visualized by means of chaos visualization techniques and then controlled by EAs or classical control techniques.
In 2014, Davendra et al. [33] presented the results of the analysis of the complex networks generated on the basis of the dynamics of the Discrete Self-organizing Migrating Algorithm (DSOMA), where CN attributes such as minimal cut, degree centrality, closeness centrality, betweenness centrality, Katz centrality, mean neighbor degree, k-clique, k-plan, k-club, and graph of communities are investigated. These results confirmed that the analysis of the CNs generated on the basis of the DSOMA algorithm dynamics can provide valuable information about the population development. In the same year, Davendra et al. [34] used complex networks to analyze the attributes of the population dynamics of the Enhanced DE algorithm applied to the flow-shop scheduling problem with no-wait constraints.
Recently, Janostik et al. [35] introduced a graph representation of the swarm evolution in the Particle Swarm Optimization (PSO) and a diversity measure based on this graph representation. In [36], networks are used to capture the inner dynamics of the Firefly Algorithm, and in [37], networks are used to capture the inner dynamics of the PSO. As mentioned before, the goal of this work is to analyze the dynamics of the differential evolution algorithms using networks. The comparison of the selected DE algorithms from the perspective of the network structure might bring new information about the relationships between individuals, which can later be used to improve the properties of the selected DE variants. In this work, the DE algorithms have been selected on the basis of their mutation strategies and control parameters adaptation principles. Thanks to the variety of DE algorithms, we can investigate how even a simple mechanism of the control parameters adaptation or a different mutation strategy affects the relationships between individuals in the population.

Differential Evolution Dynamics and Social Networks
The modelling of the dynamics of evolutionary algorithms using complex networks has been described by Zelinka et al. [38]. As mentioned, the main motivation was to better understand the relationships between individuals and to apply this knowledge to improve the performance of the selected evolutionary algorithms. The differential evolution and the Self-organizing migrating algorithm were the first investigated algorithms.

In [38] and [39], Zelinka et al. proposed a new perspective on offspring creation in the DE algorithm, where the generation of a new solution vector is considered to be just an activation of a target vector to move to a better position. More precisely, if a trial vector $u_i^G$ replaces a target vector $x_i^G$ in the next generation such that $x_i^{G+1} = u_i^G$, this is considered to be an activation of the target vector $x_i^G$ by the three solution vectors $x_{r_1}^G$, $x_{r_2}^G$, and $x_{r_3}^G$ (selected randomly in the mutation operation to create the donor vector $v_i^G$) to move to a better position.
The activation of target vectors is modelled by a directed graph such that the target vector $x_i^G$ and all three solution vectors $x_{r_1}^G$, $x_{r_2}^G$, and $x_{r_3}^G$ are represented by nodes, and there is a directed arc leading from each node representing a solution vector $x_{r_j}^G$, where $j = 1, 2, 3$, to the node representing the target vector $x_i^G$, see Fig. 1. For each generation, one directed graph is created. When we consider the process of offspring creation, we can say that there is a relationship between the target and donor vector, or between the target vector and trial vector. However, we can also take into account the relationship between the target vector and the solution vectors that have been chosen randomly to create the donor vector, etc. Moreover, there can be more than one type of relationship between two individuals. When we look at the individuals in the population as artificial social units, we can try to use social networks to model the dynamics of the DE algorithms and social network analysis methods to analyze such networks.
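The construction of the directed graph for one generation can be sketched as follows. The input format (a list of successful replacements, each given as the target index and the indices of the solution vectors used in the mutation) and the function name are assumptions of this example.

```python
def short_interval_network(activations):
    """Build the arc set of one generation.

    activations: iterable of (target, parents), recorded only for trial
    vectors that replaced their target; parents are the indices of the
    solution vectors selected in the mutation step.
    """
    arcs = set()
    for target, parents in activations:
        for r in parents:
            arcs.add((r, target))  # arc from activator r to the activated target
    return arcs

# Hypothetical usage: individual 0 was replaced by a trial vector built from 1, 2, 3.
arcs = short_interval_network([(0, (1, 2, 3)), (4, (0, 2, 5))])
```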
The main idea of this section is that an individual in a population can be considered to be a social unit. A social unit can be anything: a pupil in a class, a class at a school, or a school in a city. Like other social units, the individuals in a population have their own properties: parameters and fitness. These properties are very important in the context of social network analysis. For example, we can investigate whether there is a connection between the individual with the best fitness value and the node with the highest out-degree.
Formally, a social network can be represented by a directed or undirected graph $G = \{N, L\}$, where $N$ denotes the set of nodes and $L$ the set of lines between pairs of nodes. The set of nodes represents the set of actors $N = \{n_1, n_2, \ldots, n_g\}$. A relation between two actors in a social network is usually drawn as a line connecting two nodes in a graph. This line can be directed, in which case we talk about an arc, or undirected, in which case we talk about an edge [40]. In a directed graph, the maximum number of arcs is $g \cdot (g - 1)$, where $g$ denotes the number of nodes. In an undirected graph, the set $L$ contains at most $g \cdot (g - 1)/2$ lines.
In this work, two ways of representing the evolution process by a social network have been taken into account. Considering the fact that the DE algorithms work in generations, it is possible to create one directed network for each generation. In the following text, these networks are denoted as short-interval networks.
Thanks to this mechanism, the influence of the solution vectors (selected to create donor vectors) on the evolution of target vectors can be investigated. However, this principle does not make it possible to analyze the influence of an individual on the evolution of the whole population during the run of the algorithm. This problem can be solved by using the so-called aggregated network, which is the accumulation of the short-interval networks, see Fig. 2.
Besides the directed graphs, undirected graphs representing the short-interval networks have also been considered, such that in the case of a successful trial vector creation, an undirected edge would be created between each pair of nodes representing the participating individuals. The individuals participating in the successful trial vector generation would then be considered to be the "authors" of the offspring, and the networks generated on the basis of this principle could be analyzed as co-authorship networks. However, in a network generated in this way, we could easily lose the information about the "role" of the individual represented by a node. In other words, we would not be able to visually distinguish between nodes representing solution vectors selected in the mutation operation to create a donor vector (activators) and nodes representing target vectors that have been replaced by better offspring (activated individuals). For this reason, we have decided to model the dynamics of the DE algorithms as suggested in [30].

Aggregated Networks
In this work, the relationships between individuals that have been established during the population evolution from the beginning till the end of the DE algorithm are analyzed using aggregated networks. As mentioned before, the aggregated network is the accumulation of the short-interval networks (SINs), which means that the analysis of the sequence of the SINs is replaced by the analysis of a single network.
Consider the principle of the DE algorithm and the settings of the parameters, where the number of generations is set to $G = 3000$ and the number of individuals to $NP = 100$. Because there is a relatively small number of individuals and a large number of generations, there is a high probability that during the population evolution two individuals $i$ and $j$ will be in relation more than once. Therefore, the intensity of the relation between two individuals is expressed by the weight of the arc. The networks are created from the individuals participating in the evolution of the population (activators and activated individuals) as suggested in [38], [39]; for better illustration see Fig. 1 and Fig. 3. This means that individuals that have neither been replaced by better offspring nor been used to create better offspring are not taken into consideration.
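The accumulation of short-interval networks into one weighted aggregated network can be sketched as follows. Representing each short-interval network as a set of (activator, activated) arcs and counting repeated arcs as the weight is an assumption consistent with the description above. The function name is hypothetical.

```python
from collections import Counter

def aggregate(short_interval_networks):
    """Sum the arcs of all generations; the count of an arc is its weight."""
    weights = Counter()
    for arcs in short_interval_networks:
        weights.update(arcs)
    return weights

# Hypothetical usage: arc (1, 0) occurs in two generations, so its weight is 2.
generations = [{(1, 0), (2, 0), (3, 0)}, {(1, 0), (4, 2), (5, 2)}]
aggregated = aggregate(generations)
```

Individuals that never appear in any arc are absent from the result, which matches the rule that non-participating individuals are not taken into consideration.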

Experiment
The motivation of this experiment is to investigate the relationships between individuals established from the beginning till the end of the DE algorithm. We are especially interested in the differences between the selected characteristics of the networks generated on the basis of the dynamics of the different DE algorithms.
The aggregated networks are generated on the basis of the four state-of-the-art DE algorithms, which have been used to optimize difficult test functions from the benchmark set of the CEC'2013 Special Session on Real-Parameter Optimization [42]. For this experiment, two unimodal functions ($f_1$, $f_5$), one multimodal function ($f_6$), and one composition function ($f_{21}$) have been chosen.
The detailed parameter settings for the DE variants are as follows. The population size has been set to $NP = 100$ for all algorithms and benchmark functions, the dimension to $D = 30$, and the number of generations to $G = 3000$, which corresponds to 300,000 objective function evaluations. This complies with the number of objective function evaluations suggested in [42].
In the DE/rand/1/bin, the scale factor has been set to $F = 0.5$ and the crossover rate to $CR = 0.9$. In the case of the jDE algorithm, $\tau_1 = \tau_2 = 0.1$ and $F \in [0.1, 1.0]$. The same lower and upper bounds of the scale factor have been chosen for the HSDE algorithm. For the JADE algorithm, the number of best individuals from which one is selected in the mutation operation has been set to $p = 5$, which for $NP = 100$ corresponds to the top 5% of the population. We do not use any archive, so the number of individuals in the archive is $A = 0$; the second parent is therefore selected only from the population, not from the union of the population and the archive. At the beginning of the algorithm, $\mu_F = \mu_{CR} = 0.5$. The parameter $c$ has been set to $c = 0.1$.

Investigated Characteristics
Due to the high number of generations ($G = 3000$), the aggregated networks generated on the basis of the dynamics of the DE algorithms are represented by complete graphs (where each node is connected with every other node). Therefore, in this experiment, we focus on two characteristics of the aggregated networks that provide valuable information about the relationships between individuals established during the population evolution, namely the weighted clustering coefficient and the node out-strength distribution.
The weighted clustering coefficient reflects the strength of the relationships between individuals as well as the tendency of the DE algorithm to create tightly interconnected groups of individuals. In this work, it is calculated on the basis of the work of Fagiolo [43].
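A possible computation of a Fagiolo-style weighted clustering coefficient for a directed network is sketched below. The text does not state which variant or weight normalization was used, so this sketch assumes the "total" coefficient $[\hat{W} + \hat{W}^T]^3_{ii} / (2[d_i^{tot}(d_i^{tot} - 1) - 2 d_i^{\leftrightarrow}])$ with $\hat{W}$ holding the cube roots of weights scaled by the maximum weight. The function name is hypothetical.

```python
def fagiolo_clustering(W):
    """Per-node weighted clustering coefficient of a directed weighted network.

    W is a square list of lists; W[i][j] is the weight of the arc i -> j.
    """
    n = len(W)
    w_max = max(max(row) for row in W)
    if w_max == 0:
        return [0.0] * n
    # Cube roots of the weights scaled to [0, 1]; self-loops are ignored.
    R = [[(W[i][j] / w_max) ** (1 / 3) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    S = [[R[i][j] + R[j][i] for j in range(n)] for i in range(n)]
    coeffs = []
    for i in range(n):
        # Diagonal element of S^3: weighted directed triangles through node i.
        tri = sum(S[i][j] * S[j][k] * S[k][i] for j in range(n) for k in range(n))
        d_tot = sum((W[i][j] > 0) + (W[j][i] > 0) for j in range(n) if j != i)
        d_bi = sum(1 for j in range(n) if j != i and W[i][j] > 0 and W[j][i] > 0)
        denom = 2 * (d_tot * (d_tot - 1) - 2 * d_bi)
        coeffs.append(tri / denom if denom > 0 else 0.0)
    return coeffs
```

A single network-level value could then be obtained by averaging the per-node coefficients. For a complete graph with equal weights every coefficient is 1, so differences between complete aggregated networks come from the distribution of the arc weights.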
The node out-strength distribution indicates how often the individuals are used to create successful offspring. In this part, the node out-strength distribution is analyzed to find out whether, in the case of the DE algorithms where the solution vectors are selected randomly (DE/rand/1/bin, jDE), some individuals are used more often than others to create a successful trial vector. On the other hand, in the networks generated on the basis of the HSDE and JADE algorithms, where the best individuals are crucial for offspring creation, we investigate whether and when the "rich get richer" phenomenon occurs.
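The out-strength of a node is the sum of the weights of its outgoing arcs. A minimal sketch, assuming the aggregated network is stored as a mapping from (activator, activated) arcs to weights, is shown below. The function name is hypothetical.

```python
def out_strength(weights):
    """Return {node: sum of weights of arcs leaving the node}."""
    strength = {}
    for (src, dst), w in weights.items():
        strength[src] = strength.get(src, 0) + w
        strength.setdefault(dst, 0)  # nodes with only incoming arcs get 0
    return strength

# Hypothetical usage on a small weighted network.
s = out_strength({(1, 0): 2, (2, 0): 1, (1, 2): 3})
```

The resulting values could then be passed to a normality test such as `scipy.stats.shapiro`, which is not shown here to keep the sketch free of external dependencies.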

Experimental Results
Tab. 1 lists the weighted clustering coefficients of the aggregated networks generated on the basis of the dynamics of the four different variants of the DE algorithm. Figs. 4 a), b), c) and d) depict the out-strength distribution in the aggregated networks generated on the basis of the dynamics of the DE/rand/1/bin algorithm. Figs. 5 a), b), c) and d) present the out-strength distribution in the aggregated networks generated on the basis of the dynamics of the jDE algorithm. Figs. 6 a), b), c) and d) depict the node out-strength distribution in the aggregated networks generated on the basis of the dynamics of the HSDE algorithm. Figs. 7 a), b), c) and d) show the out-strength distribution in the aggregated networks generated on the basis of the dynamics of the JADE algorithm. Tab. 2 gives the results of the Shapiro-Wilk test of normality of the out-strength distribution at the significance level $\alpha = 0.05$.

Discussion and Conclusion
In Tab. 1, the weighted clustering coefficients of the aggregated networks generated on the basis of the four DE variants are presented. In the case of the aggregated networks generated on the basis of the dynamics of the HSDE and JADE algorithms, the weighted clustering coefficients are significantly higher than those of the aggregated networks generated on the basis of the dynamics of the DE/rand/1/bin and jDE algorithms. This is caused by the mutation strategies used in the HSDE and JADE algorithms as well as by the relatively large number of successful trial vectors generated by these DE algorithms.
When we look at the weighted clustering coefficients of the networks generated on the basis of the DE/rand/1/bin and jDE algorithms, we see that, except for the test function $f_6$, the jDE algorithm has generated networks with a significantly higher weighted clustering coefficient than the DE/rand/1/bin. Because the jDE algorithm uses the same mutation strategy as the DE/rand/1/bin, we can conclude that the principle of the control parameters adaptation used in the jDE affects the relationships between individuals such that there is a higher tendency to create tightly interconnected groups. However, the comparison of the weighted clustering coefficients of the aggregated networks generated on the basis of the dynamics of the jDE with those of the HSDE and JADE algorithms indicates that the impact of the control parameters adaptation used in the jDE algorithm is limited. The analysis of the weighted clustering coefficients has shown the negative consequences of the mutation strategy rand/1 for the establishment of relationships between individuals in the population. On the other hand, the positive impact of the mutation strategies using the best individuals, of the control parameters adaptation, and of the combination of several mutation strategies has been confirmed.
In Figs. 4 to 7, the out-strength distributions in the aggregated networks generated on the basis of the four DE variants are presented. In Tab. 2, the results of the Shapiro-Wilk test of normality are given. The node out-strength of the aggregated networks generated on the basis of the dynamics of the DE/rand/1/bin follows the normal distribution in the case of the test functions $f_1$, $f_6$, and $f_{21}$. The node out-strength of the aggregated networks generated on the basis of the dynamics of the jDE algorithm follows the normal distribution in the case of the test functions $f_1$, $f_5$, and $f_{21}$. As we can see, the out-strength of the aggregated networks generated on the basis of the DE/rand/1/bin and jDE algorithms is normally distributed for three out of four test functions.
On the other hand, the node out-strength of the aggregated networks generated on the basis of the dynamics of the HSDE and JADE algorithms follows the normal distribution only in the case of the multimodal test function $f_6$. For the remaining test functions, the "rich get richer" phenomenon can be observed, which means that several individuals are dominant from the perspective of offspring creation. They are used most often to create successful trial vectors; however, they are also replaced by better offspring (they are activated to move to a better position). In the aggregated networks, these individuals are represented by so-called hubs, i.e., nodes having a significantly larger number of connections than the others. In contrast, the node out-strength distribution in the aggregated networks generated on the basis of the HSDE and JADE algorithms optimizing the test function $f_6$ indicates that more individuals are selected as the best individuals during the evolution.
Based on the results discussed above, we conclude that the mutation strategy rand/1 and the absence of the control parameters adaptation negatively influence the relationships between individuals in the population. We have shown that even a simple principle of the control parameters adaptation can positively influence the ties between individuals. Moreover, the results of the analysis of the aggregated networks indicate the positive influence of the mutation strategies current-to-best/1 and current-to-pbest/1, where the best individuals are crucial for successful trial vector creation. These individuals are responsible for the generation of a larger number of successful trial vectors.
These results have become the stepping stone for our further work, where the weighted clustering coefficient and the strength of the nodes representing the individuals in the population are used to influence the selection of individuals in the mutation step.
This work was supported by the Grant Agency of the Czech Republic - GACR P103/15/06700S, by the Grant of SGS No. SGS 2017/134, VSB-Technical University of Ostrava, and by the Ministry of Education, Youth and Sports from the National Programme of Sustainability (NPU II) project "IT4Innovations excellence in science - LQ1602".