1 Introduction
Multicast services allow one host to send information to a large number of receivers, without being constrained by its network interface bandwidth, which makes applications more scalable and maximizes the use of system resources. The early multicast protocols need the support of network layer equipments and are called network layer multicast (NLM) also known as IP multicast (IPM). However, the lack of network layer support for multicast makes it necessary to obtain multicast service at a higher level. To that end, application layer multicast (ALM) was proposed
12ALM systems are built on top of a general Internet unicast infrastructure. The Internet is a connected network. Theoretically, there always exist one or several unicast paths between any two hosts. In ALM systems, hosts self-organize into a logical connected control topology called mesh. Each host maintains a list of other hosts that are called its mesh neighbors. It periodically communicates with its mesh neighbors to maintain the topology's connectivity. Then one tree or one set of trees, which are ALM data topologies, should be embedded in the mesh by a routing algorithm. Multicast data is transferred along the branches of the tree. One host's mesh neighbors are also the candidates of its children in the multicast tree.
The purpose of multicast (either ALM or IPM) routing algorithms
3&8211;114567891011 is to find multicast trees fitting some constraints. There are two kinds of performance indexes for multicast trees: cost and load balancing. The cost constraints include the tree's total cost constraints, the tree's diameter constraint, the constraint of average delay experienced by all destinations and so forth. The load balancing constraints aim to prevent part of the hosts and network link from failing as a result of too much load.
Though the indexes and the general algorithms may be the same, some differences do exist between NLM and ALM. Since ALM tree's nodes are hosts, the ‘load balance' can be defined as ‘application layer load balance', though application layer is not the only one that may suffer from load imbalance. Network layer performances do not receive much attention in current ALM protocols. The ALM tree may use the physical link between the same pair of routers more than once, which means that one physical link may transit the same multicast data repeatedly for several times. Such links may become the bottleneck of the multicast session. For a network layer link, the index stress is defined as the number of identical copies of a packet traversing the link for a given tree. Almost no ALM protocols are focused on the optimization of stress metric. This problem can be summarized as ‘network layer load balance' in ALM routing.
Many overlay multicast solutions use simple geographical rules
126 to build an acceptable tree instead of trying to find an optimal tree. Heuristics based on genetic algorithms (GA) have been applied to solve the routing problem in communication networks for a long period. In Ref.
3 GAs are used to build IP multicast tree by optimizing the tree cost. In Ref.
5 a genetic overlay routing solution based on a complete graph is proposed to optimize the cost and degree balance performance. References
7&8211;108910 use GA to solve the NLM routing problem regarding delay, bandwidth and cost constrains.
In this paper, ALM routing problem is described as a multi-objective optimizing problem. The trees' cost and load balance performance are analyzed in detail. A novel ALM routing algorithm based on GA is proposed to achieve a source tree according to cost, application layer load balance and network layer load balance constraints.
2 Network model and problem description
2.1 Network model
ALM end hosts self-organize into a connected mesh which can be depicted as a directed graph M(H,P,c). The mesh consists of a nonempty set H of |H| hosts and a set P, P ⊆ H × H of |P| unicast paths connecting pairs of hosts. Each unicast path is associated with a cost function, c:P → R+. These paths are element paths that are candidates to build the multicast tree. H = s ∪ D, s acts as the data source of the group and other hosts belonging to D are the receivers. The mesh is a complete graph when all pairs of hosts are connected by element paths. The lower network topology includes the set N with |N| routers, the set E, which is composed of |E| edges connecting routers, E = {ei|i ∈ N,i ≤ |E|}, E ⊆ N × N and the set L, consisting of the links connecting hosts and routers, |L| = |H|, L = {lj|j ∈ N,j ≤ |L|}. The cost of an element path is the sum of the links' and the cost of the edges along the path. A source tree or a shared tree embedded in the mesh spanning all the hosts is built by routing algorithm.
2.2 Problem description
Given mesh M, the overlay multicast routing algorithm aims at finding a spanning tree T(H,F), T ⊆ M, F ⊆ P, subject to the following constraints:
Cost constraint: a tree's cost is the sum of all element paths' cost used in the tree. The objective of cost optimization routing is to find a spanning tree T(H,F) of a given mesh M, whose cost should be the minimum among all of M's spanning trees.
Balance constraint: for a host, the number of its children in the overlay multicast tree should be less than an upper bound. Host h uses dmax(h) and dT(h) to denote the degree bound and its actual degree in tree T respectively. As observed, the variable ratdT(h) = dT(h)/dmax(h), the routing problem under the balance constraint is as follows:
Find a spanning tree T(H,F) of the given mesh M, subject to
ratdT(h) ≤ 1, ∀h ∈ H.
the value of maxh ratdT(h) is the maximum among all of M's spanning trees.
Network layer load distribution constraint (stress constraint): the constraint tries to limit the maximum times for one edge to transit the same data. Analogous to the balance constraint, smax(e) and sT(e) are defined as the stress upper bound and the actual stress in tree T of the edge e. The routing problem under the balance constraint is as follows:
Find a spanning tree T(H,F) of the given mesh M, subject to
ratsT(e) ≤ 1, ∀e ∈ E.
The value of maxe ratsT(e) is the maximum among all of M's spanning trees, where ratsT(e) = sT(e)/smax(e).
3 Proposed algorithm
Mesh setup: given H, R, L, E, c, s, choose element paths to achieve a connected mesh, then calculate each element path's cost and the stress for each edge introduced by it.
Routing tables: for a certain receiver, find all the possible overlay routes from the source to it and sort them by cost. Select the R routes with the least cost to build the routing table for this receiver. One possible route may be an element path or a sequence of several element paths. Only overlay routes in routing tables may be used to transfer multicast data from the source to receivers.
Initialization of chromosomes: for destination set D = {h1,h2,...,hk}, k = |H| - 1, use a string of integers of length k to represent a chromosome. The value of the ith gene gi represents the position of the route chosen in the routing table from source s to hi. The initial procedure generates P different chromosomes randomly as the first generation.
Fitness evaluation: use the following three functions to evaluate the cost, balance and stress performance of a chromosome respectively:
Define overall fitness function as f = λ1f1 + λ2f2 + λ3f3.
Reproduction: the reproduction procedure consists of three phases: selection, crossover and mutation, discard the duplicate chromosomes.
The chromosomes are sorted according to their fitness value. Half of them with smaller fitness value die off and the other half survive. The new generation comprises two parts. The first part is a copy of the surviving chromosomes in the father's generation and the second part is generated by the first part through a single-point crossover and mutation operation.
Two chromosomes with the largest fitness values are chosen from the surviving chromosomes. Randomly select the breakpoint to create two new chromosomes and put them into the second part of the new generation. Repeat the operation. If there is one chromosome left, copy it into the new generation. The mutation probability for a certain gene in the second part is p and the new value of the gene after mutation is chosen from 1 to R randomly. If there are duplicated chromosomes in the new generation, generate new random chromosomes to replace the redundant ones.
End conditions: the maximum of the generations is achieved.
4 Simulations and comparisons
The simulation network topology generated by GT-itm
12 is distributed on a 100 × 100 plant. There are 8 transit routers, 32 stub routers. An edge's cost is the distance between the pair of routers, all links' cost are set as 1. The degree of the upper bound is set as 4 for all hosts. The upper bound of stress is set as 10 for the edges connecting two transit routers, 4 for those connecting two stub routers, 8 for those connecting one transit router and one stub router. The population of genetic algorithm is 10. During this period, there are totally 20 hosts distributed on the network topology. Each host is connected to a stub router via a link. The unicast path between two hosts is chosen as element path with a probability of 0.40. There are in total 126 element paths in the mesh.
4.1 Effect of R and p
Set host 1 as multicast source. Set R as 5, 10, 15, 20 respectively. p increases from 0.1 to 0.9. For each (R, p) pair, run the proposed algorithm 10 times, the generation limit is 500. As shown in
Fig. 1, when p changes, the variation of the output fitness values mostly results from the cost fitness function. With the same p, the output fitness value decreases when R increases. With the same R, the output fitness value tends to decrease when p increases.
As shown in
Figs. 2 and
3, with the same R, when p = 0.3 the output fitness values are larger than those when p = 0.5. But the evolution process converges more slowly than the later ones.
The three values of the fitness function of the source tree rooted on host 1 by compass routing (CR) is respectively 0.834, 1 and 0 while the values by ALMI are 0.7940, 0.600 and 0.750. Set R as 5, p increases from 0.1 to 0.9, run the presented algorithm 20 times for each p.
The 180 output trees' balance fitness and stress fitness are all 1. There are only 5 trees whose cost fitness value is less than that of the CR tree. 90% confidence intervals and average value of fitness functions are shown in
Fig. 4.
4.2 Comparison of ALMR-GA algorithm with other ALM routing algorithms
According to the above discussion, the scale of routing tables R is set as 5 and p is set as 0.2 in this step. For each host, run the proposed algorithm 10 times.
Table 1 shows the comparison of the fitness value of the trees by proposed algorithm and those by the CR and ALMI.
Among the total 200 simulations, only 4 output trees' load balancing fitness values are not one, and only 16 trees' stress fitness value are not one. For the same source, all output trees' fitness values are larger than those of CR tree. The average values of the load balancing and the stress fitness of all 200 output trees are respectively 0.9990 and 0.9250, while those of CR trees are respectively 0.7525 and 0, which means that no CR trees satisfy the stress constraint. The average cost fitness value of CR trees is 0.8473, and the value of output trees are 0.8826, i.e., the average cost optimization is 23%.
5 Conclusions
ALM routing is multi-objective optimization. Most existing protocols are based on simple geographical rules and are focused on either tree cost or application load balance performance. Little attention is paid to lower level performance.
In this paper three functions are studied to evaluate cost, application layer load balance and network layer load balance performance of ALM trees respectively. An overall fitness function is generated and a novel routing algorithm, ALMR-GA, is proposed according to the function. Numerical simulation results show that the trees generated by ALMR-GA with above constraints have more optimized cost than those by CR and ALMI in the same topology. The former ones satisfy both the application layer and the network layer load balance constraints better than the latter ones.
ALMR-GA may be used to solve ALM routing problems for different purposes with different emphases by adjusting fitness functions and their weights.
Higher Education Press and Springer-Verlag Berlin Heidelberg