Parameters optimization on DHSVM model based on a genetic algorithm

Changqing YAO , Zhifeng YANG

Front. Earth Sci. ›› 2009, Vol. 3 ›› Issue (3) : 374 -380.

DOI: 10.1007/s11707-009-0040-6
RESEARCH ARTICLE

Abstract

Due to the multiplicity of factors including weather, the underlying surface and human activities, the complexity of parameter optimization for a distributed hydrological model of a watershed land surface goes far beyond the capability of traditional optimization methods. The genetic algorithm is a new attempt to find a solution to this problem. A genetic algorithm design on the Distributed-Hydrology-Soil-Vegetation model (DHSVM) parameter optimization is illustrated in this paper by defining the encoding method, designing the fitness value function, devising the genetic operators, selecting the arithmetic parameters and identifying the arithmetic termination conditions. Finally, a case study of the optimization method is implemented on the Lushi Watershed of the Yellow River Basin and achieves satisfactory results of parameter estimation. The result shows that the genetic algorithm is feasible in optimizing parameters of the DHSVM model.

Keywords

genetic algorithm / DHSVM / parameter optimization / Yellow River Basin

Cite this article

Changqing YAO, Zhifeng YANG. Parameters optimization on DHSVM model based on a genetic algorithm. Front. Earth Sci., 2009, 3(3): 374-380. DOI: 10.1007/s11707-009-0040-6


Introduction

Since Wang first applied the genetic algorithm to calibrate the parameters of a watershed hydrological forecast model (Wang, 1991), the genetic algorithm, a self-adaptive probabilistic search algorithm for global optimization that simulates the genetic and evolutionary processes of organisms in the natural environment, has been widely studied for optimizing the parameters of watershed hydrological models (Cheng et al., 2002). This is mainly because the algorithm is simple and generally applicable. It is also highly robust and can handle non-analytic objective functions and constraints. Most of the watershed hydrological models to which the genetic algorithm has been applied for parameter identification are conceptual hydrological models (Yang et al., 2002) and groundwater models (McKinney et al., 1994). However, because distributed watershed land surface hydrological models have many diverse parameters and a complex structure, the genetic algorithm has rarely been used for their parameter optimization.

The Distributed-Hydrology-Soil-Vegetation Model (DHSVM), developed by Wigmosta et al. (1994), is a distributed land surface hydrological model that takes full account of the energy balance and can accurately represent the interactions and mutual feedbacks among climate, vegetation, snow cover and hydrology (Wigmosta et al., 1994). In North America, DHSVM has been successfully applied to hydrological analysis and simulation (Wigmosta and Lettenmaier, 1999; Wang et al., 2002a; Westrick et al., 2002), to the analysis of interactions between climate and water (Nijssen et al., 1997; Arola et al., 1999; Wang et al., 2002b), to the assessment of potential impacts of climate change on water resources (Leung et al., 1999), and to fundamental and applied studies of forest hydrology (Storck et al., 1998). However, the complicated structure and numerous parameters of DHSVM make parameter optimization and identification difficult with traditional optimization methods.

Based on the characteristics of the DHSVM parameters, this paper designs a suitable genetic algorithm and applies it to optimize the DHSVM parameters for the Lushi Watershed, part of the Yellow River Basin in China. It provides a new algorithmic tool for parameter estimation in DHSVM.

DHSVM structure and parameter selection

DHSVM structure

DHSVM dynamically describes (with a time step of 1-24 hours) the spatial distribution of soil moisture, snow cover, evaporation and run-off at the grid scale of a watershed digital elevation model (DEM). The watershed is divided into computational grid cells centered on the DEM nodes. Topographic characteristics are used to simulate the influence of terrain on short-wave radiation absorption, precipitation, temperature and overland flow. The land surface in each grid cell is assumed to be composed of vegetation and soil, and each cell is assigned soil and vegetation characteristics. At each time step, the model solves the energy balance and mass balance equations simultaneously for every grid cell, while the hydrological links between cells are established by routing overland flow and subsurface flow (Xiong and Guo, 2004). The model inputs are near-surface meteorological data (precipitation, temperature, wind speed, air humidity, and incoming short-wave and long-wave radiation). The model outputs are averages of hydrological variables (evaporation, run-off depth, groundwater level, snowmelt, soil moisture and soil water infiltration rate).

Snow interception and release by the canopy are simulated with a single-layer mass and energy balance model. Snow accumulation and melt beneath the canopy (or in the open) are simulated with a mass and energy balance model of the upper layer that accounts for the effects of topography and vegetation cover on mass and energy exchange in the snowpack. Canopy evaporation is represented by a two-layer model in which each layer is further divided into a wet part and a dry part (Wigmosta et al., 1994). Darcy's law is applied to calculate unsaturated flow through the soil layers, which contain multiple root zones. The river network consists of a series of interconnected reaches, each of which passes through one or more DEM grid cells. As overland flow and interflow move downslope, they may be intercepted by road cells. If the groundwater level in a cell is higher than the elevation of the road drain, the interflow is cut off by the road. Water in roadside ditches flows along the road drains to culverts or rivers. Where a road crosses a river, the water intercepted by the road is delivered to the corresponding river reach and routed along the channel. Water discharged from culverts that do not drain into a river moves on as overland flow below the culvert and infiltrates further. The amount of flow intercepted by roadside ditches or the river network increases or decreases as the groundwater level rises above or falls below the ditch or river bed elevation. Flow in road drains and river channels is routed with a cascade of linear reservoirs.

Selection of optimized parameters

Parameters with physical meaning in the hydrological processes of DHSVM fall into two categories: constants that do not vary in space, and soil and vegetation parameters that vary across the watershed. Once fixed, parameters of the first category are used throughout the hydrological simulation of the whole watershed. For the second category, each soil type and each vegetation type is assigned a parameter set that is stored in a lookup table. Using GIS, each grid cell of the watershed is assigned a soil type and a vegetation type and is thereby linked to the corresponding parameter set.

DHSVM has over 20 parameters with physical meaning. Sensitivity analysis (Wigmosta et al., 2002) shows that the simulation of watershed land surface hydrological processes is sensitive to around 10 of them. Based on Wigmosta's experience of applying DHSVM to several basins, 5 of these 10 parameters, i.e., Minimum Resistance (rsmin), Lateral Conductivity (K), Exponential Decrease (f), Porosity (ϕ) and Field Capacity (θfc), have an especially strong influence on evaporation, snowmelt and run-off. These 5 parameters are also very hard to measure. rsmin affects the annual water balance through its control over evaporation; K and f affect run-off through their control over groundwater movement; ϕ and θfc have evident effects on both evaporation and run-off.

This paper selects rsmin, K, f, ϕ and θfc for optimization; the other parameters are determined by other means. rsmin varies among vegetation types, whereas K, f, ϕ and θfc vary among soil types. Thus, the number of variables actually optimized is far greater than 5. The value ranges and physical meanings of the 5 parameters are shown in Table 1.

Designing genetic algorithm for DHSVM parameter optimization

Encoding

Parameter estimation for DHSVM is a multi-dimensional function optimization problem, so both the constraints on the decision variables and computational efficiency must be considered. Floating-point encoding is therefore used: the parameters to be estimated are represented as a real-valued vector on which the optimization search is conducted.
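The floating-point encoding can be sketched in Python as follows. This is only an illustration under stated assumptions: the class names (`forest`, `cropland`, `brown_soil`, `cinnamon_soil`), the numeric bounds and the helper names `gene_layout`, `random_individual` and `decode` are hypothetical and are not taken from the paper; the actual value ranges are those of Table 1.

```python
import random

# Hypothetical land-cover classes and bounds, for illustration only.
# The real value ranges are the ones listed in Table 1.
VEGETATION_TYPES = ["forest", "cropland"]
SOIL_TYPES = ["brown_soil", "cinnamon_soil"]
BOUNDS = {
    "rsmin": (100.0, 600.0),
    "K": (1e-5, 1e-2),
    "f": (0.1, 5.0),
    "porosity": (0.3, 0.6),
    "field_capacity": (0.1, 0.4),
}


def gene_layout():
    """Return one (parameter, class) label per gene of the real vector."""
    layout = [("rsmin", veg) for veg in VEGETATION_TYPES]
    for name in ("K", "f", "porosity", "field_capacity"):
        layout.extend((name, soil) for soil in SOIL_TYPES)
    return layout


def random_individual(rng=random):
    """Draw every gene uniformly from its own value range."""
    return [rng.uniform(*BOUNDS[name]) for name, _ in gene_layout()]


def decode(individual):
    """Map the flat real vector back to {(parameter, class): value}."""
    return dict(zip(gene_layout(), individual))
```

With two vegetation types and two soil types, the vector already has 2 + 4 × 2 = 10 genes, which illustrates why the number of optimized variables exceeds the 5 parameter kinds.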

Fitness function

The genetic algorithm does not use external information in its evolutionary search. Instead, the search relies only on the fitness function and proceeds according to the fitness value of each individual in the population. The choice of fitness function is therefore vital: it influences the convergence speed and whether the best solution can be found. In general, the fitness function is derived from the objective function.

Structuring objective function

In parameter optimization and identification for distributed land surface hydrological models, the choice of objective function plays an important role. Whether the objective function is properly structured directly affects the optimization results and speed. The way the objective function is structured varies with circumstances. Simulation of the land surface hydrological process involves the energy balance and the water balance, as well as peak timing and other criteria. When multiple objectives are required, they are generally combined into a single objective by means of a utility function. Because radiation data are insufficient in the study area, however, only the goodness-of-fit of the run-off process is considered in this study (Zhou and Sun, 2002).

The quality of a run-off simulation is normally assessed by comparing the measured and estimated flow rates. The objective function adopted is:

$$f_1 = \sum_{i=1}^{N} \left[ Q_o(i) - Q_c(i) \right]^2, \tag{1}$$
where $Q_o(i)$ and $Q_c(i)$ are the measured value and the estimated value, respectively.

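When the water balance is also taken into account, the objective function is structured as:

$$f_2 = \frac{1}{N} \sum_{i=1}^{N} \left[ Q_o(i) - Q_c(i) \right]^2 \left( 1 + \frac{\left| \bar{Q}_o - \bar{Q}_c \right|}{\bar{Q}_o} \right), \tag{2}$$
where $\bar{Q}_o$ and $\bar{Q}_c$ are the averages of the measured series and the estimated series, respectively, and the final factor is included to account for the water balance.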
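The two objective functions above can be written as a short Python sketch. The function names `objective_f1` and `objective_f2` are illustrative choices rather than names from the paper, and the flow series are assumed to be plain sequences of equal length.

```python
def objective_f1(q_obs, q_sim):
    """Sum of squared differences between measured and estimated flow."""
    if len(q_obs) != len(q_sim):
        raise ValueError("series must have the same length")
    return sum((o - c) ** 2 for o, c in zip(q_obs, q_sim))


def objective_f2(q_obs, q_sim):
    """Mean squared error scaled by a water-balance penalty factor."""
    n = len(q_obs)
    mean_obs = sum(q_obs) / n
    mean_sim = sum(q_sim) / n
    mse = objective_f1(q_obs, q_sim) / n
    # The factor grows with the relative error of the mean flow.
    return mse * (1.0 + abs(mean_obs - mean_sim) / mean_obs)
```

For example, with measured flows `[2, 4]` and estimated flows `[1, 3]`, the mean squared error is 1 and the relative error of the mean is 1/3, so the second objective evaluates to 4/3.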

Fitness function

A characteristic of the genetic algorithm is that it obtains the search information for the next step using only the value of the objective function, and this value is used by assessing the fitness of individuals. Once the objective function is defined, the fitness function can be obtained by converting the objective function according to rules that depend on the type of optimization. Formula (2) above is to be minimized, so the objective function is converted into:

$$F(X) = \begin{cases} C_{\max} - f_2(X), & \text{if } f_2(X) < C_{\max}, \\ 0, & \text{if } f_2(X) \ge C_{\max}, \end{cases}$$

where $C_{\max}$ is a suitably large number.
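A minimal Python sketch of this conversion is given below. It assumes the common form for minimization problems, in which the fitness is Cmax minus the objective value when the objective is below Cmax and zero otherwise; the function name `fitness` and the argument names are illustrative.

```python
def fitness(f2_value, c_max):
    """Convert a minimized objective value into a non-negative fitness.

    Assumed rule: F = c_max - f2 when f2 < c_max, otherwise F = 0.
    """
    if f2_value < c_max:
        return c_max - f2_value
    return 0.0
```

Under this rule a smaller objective value gives a larger fitness, and the fitness never becomes negative, which proportionate selection requires.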

Genetic operator

The genetic algorithm operates on a group of encoded feasible solutions called a population. By iteratively updating the population, the genetic algorithm searches for the global optimum, where the updates are carried out by a selection operator, a crossover operator and a mutation operator (Yang and Shen, 2005).

Selection operator

Following the principle of survival of the fittest, individuals with higher fitness have a higher probability of being passed on to the next generation, and those with lower fitness a lower probability. The aim is to raise the survival probability of high-performing individuals and thus improve global convergence and computational efficiency. Commonly used methods include proportionate selection, the elitist strategy, deterministic sampling, stochastic sampling with replacement, remainder stochastic sampling with replacement, ranking selection, and stochastic tournament selection. In this study, proportionate selection is adopted.

Proportionate selection is a method of stochastic sampling with replacement. Let $M$ be the population size and $F_i$ the fitness of individual $i$. The probability $p_{is}$ that individual $i$ is selected is then:

$$p_{is} = \frac{F_i}{\sum_{j=1}^{M} F_j} \quad (i = 1, 2, \ldots, M).$$

It can be seen that individuals with higher fitness enjoy a higher probability of being selected and vice versa.
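Proportionate selection is often implemented as a roulette wheel. The sketch below is one such implementation in Python, not the Genocop III code; the function names and the optional `rng` argument are illustrative, and fitness values are assumed to be non-negative with a positive sum.

```python
import random


def selection_probabilities(fitnesses):
    """Return p_i = F_i / sum(F) for every individual."""
    if any(f < 0 for f in fitnesses):
        raise ValueError("fitness values must be non-negative")
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    return [f / total for f in fitnesses]


def proportionate_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    probabilities = selection_probabilities(fitnesses)
    r = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    for individual, p in zip(population, probabilities):
        cumulative += p
        if r < cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off
```

Calling `proportionate_select` M times yields the parents of the next generation; the same individual can be drawn more than once.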

Crossover operator

In crossover, two paired chromosomes exchange part of their genes according to some rule and thus form two new individuals. Crossover is the essential feature that distinguishes the genetic algorithm from other evolutionary algorithms. As the main way of creating new individuals, it plays an important role in the genetic algorithm. The most commonly used crossover operator is single-point crossover, but it is not suitable in all circumstances. Other crossover operators have therefore been developed, such as two-point crossover, uniform crossover and arithmetic crossover. In this study, arithmetic crossover is adopted.

Arithmetic crossover is defined as the linear combination of two vectors: if $S_v^l$ and $S_w^l$ are selected for crossover, their offspring are $S_v^{l+1} = a S_w^l + (1-a) S_v^l$ and $S_w^{l+1} = a S_v^l + (1-a) S_w^l$.
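As a brief illustration, arithmetic crossover on real-valued vectors can be sketched in Python as below. The function name is illustrative, and the coefficient `a` is assumed to lie in [0, 1], which the text does not state explicitly.

```python
def arithmetic_crossover(parent_v, parent_w, a):
    """Return two offspring as linear combinations of the parents.

    child_v = a * parent_w + (1 - a) * parent_v
    child_w = a * parent_v + (1 - a) * parent_w
    """
    if len(parent_v) != len(parent_w):
        raise ValueError("parents must have the same length")
    child_v = [a * w + (1 - a) * v for v, w in zip(parent_v, parent_w)]
    child_w = [a * v + (1 - a) * w for v, w in zip(parent_v, parent_w)]
    return child_v, child_w
```

When `a` is in [0, 1], each offspring gene lies between the corresponding parent genes, so offspring of parents that respect the parameter ranges also respect them.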

Mutation operator

In mutation, the gene value at some locus of an individual's chromosome string is replaced by another admissible value for that locus, forming a new individual. The aim of mutation is to improve the local search capability of the genetic algorithm, maintain population diversity and prevent premature convergence. Commonly used methods include simple mutation, uniform mutation, boundary mutation, non-uniform mutation and Gaussian mutation. Boundary mutation is adopted in this study.

Boundary mutation is a variant of uniform mutation. When an individual $X = x_1 x_2 \cdots x_k \cdots x_l$ is mutated to $X' = x_1 x_2 \cdots x'_k \cdots x_l$ and the gene at the mutation point $x_k$ has the value range $[U_{\min}^k, U_{\max}^k]$, the new gene value $x'_k$ is defined as:

$$x'_k = \begin{cases} U_{\min}^k, & \text{if } random(0,1) = 0, \\ U_{\max}^k, & \text{if } random(0,1) = 1, \end{cases}$$
where $random(0,1)$ takes the value 0 or 1 with equal probability.
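A minimal Python sketch of boundary mutation follows. The function name, the `bounds` list of `(lower, upper)` pairs and the optional `rng` argument are illustrative assumptions; choosing which gene `k` to mutate (with probability pm) is left to the caller.

```python
import random


def boundary_mutation(individual, bounds, k, rng=random):
    """Replace gene k by its lower or upper bound with equal probability.

    Returns a new list; the input individual is left unchanged.
    """
    lower, upper = bounds[k]
    mutated = list(individual)
    # randint(0, 1) returns 0 or 1, each with probability 1/2.
    mutated[k] = lower if rng.randint(0, 1) == 0 else upper
    return mutated
```

Because the mutated gene always lands on a bound, this operator is mainly useful when the optimum is expected on or near the edge of the feasible range.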

Operation parameters

The operation parameters that need to be selected in the genetic algorithm include population size (M), crossover probability (pc), mutation probability (pm) and generation gap (G). These parameters have a strong influence on the performance of the genetic algorithm.

Termination conditions of algorithm

In practical applications of the genetic algorithm, the search cannot continue indefinitely, and the optimum solution to the problem is normally unknown. It is therefore necessary to define rules for terminating the algorithm. There are three main types of termination condition. The first is a preset maximum number of generations (T): when this number is reached, the genetic algorithm stops and outputs the best individual in the current population as the optimum solution. The recommended range is usually 100-1000. The second is to stop when the change in the average fitness of individuals over a certain number of generations falls below a very small threshold, i.e., when the fitness can no longer be improved. The third is to stop when the fitness of the individuals reaches a preset requirement.
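The three termination conditions can be combined in a single check, as in the Python sketch below. The function name and the default values (`max_generations=500`, `window=20`, `tolerance=1e-6`) are hypothetical and are not the settings used in the paper.

```python
def should_terminate(generation, avg_fitness_history, best_fitness,
                     max_generations=500, window=20, tolerance=1e-6,
                     target_fitness=None):
    """Return True if any of the three termination conditions holds."""
    # 1) The preset maximum number of generations T has been reached.
    if generation >= max_generations:
        return True
    # 2) Average fitness changed by less than the tolerance over the
    #    last `window` generations (stagnation).
    if len(avg_fitness_history) > window:
        change = abs(avg_fitness_history[-1] - avg_fitness_history[-1 - window])
        if change < tolerance:
            return True
    # 3) The best fitness has reached a preset requirement.
    if target_fitness is not None and best_fitness >= target_fitness:
        return True
    return False
```

In a main loop, the average fitness of each generation would be appended to `avg_fitness_history` before this check is called.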

Programming

Programming a genetic algorithm to find the optimum solution is an important and difficult task. In application development, an existing software package can be used for the optimization itself. The remaining work is to design the communication and interface between the simulation model and the software package, structure the corresponding objective function, and select the operation parameters of the algorithm. For numerical function optimization, a well-known system is Genocop. After successive updates and revisions, the latest version is Genocop III, which extends Genocop from handling only linear constraints to handling non-linear constraints as well (Michalewicz, 2000). Based on the C source code of Genocop III, this paper designs and develops a genetic algorithm program for parameter optimization of the watershed land surface hydrological model and integrates this program with DHSVM.

Genocop III was developed under UNIX, and compatibility problems may arise when it is compiled and run under Windows. The first step of program design is therefore to rewrite and encapsulate the Genocop III source code. DHSVM is integrated in the ArcMap platform (Yao et al., 2006), so the genetic algorithm toolbox must also be integrated in ArcMap. In addition, the genetic algorithm must be able to communicate with the input and output objects of DHSVM during optimization. To meet these two requirements, the authors use COM technology to rewrite and encapsulate Genocop III and complete the system integration in Visual C++ 6.0, producing a genetic algorithm toolbox called GAToolbox. Figure 1 shows the design route of GAToolbox, which comprises three stages: rewriting and encapsulating Genocop III; communication between GACOM and DHSVMObjects; and interface development and system integration.

Empirical study

Study area profile

The watershed of the upper Luo River above the Lushi Hydrological Station covers 4623 km², and the river course is 196.3 km long. The study area lies in the transitional zone between the subtropical and warm temperate zones. Average annual precipitation is 720 mm and annual water surface evaporation is 966 mm. Spatially, precipitation is relatively high in the northwest of the watershed and lower in the south. Precipitation varies from month to month and is concentrated from July to September, when 55%-65% of the annual total falls. Spring ranks second, and winter has the least precipitation, only 3.4% of the annual total. Under the influence of the Changjiang-Huaihe shear line, rainstorms often occur in summer and autumn. Soils in the watershed are mainly brown soil and cinnamon soil. Most of the watershed lies within the State Forestry Protection Area, so the natural vegetation is in relatively good condition. Vegetation consists mainly of natural secondary mixed woodland that includes birches, oaks, poplars, pines, cypresses, basswoods, tung trees and catalpas. Forest coverage is 34.2% and canopy density ranges from 0.5 to 0.7. The watershed has diverse herbaceous plants with a medium level of cover. The economy is dominated by crop farming and livestock breeding, and industrial development lags behind. There are few dams in the study area, and some irrigated farmland lies along the river in the lower reaches.

Testing

The optimization results and the fit of the parameters are assessed using two indices: MBR, the ratio of total simulated flow to total measured flow, and Eff, the Nash-Sutcliffe efficiency coefficient.

MBR

MBR indicates the overall water balance of the model simulation. It is the ratio of the total simulated flow to the total measured flow:

$$MBR = \frac{\sum_{i=1}^{N} Q_{si}}{\sum_{i=1}^{N} Q_{oi}},$$
where $Q_{si}$ and $Q_{oi}$ are the simulated flow rate and the measured flow rate in time step $i$, respectively.

Eff

The Nash-Sutcliffe coefficient is used to evaluate the fit between simulated and observed values. It is expressed as:

$$Eff = 1 - \frac{\sum_{i=1}^{n} (Q_s - Q_o)^2}{\sum_{i=1}^{n} (Q_o - \bar{Q}_o)^2},$$

where $Q_o$ is the observed value, $Q_s$ is the simulated value, $\bar{Q}_o$ is the mean of the observed values, and $n$ is the number of observations. A negative Eff indicates that the simulated values are a poorer predictor than the arithmetic mean of the observed values.
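Both indices are simple to compute. The Python sketch below uses illustrative function names and the standard Nash-Sutcliffe form, in which the denominator sums the squared deviations of the observed values from their mean; the observed series is assumed not to be constant.

```python
def mass_balance_ratio(q_sim, q_obs):
    """MBR: total simulated flow divided by total measured flow."""
    return sum(q_sim) / sum(q_obs)


def nash_sutcliffe(q_sim, q_obs):
    """Eff: 1 - (squared simulation error) / (observed variance sum)."""
    mean_obs = sum(q_obs) / len(q_obs)
    residual = sum((s - o) ** 2 for s, o in zip(q_sim, q_obs))
    spread = sum((o - mean_obs) ** 2 for o in q_obs)
    return 1.0 - residual / spread
```

A perfect simulation gives Eff = 1 and MBR = 1, while a simulation that always returns the observed mean gives Eff = 0 even though its MBR is still 1, which is why the two indices are reported together.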

Results and discussion

Parameter optimization of DHSVM for the watershed of the upper Luo River above the Lushi Hydrological Station is carried out using the genetic algorithm. To understand the effects of the operation parameters and obtain good optimization results, four combinations of genetic algorithm operation parameters are used for the optimization (shown in Table 2).

The model parameters are calibrated against the daily run-off observed at the Lushi Hydrological Station in 1990. Figure 2 shows how the objective function value of the best individual in each generation changes with the number of iterations under the four combinations. The results show that the genetic algorithm achieves stable convergence of the objective function whichever combination is used. Combination No. 4 converges to the smallest objective function value and thus best meets the optimization requirement. However, all four curves share the feature that the objective function value changes only slowly once evolution reaches about generation 200. In practice, a local search can be started from the result obtained at generation 200, which can yield better optimization results more quickly and improve the efficiency of optimization.

Table 3 shows the results of the DHSVM parameter optimization and the calculation efficiency under the 4 combinations. The table shows that the 4th combination has the highest efficiency. The optimized parameters from the 4th combination are therefore selected to simulate the land surface hydrological process of the Lushi Watershed.

Figure 3 compares the measured and simulated mean monthly run-off for the 11 years from 1980 to 1990. Run-off is simulated under two schemes, one using the initial parameter values and the other using the optimized parameter values. The simulation with optimized parameters is much better than that with the initial values. The simulated monthly run-off generally agrees with the measured run-off, apart from errors in several months in which flood peaks occur. This indicates that DHSVM is applicable and can broadly represent the monthly run-off at the hydrological station. Several factors may explain the errors: 1) Model uncertainty. Compared with traditional hydrological models, DHSVM considers more physical processes, and the complexity of the model can introduce more uncertainty into parameter optimization. 2) Human activities. The measured flow series used for calibration do not represent the natural run-off process; they include the effects of human activities such as water diversion, irrigation and reservoirs, which are not considered in the present DHSVM. 3) Input data precision. The precision of the input data, especially precipitation, strongly influences the simulation results. The precision of total precipitation directly affects whether the simulated results satisfy the water balance, and the spatial distribution of precipitation influences the run-off process and its composition.

Conclusions

Applying the genetic algorithm to parameter optimization for distributed land surface hydrological models is still exploratory work. By defining the encoding method, designing the fitness function, devising the genetic operators, selecting the algorithm parameters and identifying the termination conditions, this paper designs a genetic algorithm for parameter optimization of DHSVM and conducts an empirical study on the Lushi Watershed of the Yellow River. It is concluded that the designed genetic algorithm obtains satisfactory parameter values within an acceptable time; that its optimization performance is higher than that of the traditional algorithm; and that the calculated parameter values are reasonable. This shows that the genetic algorithm is effective for parameter optimization of DHSVM and suitable for practical use.

References

[1] Arola A, Lettenmaier D P (1999). Effects of subgrid spatial heterogeneity on GCM-scale land surface energy and moisture fluxes. Water Resources Research, 35(1): 255-264

[2] Cheng C T, Ou C P, Chau K W (2002). Combining a fuzzy optimal model with a genetic algorithm to solve multi-objective rainfall-runoff model calibration. Journal of Hydrology, 268: 72-86

[3] Leung L R, Wigmosta M S (1999). Potential climate change impacts on mountain watersheds in the Pacific Northwest. Journal of American Water Resource Association, 35: 1463-1471

[4] McKinney D C, Min D L (1994). Genetic algorithm solution of groundwater management models. Water Resources Research, 30(6): 1897-1906

[5] Michalewicz Z (2000). Evolved procedure-genetic algorithm and data coding union. Beijing: Scientific Press (in Chinese)

[6] Nijssen B, Haddeland I, Lettenmaier D P (1997). Point evaluation of a surface hydrology model for BOREAS. Journal of Geophysical Research, 102: 29367-29378

[7] Storck P, Bowling L, Wetherbee P, Lettenmaier D (1998). Application of a GIS-based distributed hydrology model for prediction of forest harvest effects on peak stream flow in the Pacific Northwest. Hydrologic Processes, 12: 889-904

[8] Wang Q J (1991). The genetic algorithm and its application to calibrating conceptual rainfall-runoff models. Water Resources Research, 27(9): 2467-2471

[9] Wang S R, Huang R H, Ding Y H (2002a). Improvements of a distributed hydrology model DHSVM and its climatological-hydrological off-line simulation experiments. Acta Meteorologica Sinica, 60(3): 290-300 (in Chinese)

[10] Wang S R, Huang R H, Ding Y H (2002b). Numerical simulation experiments by nesting hydrology model DHSVM with regional climate model RegCM2/China. Acta Meteorologica Sinica, 60(4): 421-427 (in Chinese)

[11] Westrick K J, Storck P, Mass C F (2002). Description and evaluation of a hydrometeorological forecast system for mountainous watersheds. Weather and Forecasting, 17: 250-262

[12] Wigmosta M S, Lettenmaier D P (1999). A comparison of simplified methods for routing topographically-driven subsurface flow. Water Resources Research, 35: 255-264

[13] Wigmosta M S, Perkins W (2002). User's Guide for the Distributed Hydrology-Soil-Vegetation Model. Meteorology. Washington: Pacific Northwest National Laboratory

[14] Wigmosta M S, Vail L, Lettenmaier D P (1994). A distributed hydrology-vegetation model for complex terrain. Water Resources Research, 30(6): 1665-1679

[15] Xiong L H, Guo S L (2004). Distributed Watershed Hydrological Model. Beijing: China Water Power Press (in Chinese)

[16] Yang X H, Lu G H, Li J Q (2002). Applications of the hybrid accelerating genetic algorithm to parameter optimization of basin model. Advances in Water Science, 13: 340-344 (in Chinese)

[17] Yang X H, Shen Z Y (2005). Intelligent Algorithms and Their Applications in Resources and Environmental System Modeling. Beijing: Beijing Normal University Press (in Chinese)

[18] Yao C Q, Yang Z F, Zhao Y W (2006). Study on integration of distributed hydrology soil vegetation model and GIS. Journal of Soil and Water Conservation, 20(1): 168-171 (in Chinese)

[19] Zhou M, Sun S D (2002). Genetic algorithm principle and application. Beijing: National Defense Industry Press (in Chinese)

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag Berlin Heidelberg
