Integrated uncertain models for runoff forecasting and crop planting structure optimization of the Shiyang River Basin, north-west China

To improve the accuracy of runoff forecasting, an uncertain multiple linear regression (UMLR) model is presented in this study. The proposed model avoids transferring the random error in the predicted independent variables to the dependent variable, which would otherwise reduce prediction accuracy. On this basis, an inexact two-stage stochastic programming (ITSP) model is used for crop planting structure optimization (CPSO), taking as inputs the interval flow values under different probabilities obtained from the UMLR model. The developed system, which integrates the UMLR model for runoff forecasting with the ITSP model for crop planting structure optimization, is applied to a real case study. Its aim is to optimize crop planting areas under limited available water resources, based on forecasts of downstream runoff, so as to obtain the maximum system benefit in the future. The solutions demonstrate the feasibility and suitability of the developed system and can help decision makers identify a reasonable crop planting structure under multiple uncertainties.


Introduction
Increasing water demand and decreasing available water supply are in direct competition, which exacerbates the shortage of agricultural water resources [1]; this problem is particularly severe in arid and semi-arid areas. To make full use of limited water resources for agricultural production, crop planting structure optimization is an important approach to increasing agricultural economic benefits and improving agricultural water management [2][3][4].
CPSO is a complex system with many uncertain parameters, such as crop planting area, irrigation water use efficiency, available water supply and economic parameters. The comprehensive benefits of the system are influenced by all of these parameters because of their uncertainty [5,6]. If deterministic parameters or models are simply used in place of uncertain ones, the results will be unreliable and important information will be lost [7,8]. Among all the uncertain parameters, available surface water is particularly significant because it directly affects the optimal scheme for adjusting crop planting structure. The main source of available surface water is catchment runoff; hence, accurately optimizing future crop planting structure requires accurate prediction of catchment runoff. Much research has focused on hydrological forecasting methods to obtain more accurate runoff predictions [9][10][11]. Among the methods evaluated, the multiple linear regression (MLR) model has proven to be an effective forecasting method [5,12]. During prediction, however, the random error in the predicted independent variables can propagate to the dependent variable, which seriously affects prediction accuracy. In addition, climate change and human activities exacerbate the complex uncertainties in runoff forecasting. To date, few studies have attempted to overcome these disadvantages, so improvements to the MLR method that account for uncertainty are still needed.
Inexact two-stage stochastic programming (ITSP) models can handle interval parameters, expressed by lower and upper bounds, in combination with stochastic processes [13,14]. ITSP models have been widely applied to CPSO owing to the stochastic characteristics of available water resources. For example, ITSP was used to deal not only with the stochastic nature of water availability in different hydrological years, but also with the interval uncertainty in statistics on crop prices and costs [15]. However, previous studies have not integrated runoff forecasting under uncertainty with ITSP, and thus cannot provide dynamic decision suggestions for crop planting structure adjustment. Therefore, this study aimed to improve the MLR model to avoid poor forecast accuracy, and to optimize the crop planting structure based on the results of the improved uncertain multiple linear regression (UMLR) model. The integrated system was then applied to a case study to evaluate how it can be used to adjust crop planting structure more efficiently. The study encompassed: (1) the formulation of a UMLR model, based on the MLR model, for runoff forecasting; (2) the development of an ITSP model for CPSO that takes as inputs the flow values under different probabilities obtained from the UMLR model; (3) the application of the integrated system to a real case study in the Shiyang River Basin, north-west China; and (4) the analysis of the results under multiple scenarios to provide recommendations for crop planting structure adjustment.

Study area
The study area was the Shiyang River Basin (101°41′-104°16′ E, 36°29′-39°27′ N) (Fig. 1), one of three inland river basins in the Hexi Corridor of Gansu Province, north-west China. The Shiyang River Basin has a typical arid continental climate, characterized by low and irregular rainfall, high evaporation and periods of drought. It has the most serious water shortage in the Hexi Corridor [16][17][18]. The Shiyang River originates in the Qilian Mountains and consists of eight tributaries (from east to west): the Dajing, Gulang, Huangyang, Zamu, Jinta, Xiying, Dongda and Xida rivers. All of these except the Dajing and Gulang rivers converge into the Shiyang River, which then flows into the Minqin Basin [19]. Minqin County, located in the lower reaches of the Shiyang River Basin with an area of 41400 km², is one of the most important agricultural counties of Wuwei City. After passing the Caiqi Hydrologic Station, the Shiyang River flows into the Hongya Mountain Reservoir, the largest desert reservoir in Asia. The available water resources in Minqin County are supplied mainly by surface water passing the Caiqi Station, groundwater and diverted water. About 70% to 80% of water consumption is used for agricultural irrigation, the largest water user in the whole system. Owing to climate change, population growth, and agricultural and economic development, streamflow into Minqin County has decreased sharply in recent years: the annual inflow through the Caiqi Station dropped from about 5.78 × 10⁸ m³ in 1955 to less than 1.00 × 10⁸ m³ in 2004.
Poor management practices and unsuitable cropland use intensify the conflict between water supply and water demand. To guarantee sustainable development of the region, regional development planning requires that water allocation first meet domestic and basic ecological water demands and then industrial demand, which makes adjusting agricultural water use imperative. Therefore, appropriate agricultural planning within the available water resources that takes changing streamflow into account is crucial not only for agricultural production but also for socioeconomic development and ecological restoration in the Minqin Basin. Accurately predicting runoff from the Shiyang River into the Minqin Basin is thus important as a guide for adjusting crop planting structure.

Data collection
Meteorological data for the Wuwei Sub-basin, streamflow data for the Shiyang River Basin, and some basic regional data for the Wuwei Sub-basin were used in this study. Meteorological data for the Wuwei Sub-basin, including annual mean temperature and annual mean precipitation, were obtained from the China Meteorological Data Sharing Service System. Flow data for the six tributaries were obtained from their respective observation stations, and inflow data for Minqin County were obtained from the Caiqi Station. The population of the Wuwei Sub-basin, the effective irrigated area and the value of agricultural output were obtained from the Statistical Yearbook of Wuwei City [20]. All of the above data cover the period 1955 to 2014.

Fig. 1 Location of Shiyang River Basin in north-west China
Moreover, to determine the appropriate crop planting structure for the Minqin Basin under dynamic inflow, the following data were used in the optimization model. Crop area data and crop output per unit area for 2009 to 2014 were obtained from the Statistical Yearbook of Wuwei City [20]. The demand for food and vegetables was taken from the Dietary Guidelines for Chinese [21]. Groundwater and water diversion data were collected from the Shiyang River Basin Key Governance Projects [22].

System framework
This paper combines an optimization model for crop planting structure with runoff forecasting: the runoff forecasting uses the improved UMLR model to enhance the accuracy of prediction results, and the optimization part applies the ITSP model to optimize crop planting structure. The link between runoff forecasting and optimization is the available surface water of Minqin County. A brief description is given in Fig. 2.

UMLR model
The MLR model, one of the earliest models used in medium- and long-term hydrological forecasting, is still widely applied in related research [12,23]. The MLR model is usually written as:

$$R = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + e$$

where the dependent variable $R$ represents the reconstructed or forecast annual downstream flow; the independent variables $X_1, X_2, \ldots, X_n$ denote the factors that affect runoff; the constants $\beta_1, \beta_2, \ldots, \beta_n$ are the regression coefficients obtained by the least-squares method; and $e$ is a normally distributed random error. However, when the MLR model is used to predict runoff, the future values of the independent variables are difficult to predict accurately. Consequently, the random errors in these predicted values propagate through the MLR model to the predicted runoff, which clearly reduces prediction accuracy. To improve the accuracy of runoff forecasting and make full use of the uncertain information in the independent variables, an improved UMLR model was developed in this study. The UMLR model decomposes each independent variable of the MLR model into the sum of its trend and a random error:

$$X_n = x_n + \alpha_n$$

where $x_n$ is the trend of the independent variable and $\alpha_n$ is the random deviation of its true value from the trend. Substituting this decomposition into the MLR model, the UMLR model is formulated as:

$$R = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon, \qquad \varepsilon = e + \sum_{j=1}^{n} \beta_j \alpha_j$$

where $\varepsilon$ is the sum of the random errors from the MLR model and from the decomposition, and is assumed to be normally distributed.
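As a minimal sketch of the decomposition described above, assuming the coefficients β_n have already been estimated and the trend values x_n are known for each year, the combined error term (the MLR residual plus the decomposition errors) can be estimated empirically: subtract the trend-based prediction Σ β_n x_n from the observed runoff and fit a normal distribution to what remains. The function names `umlr_trend` and `fit_umlr_error` are hypothetical and the numbers are illustrative, not data from the study.

```python
from statistics import NormalDist


def umlr_trend(beta, trends):
    """Trend part of the UMLR prediction: sum over n of beta_n * x_n."""
    if len(beta) != len(trends):
        raise ValueError("beta and trends must have the same length")
    return sum(b * x for b, x in zip(beta, trends))


def fit_umlr_error(beta, trend_rows, observed):
    """Fit a normal distribution to eps_t = R_t - sum_n beta_n * x_{n,t}.

    trend_rows -- one list of trend values [x_1, ..., x_n] per year (hypothetical input)
    observed   -- observed runoff R_t for the same years
    """
    residuals = [r - umlr_trend(beta, row) for row, r in zip(trend_rows, observed)]
    # from_samples estimates mu with fmean() and sigma with the sample stdev()
    return NormalDist.from_samples(residuals)


# Illustrative numbers only, not data from the study
eps = fit_umlr_error([2.0, 1.0], [[1, 1], [2, 1], [3, 1]], [3.5, 4.5, 7.0])
print(eps.mean, eps.stdev)
```

Fitting the error directly from residuals, rather than propagating separately predicted independent variables, mirrors the idea that only the trend enters the point prediction while the randomness is carried by one distribution.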

Runoff prediction
According to previous studies [5,24,25], an abrupt change in the upstream and downstream runoff of the Shiyang River Basin occurred around 1975. Therefore, the runoff series of the Shiyang River Basin from 1975 to 2014 was selected to ensure a stationary sequence. To ensure the effectiveness of the regression model, the independent variables should be chosen carefully. The criteria for selecting independent variables are: (1) the independent variables should have a significant effect on the dependent variable, with a close linear correlation; (2) the linear correlation between the independent and dependent variables must be genuine, not merely formal; (3) the independent variables should be mutually exclusive to a certain degree, that is, the correlation among the independent variables should not be higher than their correlation with the dependent variable; and (4) the independent variables should have complete statistical records, and their future values should be easy to determine. The correlation coefficients of variables that may affect the inflow at the Caiqi Station are shown in Table 1. In accordance with these criteria, the total flow of the six tributaries at their mountain outlets, the population of the Wuwei Sub-basin, and the annual mean temperature were selected as the independent variables of the MLR model to simulate and predict the inflow at the Caiqi Station. The regression equation is:

$$R = \beta_1 U + \beta_2 P + \beta_3 T + e$$

where $R$ represents the inflow at the Caiqi Station; $U$ is the total flow of the six tributaries at their mountain outlets; $P$ denotes the population of the Wuwei Sub-basin; $T$ is the annual mean temperature of the Wuwei Sub-basin; $\beta_1$, $\beta_2$ and $\beta_3$ are the regression coefficients; and $e$ is the normally distributed random residual.
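The regression coefficients are obtained by least squares. The dependency-free sketch below solves the normal equations (XᵀX)β = Xᵀy by Gaussian elimination; each row holds one year's predictors (for this study, something like [U, P, T]), and a constant column of ones can be appended if an intercept is wanted. The helper names are hypothetical, the data are synthetic, and in practice a numerical library's least-squares routine would usually be preferred for better conditioning.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b for a square matrix a using Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(rows, y):
    """Least-squares coefficients beta for y ~ rows via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic check: y = 2*x1 + 3*x2 + 1*x3 exactly (illustrative data only)
rows = [[1, 0, 1], [0, 1, 1], [1, 1, 1], [2, 1, 1], [1, 2, 1]]
y = [3, 4, 6, 8, 9]
beta = fit_mlr(rows, y)
print(beta)
```

Because the synthetic data are generated without noise, the fitted coefficients recover [2, 3, 1] up to floating-point error.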
To assess forecast accuracy, the deterministic coefficient (DC) and the qualified rate (QR) were used [26]. The values of these two indicators were DC = 0.91 and QR = 75.0%, demonstrating that the forecasts are relatively reliable according to the runoff forecasting accuracy requirements of the national Standard for Hydrological Information and Hydrological Forecasting of the People's Republic of China [21].
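The two accuracy indicators can be sketched as follows. DC is computed here in its usual Nash-Sutcliffe-type form, 1 − Σ(obs − sim)² / Σ(obs − mean)². For QR, a forecast is counted as qualified when its absolute error is within a permissible error, which this sketch assumes to be 20% of the observed value; that threshold is an assumption for illustration, and the permissible error for each forecast type should be taken from the standard itself. Function names are hypothetical.

```python
def deterministic_coefficient(obs, sim):
    """DC = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def qualified_rate(obs, sim, tolerance=0.20):
    """Share of forecasts whose absolute error is <= tolerance * |observed| (assumed rule)."""
    hits = sum(1 for o, s in zip(obs, sim) if abs(s - o) <= tolerance * abs(o))
    return hits / len(obs)


obs = [1.0, 2.0, 3.0, 4.0]  # illustrative values only
sim = [1.1, 1.9, 3.3, 3.0]
dc = deterministic_coefficient(obs, sim)
qr = qualified_rate(obs, sim)
print(dc, qr)
```

Here three of the four forecasts fall within the assumed 20% band, so QR is 0.75, and DC is 1 − 1.11/5 = 0.778.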
A linear model was used to simulate the total flow of the six tributaries at their mountain outlets and the annual mean temperature of the Wuwei Sub-basin from 1955 to 2014, and to forecast the trends in these variables from 2015 to 2020 (Fig. 3). Figure 3 shows that the annual total flow of the six tributaries at their mountain outlets has a clear declining trend, whereas the annual mean temperature has a rising trend. Using the linear model, the observed values between 1955 and 2014 were partitioned into the trend and the random residual.
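The trend/residual partition and the extrapolation to future years can be sketched with an ordinary least-squares line, as below. The series is illustrative, not the study's data, and the function names are hypothetical.

```python
def linear_trend(years, values):
    """Ordinary least-squares line: value = slope * year + intercept."""
    n = len(years)
    mean_t = sum(years) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(years, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def decompose_and_extrapolate(years, values, future_years):
    """Split observations into trend + residual and extend the trend to future_years."""
    slope, intercept = linear_trend(years, values)
    trend = [intercept + slope * t for t in years]
    residual = [v - x for v, x in zip(values, trend)]
    forecast = [intercept + slope * t for t in future_years]
    return trend, residual, forecast


# Illustrative series only (not the study's data)
trend, residual, forecast = decompose_and_extrapolate(
    [2000, 2001, 2002, 2003], [10.0, 9.0, 9.0, 7.0], range(2004, 2006))
print(forecast)
```

Least-squares residuals sum to zero, so the residual series can be treated as the zero-mean random part of each independent variable, while the extrapolated trend supplies its future value.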
An exponential model was used to forecast the population of the Wuwei Sub-basin in the Shiyang River Basin from 2015 to 2020 based on its natural growth rate; the trend and the random residual are shown in Fig. 4. Using these values in the UMLR model, the trend of the inflow at the Caiqi Station for 1975 to 2020 was calculated, and the random error was fitted with a normal distribution (Fig. 5). The UMLR model thus avoids propagating the random error in the predicted independent variables to the predicted inflow at the Caiqi Station, and provides not only possible intervals but also the probability of each interval. To obtain the flow values under different probabilities at the Caiqi Station from 2015 to 2020, the fitted normal distribution was divided into three discrete probability intervals. The Pauta criterion (the 3σ rule), which treats values lying more than three standard deviations from the mean as gross errors, was used to obtain the probability interval boundaries. The normal distribution was then divided at probabilities of 75% (low flow level), 50% (medium flow level) and 25% (high flow level), giving three probability intervals of inflow at the Caiqi Station from 2015 to 2020 (Fig. 6). Figure 6 shows that the inflow at the Caiqi Station is predicted to decline from 2015 to 2020, which indicates the necessity of optimizing crop planting structure because the water available to agriculture will decrease in the future. This forecast provides an estimate of the surface water available for CPSO in the future.
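One reading of this step is sketched below under stated assumptions: the flow with exceedance probability p is taken as the UMLR trend plus the (1 − p) quantile of the fitted normal error, and the Pauta bounds are the trend plus the error mean ± 3σ. How these values were combined into the three intervals of Fig. 6 is not fully specified in the text, so this is an illustration of the probability calculation only; the function name and inputs are hypothetical.

```python
from statistics import NormalDist


def flow_levels(trend_value, residuals, exceedance=(0.75, 0.50, 0.25)):
    """Flow values at given exceedance probabilities plus Pauta (3-sigma) bounds.

    trend_value -- UMLR trend prediction for one year (hypothetical input)
    residuals   -- historical UMLR errors used to fit the normal distribution
    """
    eps = NormalDist.from_samples(residuals)
    # Pauta criterion: values beyond mean +/- 3 sigma are treated as gross errors
    bounds = (trend_value + eps.mean - 3 * eps.stdev,
              trend_value + eps.mean + 3 * eps.stdev)
    # A flow exceeded with probability p is the (1 - p) quantile of the distribution
    levels = {p: trend_value + eps.inv_cdf(1.0 - p) for p in exceedance}
    return bounds, levels


bounds, levels = flow_levels(5.0, [-1.0, 0.0, 1.0])
print(bounds, levels)
```

With this convention the 75% level (low flow) lies below the median and the 25% level (high flow) lies above it, matching the ordering used in the text.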

ITSP model
Two-stage stochastic programming (TSP) involves a tradeoff between predefined strategies and the associated adaptive adjustments [27]. In a TSP model, the first-stage decision is made before a random event occurs and can be corrected by a second-stage (recourse) decision made after the event is realized, which reduces decision risk and yields the best compromise schemes [28,29]. In addition, because the parameters of most real-world problems are uncertain and often known only as intervals, introducing such inexact parameters into the TSP framework results in an ITSP model [13,30].
ITSP has been used to solve many practical problems [31][32][33]. The ITSP model for crop planting structure optimization with the objective of maximizing planting benefit can be expressed mathematically as [15]:

$$\max f^{\pm} = \sum_{i=1}^{I} R_i A_i^{\pm} - \sum_{i=1}^{I}\sum_{k=1}^{K} p_k L_i A_{ik}^{\prime\pm} \tag{6a}$$

where the subscript $i = 1, 2, \ldots, I$ denotes the crop type and $I$ is the total number of crop types ($I = 5$, with $i = 1, 2, 3$ representing grain crops and $i = 4, 5$ representing economic crops); $k = 1, 2, \ldots, K$ denotes the scenario of available water level and $K$ is the total number of scenarios ($k = 1$, 2 and 3 represent the low, medium and high flow levels, respectively); $f^{\pm}$ (CNY) denotes the expected planting benefit; $R_i$ (CNY·hm⁻²) is the net income per unit area of crop $i$; $p_k$ is the probability of occurrence of level $k$; $L_i$ (CNY·hm⁻²) is the loss per unit area of crop $i$ when water is insufficient, with $L_i > R_i$; the decision variables $A_i^{\pm}$ and $A_{ik}^{\prime\pm}$ (hm²) are the first-stage decision (the planned area of crop $i$) and the second-stage decision (the area of crop $i$ that cannot be adequately irrigated under level $k$), respectively; and the superscripts "+" and "−" denote the upper and lower bounds of the corresponding parameters (e.g., the upper and lower bounds of the decision variable $A_i^{\pm}$ are $A_i^{+}$ and $A_i^{-}$). Subject to:

$$\sum_{i=1}^{I} M_i\left(A_i^{\pm} - A_{ik}^{\prime\pm}\right) \le \eta\, Q_k^{\pm}, \quad \forall k \tag{6b}$$

(water availability constraints)

$$a \sum_{i=4}^{5}\left(A_i^{\pm} - A_{ik}^{\prime\pm}\right) \le \sum_{i=1}^{3}\left(A_i^{\pm} - A_{ik}^{\prime\pm}\right) \le b \sum_{i=4}^{5}\left(A_i^{\pm} - A_{ik}^{\prime\pm}\right), \quad \forall k \tag{6c}$$

(regional crop structure planning constraints)

$$Y_i\left(A_i^{\pm} - A_{ik}^{\prime\pm}\right) \ge D_i, \quad \forall i, k \tag{6d}$$

(food security constraints)

$$A_i^{\pm} \ge A_{ik}^{\prime\pm} \ge 0, \quad \forall i, k \tag{6e}$$

(non-negativity constraints)

where $\eta$ is the irrigation water utilization efficiency; $Q_k^{\pm}$ (10⁸ m³, converted to m³ in Eq. (6b)) is the total amount of water available to the selected crops in scenario $k$, including surface water, groundwater and water transferred from other regions; $M_i$ (m³·hm⁻²) is the irrigation quota of crop $i$; $a$ and $b$ are the lower and upper limits of the ratio of grain-crop area to economic-crop area, which should satisfy the general plan of the region; $Y_i$ (kg·hm⁻²) is the yield of crop $i$; and $D_i$ (kg) is the minimum demand for crop $i$ in this region.
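To make the two-stage structure concrete, the sketch below evaluates a candidate plan rather than solving the model: it computes the expected benefit f = Σ R_i A_i − Σ_i Σ_k p_k L_i A′_ik and checks the water constraint Σ_i M_i (A_i − A′_ik) ≤ η Q_k for every flow level. It assumes the objective and constraint take these standard two-stage recourse forms, with Q_k supplied in m³; finding the optimal plan would require a linear programming solver. Names and numbers are hypothetical.

```python
def expected_benefit(R, L, p, A, A_short):
    """Evaluate f = sum_i R_i*A_i - sum_i sum_k p_k*L_i*A'_ik for one bound.

    R, L    -- net income and shortage loss per hectare for each crop
    p       -- probabilities p_k of the flow levels (should sum to 1)
    A       -- first-stage planned areas A_i (hm^2)
    A_short -- A_short[i][k], area of crop i not adequately irrigated at level k
    """
    first_stage = sum(r * a for r, a in zip(R, A))
    recourse = sum(p_k * L[i] * A_short[i][k]
                   for i in range(len(A)) for k, p_k in enumerate(p))
    return first_stage - recourse


def water_feasible(M, A, A_short, eta, Q):
    """Check sum_i M_i*(A_i - A'_ik) <= eta*Q_k for every level k (Q_k given in m^3)."""
    for k, q_k in enumerate(Q):
        demand = sum(M[i] * (A[i] - A_short[i][k]) for i in range(len(A)))
        if demand > eta * q_k:
            return False
    return True


# Two hypothetical crops and three flow levels (low, medium, high)
R, L, p = [10, 20], [15, 30], [0.25, 0.5, 0.25]
A = [100, 50]
A_short = [[20, 10, 0], [10, 0, 0]]
f = expected_benefit(R, L, p, A, A_short)
ok = water_feasible([4000, 6000], A, A_short, 0.5, [1.2e6, 1.4e6, 1.6e6])
print(f, ok)
```

Larger shortage areas in the low-flow scenario reduce the expected benefit through the penalty term, which is the tradeoff between an ambitious first-stage plan and the risk of recourse losses.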
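Since the objective is to maximize system benefits, the sub-model corresponding to $f^{+}$ should be solved first. Following related research [28,33,34], the model (Eqs. (6a)-(6e)) can be solved by decomposing it into two sub-models as follows. Letting $A_i^{\pm} = A_i^{-} + \Delta A_i z_i$, where $\Delta A_i = A_i^{+} - A_i^{-}$ and $z_i \in [0, 1]$, the sub-model for the upper bound of the objective function, $f^{+}$, is:

$$\max f^{+} = \sum_{i=1}^{I} R_i\left(A_i^{-} + \Delta A_i z_i\right) - \sum_{i=1}^{I}\sum_{k=1}^{K} p_k L_i A_{ik}^{\prime -} \tag{7a}$$

subject to:

$$\sum_{i=1}^{I} M_i\left(A_i^{-} + \Delta A_i z_i - A_{ik}^{\prime -}\right) \le \eta\, Q_k^{+}, \quad \forall k \tag{7b}$$

$$a \sum_{i=4}^{5}\left(A_i^{-} + \Delta A_i z_i - A_{ik}^{\prime -}\right) \le \sum_{i=1}^{3}\left(A_i^{-} + \Delta A_i z_i - A_{ik}^{\prime -}\right) \le b \sum_{i=4}^{5}\left(A_i^{-} + \Delta A_i z_i - A_{ik}^{\prime -}\right), \quad \forall k \tag{7c}$$

$$Y_i\left(A_i^{-} + \Delta A_i z_i - A_{ik}^{\prime -}\right) \ge D_i, \quad \forall i, k \tag{7d}$$

$$A_i^{-} + \Delta A_i z_i \ge A_{ik}^{\prime -} \ge 0, \quad \forall i, k \tag{7e}$$

$$0 \le z_i \le 1, \quad \forall i \tag{7f}$$

where $z_i$ and $A_{ik}^{\prime -}$ are the decision variables, and $z_{i,\mathrm{opt}}$, $A_{ik,\mathrm{opt}}^{\prime -}$ and $f_{\mathrm{opt}}^{+}$ are the solutions of the sub-model (Eqs. (7a)-(7f)). The optimized first-stage variable, which corresponds to the optimized upper-bound objective function value, can then be obtained as:

$$A_{i,\mathrm{opt}} = A_i^{-} + \Delta A_i z_{i,\mathrm{opt}}$$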
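The sub-model for the lower bound of the objective function, $f^{-}$, is:

$$\max f^{-} = \sum_{i=1}^{I} R_i A_{i,\mathrm{opt}} - \sum_{i=1}^{I}\sum_{k=1}^{K} p_k L_i A_{ik}^{\prime +} \tag{8a}$$

subject to:

$$\sum_{i=1}^{I} M_i\left(A_{i,\mathrm{opt}} - A_{ik}^{\prime +}\right) \le \eta\, Q_k^{-}, \quad \forall k \tag{8b}$$

$$a \sum_{i=4}^{5}\left(A_{i,\mathrm{opt}} - A_{ik}^{\prime +}\right) \le \sum_{i=1}^{3}\left(A_{i,\mathrm{opt}} - A_{ik}^{\prime +}\right) \le b \sum_{i=4}^{5}\left(A_{i,\mathrm{opt}} - A_{ik}^{\prime +}\right), \quad \forall k \tag{8c}$$

$$Y_i\left(A_{i,\mathrm{opt}} - A_{ik}^{\prime +}\right) \ge D_i, \quad \forall i, k \tag{8d}$$

$$A_{i,\mathrm{opt}} \ge A_{ik}^{\prime +} \ge A_{ik,\mathrm{opt}}^{\prime -}, \quad \forall i, k \tag{8e}$$

where $A_{ik}^{\prime +}$ is the decision variable, and $A_{ik,\mathrm{opt}}^{\prime +}$ and $f_{\mathrm{opt}}^{-}$ are the solutions of the sub-model (Eqs. (8a)-(8e)). Thus, the optimal solutions of the model (Eqs. (6a)-(6e)) are:

$$f_{\mathrm{opt}}^{\pm} = \left[f_{\mathrm{opt}}^{-},\, f_{\mathrm{opt}}^{+}\right], \qquad A_{ik,\mathrm{opt}}^{\prime\pm} = \left[A_{ik,\mathrm{opt}}^{\prime -},\, A_{ik,\mathrm{opt}}^{\prime +}\right]$$

and the optimized area of each selected crop is:

$$A_{ik,\mathrm{opt}}^{\pm} = \left[A_{i,\mathrm{opt}} - A_{ik,\mathrm{opt}}^{\prime +},\; A_{i,\mathrm{opt}} - A_{ik,\mathrm{opt}}^{\prime -}\right]$$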
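The final recombination step is simple arithmetic once both sub-models have been solved. The sketch below assumes the standard interval-programming conventions: the first-stage area is A_i,opt = A_i⁻ + (A_i⁺ − A_i⁻) z_i,opt, and the optimized area of crop i under level k is the interval [A_i,opt − A′⁺_ik,opt, A_i,opt − A′⁻_ik,opt]. The function name and inputs are hypothetical; solving the sub-models themselves is outside this sketch.

```python
def recombine(A_lower, A_upper, z_opt, short_lower, short_upper):
    """Combine the two sub-model solutions into interval results.

    A_lower, A_upper         -- bounds A_i^- and A_i^+ of the first-stage area
    z_opt                    -- z_i,opt from the f+ sub-model, each in [0, 1]
    short_lower, short_upper -- A'^-_ik,opt (f+ sub-model) and A'^+_ik,opt (f- sub-model)
    Returns A_i,opt and, per crop and level, the interval
    (A_i,opt - A'^+_ik,opt, A_i,opt - A'^-_ik,opt).
    """
    A_opt = [lo + (hi - lo) * z for lo, hi, z in zip(A_lower, A_upper, z_opt)]
    areas = [[(a - s_hi, a - s_lo) for s_lo, s_hi in zip(s_lows, s_highs)]
             for a, s_lows, s_highs in zip(A_opt, short_lower, short_upper)]
    return A_opt, areas


# One hypothetical crop with two flow levels
A_opt, areas = recombine([100.0], [200.0], [0.5], [[10.0, 0.0]], [[30.0, 5.0]])
print(A_opt, areas)
```

Pairing the larger shortage with the lower area bound keeps each interval ordered, so the lower end corresponds to the pessimistic (f⁻) solution and the upper end to the optimistic (f⁺) one.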

Analysis and discussion
The inflow to the Hongya Mountain Reservoir includes water from the upper Shiyang River and water diverted from other regions. The diverted water, from Liangzhou District and Jingdian Reservoir, ranges from about (1.99-2.33) × 10⁸ m³·yr⁻¹. The maximum recoverable amount of groundwater is 1.16 × 10⁸ m³. According to the Shiyang River Basin Key Governance Projects [22], the ratios of water available to agriculture to the total water available in Minqin County from 2015 to 2020 are 0.56, 0.62, 0.61, 0.61, 0.61 and 0.60, respectively. Wheat, corn, vegetables, sunflower and cotton are the five major crops in Minqin County, accounting for 80% of the total planted area, so 80% of the water available to agriculture was taken as the water available for optimizing the planting structure of these five crops (Table 2). Other data for the five crops are listed in Table 3. These data were used as inputs to the ITSP model for CPSO. The sub-models of the ITSP model (Eqs. (7a)-(7f) and Eqs. (8a)-(8e)) were solved under the limited water available to the selected crops, and the optimized results for each crop were compared with the prevailing conditions (crop planting areas in 2014) (Fig. 7). Figure 7 shows that the inflow at the Caiqi Station from 2015 to 2020 predicted by the UMLR model, through its effect on available water, determines the optimal planting areas of the five main crops. Under the goal of guaranteeing food security, the optimal results tend to allocate less land to the two grain crops because they provide lower economic benefits.
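The arithmetic behind the available-water inputs can be sketched as follows. The agricultural ratios, groundwater amount, diverted-water range and 80% share are the figures quoted above; the assumption that the total water available to Minqin County is the sum of Caiqi inflow, recoverable groundwater and diverted water is ours, and the surface inflow interval is hypothetical, so the output does not reproduce Table 2.

```python
# Figures quoted in the text above; surface inflow is a hypothetical input
AG_RATIO = {2015: 0.56, 2016: 0.62, 2017: 0.61, 2018: 0.61, 2019: 0.61, 2020: 0.60}
GROUNDWATER = 1.16           # 10^8 m^3, maximum recoverable groundwater
DIVERTED = (1.99, 2.33)      # 10^8 m^3/yr, diverted water (lower, upper)
FIVE_CROP_SHARE = 0.80       # share of agricultural water assigned to the five crops


def crop_water_interval(year, surface):
    """Interval of water available to the five crops, in 10^8 m^3.

    Assumes total available water = surface inflow + groundwater + diverted water.
    surface -- (lower, upper) Caiqi inflow for one flow level (hypothetical values)
    """
    ratio = AG_RATIO[year]
    lower = FIVE_CROP_SHARE * ratio * (surface[0] + GROUNDWATER + DIVERTED[0])
    upper = FIVE_CROP_SHARE * ratio * (surface[1] + GROUNDWATER + DIVERTED[1])
    return lower, upper


lower, upper = crop_water_interval(2015, (1.0, 1.5))
print(lower, upper)
```

Pairing lower bounds with lower bounds keeps the result a valid interval, which is the form the ITSP model expects for each flow level.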
Compared with the prevailing conditions, the area planted to wheat should be increased and the area planted to corn reduced. Likewise, the area planted to cotton should be increased appropriately to meet regional demand, whereas the sunflower area should be substantially reduced because of its higher water consumption and lower economic returns. In low flow years, it would be appropriate to reduce the area planted to vegetables; when water is plentiful, however, vegetables can significantly improve the overall economic benefit. These results provide decision makers with the possible amounts of available water and broad support for decisions on planting area under different amounts of available water, ensuring the supply of basic agricultural products in the region. Figure 8 compares the optimized total planting area of the five selected crops with the prevailing conditions at different flow levels. The total optimized planting area is less than the existing area, that is, it does not reach or exceed the existing area, which meets the requirements of the Shiyang River Basin Key Governance Projects [22]. The mathematical expectation of the economic benefits, i.e., the objective of the ITSP model, is shown in Fig. 9. Compared with the average total benefits of the five crops planted in 2014, the optimized planting income clearly increases, which reduces the risk of low benefits.
For policy makers in Minqin County, broad decision support is provided for various crop planting scenarios. The optimal planting area of each crop under the three possible available water scenarios is given in Fig. 7. Optimistic policy makers may prefer the planting areas for the high flow level, which carry a higher risk of water shortage; moderate policy makers may prefer the medium flow level, with medium risk; and pessimistic policy makers may prefer the low flow level, with lower risk. Because the optimal crop area at each flow level is given in interval form, policy makers have more decision-making options. To guarantee an adequate future grain supply, especially of corn, policy makers should consider providing subsidies or raising the purchase price of grain crops, given their low economic benefits. If there is enough water for irrigation, vegetables should be given priority because of their shorter growth period and higher benefits. However, the dynamics of market prices cannot be ignored when formulating crop planting policy. With the above measures, the water shortage problem can be

Overall, these analyses indicate that the system integrating the UMLR and ITSP models not only avoids the poor runoff prediction accuracy caused by uncertainties in predicting the independent variables, but also handles the multiple uncertainties of CPSO simultaneously. The developed system can also be applied to other practical problems related to water resources management.

Conclusions
A UMLR model for runoff forecasting was developed. UMLR has the advantages of (1) avoiding the transfer of random error from the independent to the dependent variables, which affects prediction accuracy, and (2) providing a probability for each possible amount of available water. Based on the UMLR results, the ITSP model was used for CPSO; it can (1) coordinate risk and benefits in one framework, (2) address uncertainties expressed as intervals and randomness simultaneously, and (3) provide optimal planting structures under different amounts of available water.
A case study of CPSO was provided to demonstrate the applicability of the proposed system, which combines the UMLR model for runoff forecasting with the ITSP model for CPSO. The case study optimally allocated planting area among selected crops under limited water resources (based on downstream runoff forecasting) to obtain the maximum system net revenue. The comparisons show that the developed system is effective and suitable for arid regions.
This study attempted to improve the MLR model by introducing interval and stochastic uncertainties, and a UMLR model was thus developed. The predicted results of the UMLR model were used as inputs to an ITSP model for CPSO. This system, which combines runoff forecasting and CPSO, can also be applied to other similar regions where surface water is one of the major sources of available water, helping decision makers allocate water resources more effectively. However, because of the shortcomings of the MLR model, the UMLR model is poorly suited to predicting long-term streamflow. Fuzzy methods for dealing with uncertainties and other complex methods have not been fully considered in CPSO; it would be desirable to include them in further research.