1. College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
2. School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
3. Department of Artificial Intelligence, School of Electronics Engineering, Kyungpook National University, Taegu 702-701, Republic of Korea
4. Department of Electrical and Computer Engineering, University of Alberta, Edmonton AB T6G 2J7, Canada; Institute of Systems Engineering, Macao University of Science and Technology, Macao SAR, China; Research Center of Performance and Productivity Analysis, Istinye University, Istanbul 34010, Türkiye
duyonghao15@163.com
Received: 2025-05-20 · Accepted: 2025-12-04 · Published Online: 2026-03-26
Abstract
The Agile Earth Observation Satellite Scheduling Problem (AEOSSP) is a complex NP-hard challenge that involves selecting, sequencing, and timing observation tasks to maximize imaging profits while adhering to various constraints. In our study, we developed a mixed-integer programming model for AEOSSP, incorporating key constraints related to visible time windows and time dependencies. To tackle this, we propose an Evolutionary Adaptive Large Neighborhood Search Algorithm (evALNS) enhanced by Large Language Models (LLMs). Our work pioneers the application of LLMs to ALNS by being the first to automatically develop and evolve its critical destroy heuristics. However, a naive application of LLMs is insufficient for such a complex domain. We therefore introduce a novel Dual-Population Co-Evolutionary Computing Framework (DPEC) to bridge the LLM’s knowledge gap by synergizing LLM-generated heuristics with expert-designed ones. This co-evolution, guided by a Functional Natural Language Embedding (FNLE) strategy and customized prompts, significantly enhances the adaptability and efficiency of ALNS. Extensive numerical experiments demonstrated the superiority of the evALNS evolved under our framework, achieving an average profit improvement of 8.48% compared to the original ALNS with expert-designed destroy operators.
Earth Observation Satellites (EOSs) are vital for monitoring the planet’s environment, providing crucial data for applications such as environmental monitoring, navigation, and resource management (Du et al., 2022; Yang, 2021). Recent advancements in satellite technology have significantly improved maneuverability, leading to the development of Agile Earth Observation Satellites (AEOSs) (Yao et al., 2023). These new-generation AEOSs have rapidly become essential tools in space missions management, creating unprecedented development opportunities for next-generation remote sensing imaging technology due to their enhanced maneuverability and operational flexibility (Chen et al., 2024b; Chen et al., 2024c).
However, AEOSs introduce multidimensional complexity to scheduling due to their technical characteristics. Unlike traditional EOSs, AEOSs have three additional degrees of freedom—pitch (maneuvering forward or backward along the flight path), roll (tilting side-to-side), and yaw (rotating the satellite body)—allowing observations before and after overhead passage, thus extending visible time windows (VTWs). Fig. 1 illustrates this distinction, where tasks are represented by pink circles, their visible time windows (VTWs) by black boxes, and the actual observation periods by green strips. A VTW represents the entire timeframe of geometric visibility, which is often much longer than the required observation duration; the scheduling problem thus involves selecting an optimal start time for the observation (green strip) within the VTW (black box). As shown in the figure, the VTWs for different tasks (e.g., tasks 1 and 2, and tasks 3 and 4) are determined independently and can overlap in time, creating scheduling conflicts. In EOS mode, such conflicts prevent the execution of both tasks because the satellite cannot be pointed at two different targets simultaneously. In contrast, AEOSs use their maneuverability to mitigate these conflicts. For example, an AEOS can observe task 1 ahead of time with a pitch maneuver, then observe task 2 after passing overhead, effectively resolving the VTW conflict.
This enhanced maneuverability introduces multidimensional, nonlinear system characteristics. Inter-task maneuvering creates time-dependent constraints (Du et al., 2020), while variable energy consumption and storage further complicate scheduling. These factors make AEOS scheduling problems NP-hard (Wang et al., 2021). Therefore, developing efficient AEOS scheduling approaches to optimize observational profits has significant practical importance and underscores the need for intelligent scheduling methods.
Research on AEOSSP algorithms shows a trend toward diversification and interdisciplinary integration. Among reinforcement learning approaches, Herrmann and Schaub (2023) formulated the AEOSSP as a Markov decision process and employed Monte Carlo tree search with supervised learning to train the agent, evaluating two backup strategies. Exact algorithms, such as adaptive-directional dynamic programming with decremental state space relaxation, effectively solve single-orbit scheduling problems and enhance computational efficiency for the AEOSSP (Peng et al., 2020). Heuristic algorithms (HAs) are preferred in complex optimization problems for their simplicity, computational speed, and resource efficiency (Li et al., 2023; Li et al., 2025). Unlike reinforcement learning, which requires extensive training data, and exact algorithms, which face exponential time complexity, HAs provide practical solutions for complex scheduling problems including the AEOSSP (Kandepi et al., 2024; Song et al., 2024; Han et al., 2023).
Among HAs, Adaptive Large Neighborhood Search (ALNS) has shown strong optimization performance in AEOS scheduling (Liu et al., 2017; He et al., 2018; Liu et al., 2024c; Wu et al., 2024a) and in applications like the vehicle routing problem (Liu et al., 2023b; Jia et al., 2023). ALNS incorporates various destroy and repair operators with an adaptive selection mechanism that dynamically modifies solutions, enhancing local search effectiveness. A critical component of ALNS is the heuristic neighborhood structure design, particularly the destroy heuristic. This operator removes specific tasks from current solutions according to predefined rules. By disrupting local optimality through task removal, it increases the likelihood of finding global optima and prevents premature convergence by encouraging exploration of new solutions. Without efficiently designed destroy heuristics, ALNS may produce lower quality solutions and become trapped in local optima. Therefore, improving destroy heuristic efficiency is crucial for enhancing ALNS’s effectiveness in solving combinatorial optimization problems, particularly the complex AEOSSP.
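To make the adaptive selection mechanism concrete, the sketch below shows a minimal ALNS loop. This is an illustration only, not the authors' implementation: the roulette-wheel operator selection, the reward values, and the exponential weight update are common ALNS design choices and are assumed here.

```python
import random

def alns(initial, destroy_ops, repair, objective, iters=500, decay=0.8):
    """Minimal ALNS skeleton: adaptive roulette-wheel selection of destroy operators."""
    best = current = initial
    weights = {op.__name__: 1.0 for op in destroy_ops}
    for _ in range(iters):
        # Pick a destroy operator with probability proportional to its adaptive weight.
        op = random.choices(destroy_ops,
                            weights=[weights[d.__name__] for d in destroy_ops])[0]
        candidate = repair(op(current))
        # Reward operators that improve the current or best-known solution.
        reward = 0.0
        if objective(candidate) >= objective(current):
            current, reward = candidate, 1.0
        if objective(candidate) > objective(best):
            best, reward = candidate, 2.0
        # Exponentially smooth the operator's weight with the observed reward.
        weights[op.__name__] = decay * weights[op.__name__] + (1 - decay) * reward
    return best
```

Each destroy operator here is a plain function from a solution to a partially destroyed solution; the repair operator rebuilds it, and the weight update gradually favors the destroy operators that keep producing improvements.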
To overcome the limitations of HAs while preserving their efficiency, more advanced techniques have emerged. Methods integrating artificial intelligence (AI) with heuristic algorithms have been developed to address AEOS scheduling. Some approaches combine data-driven strategies with HAs to solve the AEOSSP (Du et al., 2019; Wu et al., 2022). For satellite range scheduling, methods utilizing deep reinforcement learning (DRL) integrated with heuristic scheduling techniques and a reinforcement learning-based genetic algorithm have been proposed (Ou et al., 2023; Song et al., 2023). A knowledge-assisted ALNS that combines integer programming and data mining effectively solves the satellite-ground link scheduling problem (Liu et al., 2024d; Chen et al., 2025a). The success of these hybrid methodologies has extended to other related scheduling domains (Pan et al., 2024; Wu et al., 2024b; Wang et al., 2024; Pan et al., 2025). This demonstrates that AI-HA integration has become a promising approach for solving the AEOSSP.
However, the AI technologies mentioned above are ill-suited to designing efficient heuristic rules, such as destroy operators in ALNS, as they lack text and code generation capabilities. Large language models (LLMs), as pre-trained, advanced deep learning models, have demonstrated powerful capabilities in natural language processing, code generation, artificial intelligence systems, and uncovering scientific patterns (Zhao et al., 2023; Liu et al., 2024a; Vaithilingam et al., 2022; Chen et al., 2025b; Kaddour et al., 2023; Kasneci et al., 2023; Thirunavukarasu et al., 2023). Concurrently, studies have highlighted LLMs' increasing use in optimization. For example, some researchers (Pluhacek et al., 2023) utilized GPT-4 to integrate Particle Swarm Optimization (PSO) with Cuckoo Search (CS), while others (Zhang et al., 2023) leveraged LLMs to enhance hyperparameter optimization by treating model code as a hyperparameter.
Furthermore, the integration of LLMs with evolutionary computation (EC) frameworks, known as LLM-EC, has emerged as a powerful paradigm for designing heuristics (Liu et al., 2024b; Liu et al., 2023a; Liu et al., 2024c; Stein and Bäck, 2025). This approach enables automatic generation of HAs without extensive human expertise or domain-specific knowledge. LLM-EC combines LLMs’ generative abilities with evolutionary computation processes—such as selection, crossover, and mutation—guided by tailored prompts. Over successive iterations, the algorithm’s fitness typically converges to yield an optimized heuristic. This fusion demonstrates LLMs’ potential to empower intelligent and creative optimization processes, opening new avenues for enhancing heuristic performance.
However, existing LLM-EC approaches often perform well on general or toy problems but face significant hurdles when applied to complex, real-world scheduling problems like the AEOSSP. This is because LLMs lack the intricate, domain-specific knowledge required to understand constraints such as time-dependent transitions and multi-orbit VTWs. A naive application of LLM-EC is therefore likely to produce syntactically correct but semantically flawed or inefficient heuristics.
To address the inefficiency of destroy operators in ALNS and efficiently solve the AEOSSP, we propose an Evolutionary Adaptive Large Neighborhood Search Algorithm (evALNS) that leverages LLMs' powerful heuristic generation capabilities. The evALNS is an improved version of ALNS that incorporates creative and efficient destroy operators generated by LLMs. To the best of our knowledge, this is the first application of LLMs in designing ALNS destroy operators, significantly enhancing algorithm performance in the complex AEOS scheduling domain. While the concurrent study by (Ye et al., 2025) is highly relevant, our work differs fundamentally in its core challenge and methodology. Ye et al. focus on creating a self-adapting process for general-purpose optimization, where the LLM dynamically refines prompts for neighborhood selection using a scoring mechanism. In contrast, our research addresses the domain-specific knowledge gap of LLMs for the specialized AEOSSP. We aim to enable LLMs to generate complete destroy heuristics, which extend beyond simple scoring rules. Our key contribution, the Dual-Population Co-Evolutionary (DPEC) framework, is designed for knowledge fusion, allowing the LLM to directly learn from and co-evolve with expert-designed heuristics.
First, to address LLMs’ insufficiency in understanding the AEOSSP, we proposed integrating LLM-generated heuristics with expert-designed heuristics within a Dual-Population Co-Evolutionary Framework (DPEC). This innovative framework allows LLMs to better understand the AEOSSP by utilizing information from the expert population, enabling the creation of highly effective destroy operators. Second, we introduced the concept of Functional Natural Language Embedding (FNLE) within the evolved destroy operator code, co-evolving these functional descriptions alongside their codes. By integrating functionality directly into the corresponding code and evolving it in a single-threaded manner, we maintained LLMs’ reasoning chain during the iteration process, facilitating efficient destroy operator design and deepening the model’s understanding of AEOSSP, thereby reducing error rates in generated code. Finally, we have developed tailored prompting strategies for AEOSSP. These strategies encompass efficient initialization prompts for destroy operators, prompts for crossover mutation, and prompts based on the FNLE we proposed.
Our evALNS leverages LLMs for dynamic heuristic generation to enhance ALNS performance on the AEOSSP, reducing manual design effort and effectively addressing complex real-world scheduling challenges.
In summary, this paper presents the following key contributions:
1) Pioneered the application of LLMs for designing critical components of ALNS, specifically by developing a framework to automatically evolve high-performance destroy heuristics for the complex AEOSSP.
2) Proposed a novel Dual-Population Co-Evolutionary (DPEC) framework that effectively bridges the knowledge gap between general-purpose LLMs and specialized optimization domains. This framework, supported by our FNLE strategy and customized prompts, enables LLMs to generate creative and domain-aware heuristics.
3) Provided empirical evidence of the framework’s effectiveness through extensive experiments. Our results not only show significant performance gains (an 8.48% improvement) but also, through ablation studies, demonstrate that the DPEC and FNLE components are crucial for success.
The paper is structured as follows: Section 2 presents the mathematical model for the AEOSSP, detailing the key notations and problem formulation. Section 3 discusses the evALNS algorithm assisted by LLMs, including the DPEC framework, FNLE strategy, and our proposed customized prompt engineering. Section 4 introduces the computational experiments and results, covering the ablation study, sensitivity analysis, and cross-validation. Finally, Section 5 concludes the paper by summarizing the findings and contributions.
2 Mathematical model
For AEOSSP modeling, several core constraints must be considered. First, the Visible Time Window (VTW) represents the limited time frames during which a satellite can observe a task. Typically, the scheduling period includes 14 to 15 orbital cycles per day (Liu et al., 2017), with each task potentially appearing in multiple cycles. This results in multiple VTWs for each task, requiring the implementation of the uniqueness constraint to select the appropriate VTW. Since the VTWs for AEOSs are generally much longer than the observation duration, a start time must be chosen within the task's VTW to satisfy the specified time window limits, which is referred to as the VTW constraint. Additionally, AEOSs have time-dependent characteristics that necessitate sufficient time for the satellite's attitude adjustments between tasks, leading to the transition time constraint. Furthermore, additional constraints such as energy and storage must also be considered.
Before modeling the AEOSSP, this paper introduces several key assumptions to simplify its complexity. Based on these assumptions, we define the notations used to describe the problem in this section. Clarifying these notations establishes a clear foundation for the subsequent model construction and analysis. Finally, we present the AEOSSP model, including the objective function and associated constraints.
2.1 Assumptions
To align with the operational workflows of actual satellite mission management and facilitate research, this study makes the following assumptions to simplify the AEOSSP:
1) For complex imaging requirements, such as large-area target coverage, methods like strip segmentation, as described by Chatterjee and Tharmarasa (2024), can transform these tasks into multiple meta-tasks. In this paper, we focus solely on meta-tasks.
2) Tasks are non-preemptive during execution; once initiated, a task cannot be interrupted.
3) Each task is executed no more than once and cannot be repeated.
4) Special constraints, such as emergency events and cloud cover, are not considered.
2.2 Notations
We provide an overview of the parameters and variables utilized in this study. We consider a set of tasks and a set of orbits within a defined observation period, whose cardinalities give the number of tasks and the number of orbits, respectively. For clarity, the variables and related parameters used in modeling the AEOSSP are summarized in Table 1. Notably, the decision variables are highlighted in bold.
2.3 Model
Based on the analysis of the core constraints associated with the AEOSSP, we present the model as follows:
The objective function (1) maximizes the total profit of all selected tasks, where a task's profit is an expert-defined nonnegative number describing its importance. The detailed explanations of the constraints are outlined as follows:
1) Constraint (2) ensures that each task is selected in at most one orbit.
2) Constraint (3) ensures flow balance for each task in each orbit, meaning the inflow and outflow of a task are equal to its selection status.
3) Constraint (4) ensures that the total flow leaving the starting dummy node is 1.
4) Constraint (5) ensures that the total flow reaching the ending dummy node is 1.
5) Constraint (6) ensures that the total storage capacity consumed by all observed tasks does not exceed the capacity limit.
6) Constraint (7) ensures that the total energy consumed by executing the scheduling plan does not exceed the energy limit.
7) Constraint (8) is the transit time constraint between consecutive tasks in the same orbit.
8) Constraint (9) is the VTW constraint, meaning the beginning and ending observation times cannot fall outside the duration of the VTW.
9) Constraint (10) ensures the feasibility of each task in its selected orbit.
10) Constraint (11) ensures that the selection and sequencing variables are binary and that the observation start-time variable is a positive integer.
In the model, the transit time between two consecutive tasks can be calculated as follows:
The parameter values in Eq. (12) are derived from reference (Wu et al., 2023). The attitude maneuvering angle, a measure of the satellite's attitude change between two tasks, can be calculated by
where the first three angles are the satellite's roll, pitch, and yaw at the end time of the preceding task, and the last three are the corresponding angles at the start time of the succeeding task; aggregating the changes in these angles yields the total attitude maneuvering angle.
Furthermore, since the observation duration of each task is a fixed value generally specified by the operations department, the transit time is in fact a bivariate function of the two tasks' observation start-time decision variables and can be defined as:
In Eq. (12), the coefficients represent the angular velocities corresponding to different ranges of attitude angle variation: larger changes in attitude angle require correspondingly higher angular velocities to maintain efficient attitude maneuvers. Specifically, we adopt the angular-velocity parameter settings of references (Liu et al., 2017; Wu et al., 2023). It is also important to note that VTWs on different orbits need not adhere to the transit time constraint, because a satellite passes through an extended eclipse period when it changes orbits. Therefore, we impose transit time constraints only on VTWs within the same orbit.
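As an illustration of the time-dependent transition described above, the sketch below computes a transition time from the total attitude-angle change using a piecewise angular-velocity table. The thresholds and velocities here are placeholder assumptions for illustration, not the paper's settings, which follow (Liu et al., 2017; Wu et al., 2023).

```python
def transit_time(end_attitude, start_attitude,
                 velocity_table=((10, 1.5), (30, 2.0), (60, 2.5), (90, 3.0))):
    """Transition time between two tasks from the total attitude maneuver angle.

    end_attitude / start_attitude: (roll, pitch, yaw) in degrees at the end of
    the preceding task and the start of the succeeding task. velocity_table is
    a list of (angle threshold, angular velocity) pairs: ILLUSTRATIVE values.
    """
    # Total maneuver angle: sum of absolute per-axis angle changes.
    delta = sum(abs(a - b) for a, b in zip(end_attitude, start_attitude))
    # Larger attitude changes are executed at higher angular velocities.
    for threshold, velocity in velocity_table:
        if delta <= threshold:
            return delta / velocity
    return delta / velocity_table[-1][1]
```

Because both attitudes depend on the chosen observation start times, this function is exactly the bivariate dependence on the two start-time decision variables noted above.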
Considering the constraints presented, Constraint (8) is inherently nonlinear. This complexity renders the AEOSSP NP-hard (Liu et al., 2017), posing a significant challenge even for professional integer programming solvers. Research by (Liu et al., 2017) demonstrated that CPLEX becomes ineffective when handling more than 12 tasks. This inefficiency stems from the expanded solution space created when linearizing the nonlinear constraint. HAs, particularly ALNS, provide advantages over exact algorithms and reinforcement learning-based methods, including greater efficiency, flexibility, and robustness. This motivates our approach to leverage LLM-assisted heuristic algorithm design to enhance ALNS for efficiently solving the AEOSSP.
3 Evolutionary ALNS driven by large language models
In this section, we introduce our primary contribution: an evolutionary adaptive large neighborhood search algorithm driven by LLMs to effectively solve the complex AEOSSP. Recognizing that a naive application of LLMs lacks the domain-specific knowledge required for this problem, we have developed a novel Dual-Population Co-Evolutionary (DPEC) framework to guide the evolution of high-performance destroy operators. This section systematically unpacks our methodology. We begin by presenting the overall architecture of the LLM-DPEC framework (Section 3.1). We then detail its critical components, including our unique approach to individual representation using a functional natural language embedding strategy (Section 3.2), the initialization of both the LLM-generated and expert populations (Section 3.3), the design of tailored, prompt-driven evolutionary operations (Section 3.4), and finally, the fitness evaluation process (Section 3.5).
3.1 The LLM-DPEC framework
In contrast to classic problems like the TSP and the bin packing problem, training data for the AEOSSP is far less abundant for LLMs. While solutions such as fine-tuning the model (Ding et al., 2023; Thirunavukarasu et al., 2023) exist to improve its suitability for the AEOSSP, this approach is labor-intensive, inefficient, and can waste significant computational resources. Recently, retrieval-augmented generation (Gao et al., 2023; Fan et al., 2024) has emerged as a promising alternative; however, it also faces several challenges, including the considerable human and material resources needed to collect and organize data for constructing a vector database. A lightweight solution is to incorporate information from classic, efficient expert-designed destroy operators into the LLMs. This integration enables the LLMs to develop a comprehensive understanding of the heuristic rules and design concepts relevant to the AEOSSP.
Fig. 2 illustrates the complete workflow of our proposed LLM-DPEC framework, which is designed to evolve high-performance destroy operators for our evALNS algorithm.
The process can be broken down into the following key stages:
1) Initialization: The process begins by creating two distinct populations of destroy operators. The first, the LLM Population, is generated from scratch using carefully designed prompts. The second, the Expert Population, is seeded with well-established, human-designed heuristics from existing literature.
2) Parallel evolution: Each population undergoes an independent evolutionary process for a set number of generations. Within each generation, new candidate operators (offspring) are created through LLM-guided crossover and mutation operations.
3) Fitness evaluation: Every newly generated destroy operator is evaluated by integrating it into the ALNS algorithm. This ALNS is then run on a specific AEOSSP scenario, and the resulting total profit serves as the operator’s fitness score. A higher profit indicates a more effective operator.
4) Co-evolutionary interaction: After a predefined number of generations (the elite exchange cycle), the core co-evolutionary mechanism is triggered. The top-performing individuals (elites) from the LLM population are transferred to the expert population, and vice versa. This exchange allows the LLM to learn from proven, domain-specific strategies while potentially injecting novel, creative solutions into the expert pool.
5) Termination and selection: The cycle of parallel evolution and periodic interaction continues until the maximum number of generations is reached. Finally, the single best-performing destroy operator from either population is selected as the final output, which is then used in the evALNS for solving the AEOSSP.
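The five stages above can be sketched as a simple co-evolutionary loop. This is a schematic rather than the actual framework: `make_offspring` stands in for the LLM-guided crossover and mutation calls, and the population sizes, elite count, and exchange cycle are illustrative assumptions.

```python
import random

def co_evolve(llm_pop, expert_pop, make_offspring, fitness,
              generations=20, exchange_every=5, elites=2):
    """Dual-population co-evolution with periodic elite exchange (schematic)."""
    pops = [list(llm_pop), list(expert_pop)]
    size = [len(p) for p in pops]
    for gen in range(1, generations + 1):
        for i, pop in enumerate(pops):
            # Parallel evolution: create offspring, then keep the fittest individuals.
            parents = random.sample(pop, min(2, len(pop)))
            pop.extend(make_offspring(parents))
            pops[i] = sorted(pop, key=fitness, reverse=True)[:size[i]]
        if gen % exchange_every == 0:
            # Co-evolutionary interaction: swap top individuals between populations.
            a, b = pops
            a_elites = sorted(a, key=fitness, reverse=True)[:elites]
            b_elites = sorted(b, key=fitness, reverse=True)[:elites]
            a.extend(b_elites)
            b.extend(a_elites)
            pops = [sorted(p, key=fitness, reverse=True)[:s]
                    for p, s in zip(pops, size)]
    # Termination: return the single best individual from either population.
    return max(pops[0] + pops[1], key=fitness)
```

In the real framework an individual is a destroy-operator code block and `fitness` is the total profit obtained by running ALNS with that operator on an AEOSSP scenario.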
The advantage of this framework is that expert-designed heuristic information assists the LLMs in better understanding the AEOSSP, while the heuristics generated by the LLMs may, in turn, supply the expert population with new heuristic rules, allowing both populations to evolve more effectively. Algorithm 1 illustrates the pseudo-code of the proposed LLM-DPEC framework.
Algorithm 1 details the execution flow of the proposed LLM-DPEC framework. The process begins with the initialization of the LLM and expert populations. Subsequently, for a predefined number of generations, the algorithm performs selection, crossover, mutation, and fitness evaluation within each population. After a specified number of generations, an elite exchange mechanism facilitates co-evolution by transferring top-performing individuals between the two populations. Upon completion, the framework returns the overall best destroy heuristic discovered. To explore the algorithm space more effectively, we designed multiple crossover and mutation operators: six prompt strategies were developed to create new destroy heuristics, each selected according to a specific probability. When a strategy is selected in a generation, it is applied a fixed number of times to produce candidate destroy heuristics. Each new heuristic is evaluated on an AEOSSP scenario and, if feasible, added to the current population. Consequently, each generation can incorporate a bounded number of new heuristic rules into the existing population. Finally, the best destroy heuristics are selected to form the next generation.
3.2 Individual representation
In this section, we will introduce the individual representation within the LLM-DPEC framework, focusing primarily on the coding of individuals and our novel FNLE strategy applied to individual coding.
3.2.1 Individual coding
In the LLM-DPEC framework, each destroy operator functions as an individual. The destroy operator, coded in Python, is illustrated in Fig. 3. The code block consists of two main parts. The first part is an explanatory section, which is further divided into three components:
1) Functionality block: The functionality block outlines the implementation logic and purpose of each destroy operator. This explanation aids the LLMs in understanding the algorithms within the population during each iteration, ultimately allowing it to generate more effective algorithms.
2) Parameter block and return value block: The two blocks clarify the parameters of the algorithm and the output format, ensuring consistency throughout the implementation.
It is important to note that all destroy operators are encapsulated within the Destroy class in this implementation, so each destroy operator function takes "self" as its only parameter.
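A hypothetical individual in this representation might look like the following. The operator name, the schedule encoding, and the removal ratio are invented for illustration, but the structure mirrors the description above: a method of the `Destroy` class taking only `self`, with the functionality, parameter, and return-value blocks carried in its docstring.

```python
class Destroy:
    """Container for destroy operators; each operator is a method taking only self."""

    def __init__(self, schedule, removal_ratio=0.2):
        self.schedule = schedule          # list of (task_id, profit) currently scheduled
        self.removal_ratio = removal_ratio

    def low_profit_destroy(self):
        """
        Functionality: removes the lowest-profit tasks from the current schedule,
            freeing capacity so the repair operator can insert more valuable tasks.
        Parameters: none besides self; operates on self.schedule.
        Return value: the list of removed task ids.
        """
        n_remove = max(1, int(len(self.schedule) * self.removal_ratio))
        self.schedule.sort(key=lambda t: t[1])     # ascending profit
        removed = [task_id for task_id, _ in self.schedule[:n_remove]]
        self.schedule = self.schedule[n_remove:]
        return removed
```

The docstring here plays the role of the explanatory section: it is the text the LLM reads and evolves alongside the code itself.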
3.2.2 Natural language embedding for individuals’ functionalities
The functionality of individuals often involves complex algorithmic logic and parameter settings. Using natural language to embed this information helps LLMs grasp the behavior and intent of individuals more intuitively. This understanding not only aids in generating new individuals but also provides feedback on existing ones during the optimization process. Furthermore, FNLE enhances algorithm design and implementation clarity by embedding algorithm functionalities and characteristics in natural language. This approach minimizes coding errors from misunderstandings or ambiguities and allows for effective identification and reuse of algorithm features, reducing redundant implementations and code duplication. Fig. 3 illustrates an example of an individual generated during the evolutionary process.
Fig. 4 demonstrates that each individual possesses a unique FNLE within the function code explanation block. As evolution advances, these FNLEs gradually converge into three main components: first, the indicators, which can be referred to as the characteristics of the problem scenarios utilized by the individual; second, an explanation of how the destroy operator selects tasks for destroy; and third, additional elements, methods, or mechanisms that are currently integrated into this operator.
The application of the FNLE strategy has led to the design of a specific crossover prompt, illustrated in Fig. 8 in Subsection 3.4.2: the crossover prompt C1. The purpose of prompt C1 is to enable selected parent individuals to integrate their functionalities. This fusion benefits the LLMs by allowing them to better understand the function of each individual through the natural language representation of each parent's FNLE. This improved understanding facilitates enhanced integration and crossover, ultimately resulting in more accurate code and stronger destroy capabilities in the resulting offspring individuals. Fig. 5 provides an example of generating offspring in a specific iteration using the C1 prompt in conjunction with the FNLE strategy.
Fig. 5 shows that, guided by the C1 prompt, the offspring inherit indicators from the parents. For example, the yellow sections from Parent 1 and the green sections from Parent 2 are both passed on to the offspring. Additionally, several destroy mechanisms are integrated into the offspring. For instance, the adaptive weight adjustment mechanism from Parent 1 and the probabilistic removal mechanism from Parent 2 are combined into the offspring's mechanism of adaptive randomness and multi-criteria decision-making, as shown in the pink section.
In summary, by effectively utilizing FNLE, we can enhance the functional understanding of individuals, facilitating the resolution and optimization of the AEOSSPs. This approach not only provides rich contextual information to LLMs but also offers new insights for future research in heuristics evolution by using LLMs.
3.3 Initialization
In this section, we will describe how the two populations in the LLM-DPEC framework are generated during initialization.
3.3.1 LLM-generated population
First, the LLM-generated population is created using an initialization prompt to guide the LLM. Most research focuses on well-known problems like TSP, VRP, and bin packing, which likely benefit from ample training data available to the LLM. In contrast, research on the AEOSSP is less prevalent and resources are scarce, making it challenging for the LLM to comprehend the problem deeply. A well-structured initialization prompt is therefore essential for generating efficient initial destroy operators tailored to the AEOSSP. We organize the initialization prompts into several modules as shown in Fig. 6.
1) Problem description: This module clarifies the nature of the problem and our objectives, effectively informing the LLM of intended outcomes.
2) Example codes: Providing example code during initialization enhances the LLM’s robustness and correctness while helping it understand our goals. Fig. 3 displays this example code module.
3) Supporting classes: Project-based approaches are preferred since we typically encapsulate VTW and attitude angle data before instantiation. Clear information about data types enables the LLM to design more efficient destroy operators while enhancing code correctness.
Detailed population initialization prompts can be found in Appendix A, Fig. A1. They include the following modules:
4) Variable explanations: Clear explanations reduce the risk of misusing variables in complex problems like AEOSS, minimizing errors in LLM outputs.
5) Clear task prompts: These help the model accurately understand intentions and expectations, ensuring generated responses align with user needs and leading to more effective destroy operator designs.
6) Requirements and output control: Clearly specifying the requirements for function parameters, return values, helper functions, third-party libraries, and output formats is crucial. Doing so can significantly reduce errors in code generation and simplify the management of strings produced by LLMs.
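The six modules above could be assembled into a single prompt string along these lines; the module titles, the function name, and the assembly format are assumptions for illustration, not the paper's exact prompt text.

```python
def build_init_prompt(problem_description, example_code, supporting_classes,
                      variable_explanations, task_prompt, requirements):
    """Assemble the modular initialization prompt (module names assumed)."""
    sections = [
        ("Problem description", problem_description),
        ("Example code", example_code),
        ("Supporting classes", supporting_classes),
        ("Variable explanations", variable_explanations),
        ("Task", task_prompt),
        ("Requirements and output format", requirements),
    ]
    # Concatenate the modules with simple headers so the LLM sees each block clearly.
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)
```

Keeping the modules separate like this makes it easy to vary one block (e.g., the task prompt) between evolutionary operations while holding the rest of the context fixed.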
3.3.2 Expert-generated population
Second, the expert-generated population is based on the efficient destroy operators developed in previous studies, such as those by (He et al., 2018; Wu et al., 2023). We select a subset of these operators to form the initial population within the expert-generated population in the DPEC framework. The selected expert individuals, along with their detailed introductions, are as follows:
1) Worst path destroy: This destroy operator evaluates the quality of each sub-sequence of the solution by calculating its unit profit (i.e., the profit of its tasks divided by the length of the corresponding time period) and removes the worst-performing sequence.
2) Priority destroy: This greedy heuristic aims to remove the lowest-priority tasks from the current scheduling solution.
3) Opportunity destroy: Within a scheduling period, a satellite may have multiple time windows for a single observation task. This destroy operator will remove tasks with the maximum number of VTWs.
4) Conflict destroy: The calculation of the conflict degree is a two-step process:
First, the conflict degree between the currently scheduled VTW and the other unscheduled VTWs is defined as follows:

CD(w) = (1 / |O(w)|) · Σ_{w′ ∈ O(w)} L(w, w′),

where w represents the currently scheduled VTW, W_u is the set of unscheduled VTWs, O(w) ⊆ W_u represents the set of all VTWs in W_u that overlap with w, and L(w, w′) indicates the overlap length between w and w′. Therefore, the conflict degree of w is defined as its average overlapping length with the windows in O(w).
Second, the conflict degree of task t, denoted as CD(t), can be expressed as the sum of the conflict degrees of all its time windows. The formula is:

CD(t) = Σ_{w ∈ W_t} CD(w),

where CD(t) is the conflict degree of task t, W_t is the set of all VTWs of task t, and CD(w) is the conflict degree of a single time window w.
Finally, the operator identifies the tasks with the highest total conflict degree and removes them from the current schedule. This targeted removal aims to resolve the most significant bottlenecks, increasing the flexibility for the repair operator to improve the overall solution.
5) Transit-time destroy: In the current scheduling plan, we sequentially calculate the transition time from each task to the next and subsequently remove the tasks with the longest transition times. This process is illustrated in Fig. 7. Let T denote the set of transition times, defined as

T = { tt_i | tt_i = trans(task_i, task_{i+1}), i = 1, …, n − 1 },

where trans(·, ·) is the attitude transition time between two consecutive tasks and n is the length of the current task sequence. The destroy operator will then remove the tasks corresponding to the maximum elements in T.
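As an illustration, two of the expert operators above — priority destroy and transit-time destroy — can be sketched as follows. The task interface and function signatures are assumptions for illustration, not the implementations used in the cited studies.

```python
def priority_destroy(schedule, num_remove):
    """Greedy heuristic: remove the lowest-priority tasks from the schedule.

    `schedule` is assumed to be a list of task objects exposing a
    `priority` attribute; returns the reduced schedule and removed tasks.
    """
    ranked = sorted(schedule, key=lambda t: t.priority)
    removed = ranked[:num_remove]
    kept = [t for t in schedule if t not in removed]
    return kept, removed

def transit_time_destroy(schedule, transition_time, num_remove):
    """Remove the tasks that precede the longest transitions.

    `transition_time(a, b)` is an assumed callable returning the attitude
    transition time from task `a` to task `b` in the current plan.
    """
    if len(schedule) < 2:
        return schedule, []
    # Transition time from each task to its successor in the sequence.
    trans = [(transition_time(schedule[i], schedule[i + 1]), i)
             for i in range(len(schedule) - 1)]
    trans.sort(reverse=True)               # longest transitions first
    idx_to_remove = {i for _, i in trans[:num_remove]}
    removed = [schedule[i] for i in sorted(idx_to_remove)]
    kept = [t for i, t in enumerate(schedule) if i not in idx_to_remove]
    return kept, removed
```

Operators of this shape — take a schedule, return a reduced schedule plus the removed tasks — are what the LLM is asked to generate and evolve in the LLM-generated population.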
3.4 Evolutionary operations
This section introduces the mechanism of individual selection in the evolutionary process and explains how crossover and mutation operations are implemented.
3.4.1 Selection operation
In the LLM-DPEC framework, similar to traditional EC, the individual selection mechanism determines how individuals are chosen to pass their genetic material to the next generation. This selection process typically relies on the fitness of each individual, which measures their performance or suitability to the AEOSSP scenario. Generally, individuals with higher fitness are more likely to be selected for reproduction, allowing their traits to spread through the population. Various strategies can implement this, such as the roulette wheel selection, where the probability of selection is proportional to fitness, or the tournament selection, where a random subset of individuals is chosen and the fittest among them is selected. We have chosen the roulette wheel method as the selection mechanism in the LLM-DPEC framework.
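A minimal sketch of the roulette wheel selection described above; in the LLM-DPEC setting, `population` would hold candidate destroy operators and `fitnesses` their scenario profits.

```python
import random

def roulette_wheel_select(population, fitnesses, rng=random):
    """Select one individual with probability proportional to its fitness.

    Assumes all fitness values are non-negative; `population` and
    `fitnesses` are parallel lists.
    """
    total = sum(fitnesses)
    if total == 0:                       # degenerate case: uniform choice
        return rng.choice(population)
    pick = rng.uniform(0, total)
    cumulative = 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit
        if pick <= cumulative:
            return individual
    return population[-1]                # guard against float rounding
```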
3.4.2 Crossover operation
Unlike traditional EC framework, in the LLM-DPEC framework, individuals are destroy HAs, which renders traditional crossover operations inapplicable. To facilitate the crossover of these algorithms, we have created specific prompts to guide LLMs in functionally integrating different individuals. For this purpose, we have designed two types of crossover prompts (C1, C2), each custom-designed to offer various strategies and directions for crossover destroy operators. The specific content is illustrated in Fig. 8. It is important to emphasize that this prompt-driven evolutionary mechanism is applied uniformly to both the LLM-generated and the expert-designed populations.
We will next explain the functions of these two crossover prompts separately.
C1: Combine functionalities: By combining existing destroy operators’ capabilities, we create versatile operators with multiple functionalities, enhancing adaptability. Subsection 3.2.2 provides details on this FNLE strategy, which helps LLMs explore the solution space more effectively.
C2: Simulated crossover operation: Drawing from evolutionary computation, combining features of existing destroy operators through crossover operations produces efficient results while preserving genetic diversity. This approach guides LLMs to explore various crossover operators, improving the search for optimal algorithms.
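To illustrate how such prompts might be assembled programmatically, the sketch below builds a crossover prompt from the source code of two parent operators. The wording is illustrative only and does not reproduce the actual C1/C2 prompts shown in Fig. 8.

```python
def build_crossover_prompt(parent_a_code, parent_b_code, mode="C2"):
    """Assemble a crossover prompt for the LLM from two parent destroy
    operators. The strategy wording here is a stand-in; see Fig. 8 for
    the actual C1/C2 prompt content."""
    strategies = {
        "C1": ("Combine the functionalities of the two destroy operators "
               "below into a single, more versatile operator."),
        "C2": ("Perform a simulated crossover: merge features of the two "
               "destroy operators below into a new, efficient operator "
               "while preserving their diversity."),
    }
    return (
        f"{strategies[mode]}\n\n"
        f"Parent operator A:\n```python\n{parent_a_code}\n```\n\n"
        f"Parent operator B:\n```python\n{parent_b_code}\n```\n\n"
        "Return only the Python code of the new destroy operator."
    )
```

The final instruction line reflects the "requirements and output control" module of Section 3.3.1, which keeps the LLM's response parseable as code.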
3.4.3 Mutation operation
Similar to the crossover prompts, four types of mutation prompts (M1, M2, M3, M4) have been specifically designed to offer various strategies and directions for mutation destroy operators. The details are shown in Fig. 9.
Next, the functions of these four mutation prompts will be explained separately.
M1: In-depth understanding of variables: Prompt the LLM to analyze variable types (especially AttitudeAngle and VTW) to select a parent operator that guides the creation of a distinct operator. This deeper understanding enables generation of more innovative destroy operators.
M2: Randomness: Guide the LLM to incorporate randomness into the algorithmic logic when necessary, as randomness is crucial for maintaining robustness across scenarios.
M3: Use of more indicators: Encourage the LLM to integrate additional data indicators into the selected parent to design more complex destroy operators. Examples include profit rate per VTW, conflict degree within VTW, time intervals, and statistical measures (mean/variance of attitude angles).
M4: Parameter tuning: Direct the LLM to fine-tune the selected parent’s parameters to optimize both efficiency and effectiveness.
3.5 Evaluation
Fig. 10 shows the simple demonstration for the evaluation of the individuals. As shown in Figure 10, within the evolution of the LLM-DPEC framework, new destroy operators created through crossover or mutation are integrated into the ALNS algorithm. The ALNS algorithm is then used to solve specific AEOSSP scenarios, returning a profit value, which is the sum of all task priorities in the scheduling plan. This profit value serves as the fitness value of the new destroy operator. A higher fitness value indicates better performance of the destroy operator.
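The evaluation step can be sketched as follows, assuming a callable ALNS solver; the solver interface shown here is an assumption for illustration.

```python
def evaluate_fitness(destroy_operator, scenario, alns_solver, iterations=100):
    """Fitness of a candidate destroy operator, following Section 3.5:
    plug it into ALNS, solve the scenario, and return the total priority
    (profit) of the resulting schedule. `alns_solver` is an assumed
    callable returning the list of scheduled tasks."""
    schedule = alns_solver(scenario, destroy_operator, iterations=iterations)
    return sum(task.priority for task in schedule)
```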
4 Computational experiments
To verify the effectiveness of the proposed LLM-DPEC-assisted evALNS and its superiority over standard LLM-EC-assisted and expert-designed ALNS for AEOSSP, we conducted comprehensive experiments using randomly generated scenarios. This also includes ablation studies validating the effectiveness of dual populations and the FNLE strategy we proposed. All experiments were performed using Python 3.12 on a laptop with an i7-11700K processor (3.6 GHz) and 32 GB RAM.
4.1 Experiment setup
A significant challenge in AEOSSP research is the absence of universally recognized public benchmark data sets. This is largely due to the high problem specificity, where different satellite systems, orbital parameters, and mission objectives lead to variations in problem models and constraints across studies. Therefore, to ensure the reproducibility and relevance of our experiments, we employ a high-fidelity scenario generator described by Chen et al. (2024a). The generation method is as follows:
The scheduling period spans from April 20, 2013, at 00:00:00 to April 21, 2013, at 00:00:00, covering one day and approximately 15 to 16 orbits. The satellite’s orbital parameters include: semi-major axis of 7,141,701.7 m, orbital eccentricity of 0.000627, orbital inclination of 98.5964 degrees, argument of perigee of 95.5069 degrees, right ascension of the ascending node of 342.307 degrees, and mean anomaly of 125.2658 degrees. These parameters fully describe the satellite’s orbital characteristics and position.
Tasks are uniformly distributed within a latitude range of 3° to 53° and a longitude range of 74° to 133°. The number of tasks ranges from 100 to 400, increasing in increments of 25, resulting in 13 different scenarios.
Each task in a scenario is defined by five attributes: task index, longitude, latitude, execution time, and priority. Execution time and priority are randomly selected integers ranging from 1 to 30 s and 1 to 10, respectively, where 1 is the lowest priority and 10 the highest. Latitude and longitude coordinates are generated uniformly within the given ranges. Based on the satellite’s orbital parameters and the tasks’ geographic coordinates, the required time-dependent attitude angles and the corresponding VTWs for each task are pre-calculated. These pre-calculated data serve as fixed inputs for our scheduling model. The core decision of our algorithm is then to select the optimal start time for each scheduled task within its VTW, which in turn determines the specific angles used for calculating transition times. For the experiments, we refer to scenarios by the number of tasks, e.g., Scenario 100 contains 100 tasks.
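A minimal sketch of this scenario generation under the stated ranges (the dictionary field names are assumptions for illustration):

```python
import random

def generate_scenario(num_tasks, seed=None):
    """Randomly generate a task scenario following Section 4.1:
    latitude 3-53 deg, longitude 74-133 deg, execution time 1-30 s,
    priority 1-10 (1 lowest, 10 highest)."""
    rng = random.Random(seed)
    tasks = []
    for idx in range(num_tasks):
        tasks.append({
            "index": idx,
            "latitude": rng.uniform(3.0, 53.0),
            "longitude": rng.uniform(74.0, 133.0),
            "exec_time": rng.randint(1, 30),   # seconds
            "priority": rng.randint(1, 10),
        })
    return tasks

# 100 to 400 tasks in steps of 25 gives the 13 scenarios used in the paper.
scenarios = {n: generate_scenario(n, seed=n) for n in range(100, 401, 25)}
```

The attitude angles and VTWs are then pre-computed per task from the orbital parameters; that astrodynamics step is outside the scope of this sketch.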
The key algorithm parameters are listed in Table 2:
We selected two types of LLMs to evaluate their impact on the performance enhancement of evALNS. These are DeepSeek-v3 from DeepSeek and Qwen-Long from Alibaba. It is important to note that during the evaluation process, only one destroy operator generated by the LLM-DPEC or standard LLM-EC frameworks is chosen at a time. However, when comparing algorithms, we select the top five destroy operators identified after completing all iterations.
4.2 Experimental results
4.2.1 Results analysis
Next, we compare the expert-designed ALNS, the evALNS assisted by the LLM-EC framework (Liu et al., 2023a), and the evALNS generated by our proposed LLM-DPEC framework. We also consider ablation experiments on the dual population and FNLE strategy. The expert-designed ALNS refers to the ALNS with expert individuals described in Section 3.3.2. All comparison data represent average yields from 50 runs of each algorithm, where yield refers to the ratio of the current solution’s profit to the total profit of the current scenario. We define “Gap” as the relative profit improvement of evALNS compared to expert-designed ALNS:
Gap = (P_evALNS − P_ALNS) / P_ALNS × 100%,  (18)

where P_evALNS and P_ALNS represent the profits obtained by the evALNS and ALNS algorithms, respectively.
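In code, this gap computation reduces to a one-liner:

```python
def gap_percent(profit_evalns, profit_alns):
    """Relative profit improvement of evALNS over the expert-designed
    ALNS, as a percentage (Eq. (18))."""
    return (profit_evalns - profit_alns) / profit_alns * 100.0
```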
Tables 3 and 4 display the average profits obtained from 30 iterations of expert-designed ALNS and evALNS assisted by the LLM-EC and LLM-DPEC frameworks. Specifically, Table 3 presents results generated by DeepSeek-v3, while Table 4 exhibits results from Qwen-Long. We emphasize the profit gap (Eq. (18)) of our proposed evALNS compared to expert-designed ALNS. Additionally, Fig. 11 illustrates the performance difference between the LLM-EC and LLM-DPEC frameworks when two distinct LLMs are used.
For clarity, LLM-EC and LLM-DPEC refer to the evolutionary ALNS (evALNS) algorithm implemented within the respective frameworks. “Basic” refers to the expert-designed ALNS. Thus, “Profit (LLM-EC)” indicates the total profit obtained by evALNS within the LLM-EC framework, while “Gap (EC)” signifies the percentage enhancement achieved by evALNS compared to expert-designed ALNS, as defined in Eq. (18).
Based on the provided data and Fig. 11, a comprehensive analysis shows that DPEC consistently outperforms EC in profit improvements over ALNS in both the DeepSeek-v3 and Qwen-Long models. In Table 3, LLM-DPEC exhibits an average gap of approximately 8.48%, compared to LLM-EC’s 6.75%. Additionally, the gap enhancement under the LLM-DPEC framework is 20.4% relative to the LLM-EC framework. Similarly, in Table 4, DPEC maintains a higher average gap of 7.31%, compared to LLM-EC’s 3.95%. The gap enhancement under the LLM-DPEC framework is 45.96% relative to the LLM-EC framework. This evidence indicates the superior performance of the LLM-DPEC framework in enhancing the profit across all scenarios.
The findings support the effectiveness and necessity of creating the heuristic function in ALNS using LLMs to solve the AEOSSP. Furthermore, they demonstrate that the integration of dual population and FNLE strategies into DPEC significantly enhances the effectiveness of evALNS.
Additionally, Table 3 generally shows larger gap values than Table 4, suggesting that the destroy operators generated by DeepSeek-v3 are superior to those generated by Qwen-Long. While LLM-EC offers more stable and consistent gap improvements, LLM-DPEC exhibits greater variability, particularly in scenarios such as 300 and 325, but consistently delivers higher profit enhancements. This indicates that the DPEC framework has a significant advantage in breaking the ALNS algorithm’s tendency to get trapped in local optima when solving the AEOSSP. This trade-off between stability and performance highlights LLM-DPEC’s potential as a powerful optimization framework for solving the AEOSSP using evALNS. Overall, LLM-DPEC’s advanced strategies make it a superior choice for evolving the destroy operator in ALNS, efficiently addressing the AEOSSP and demonstrating significant advantages regardless of the selected LLM.
Figure 12 illustrates the convergence curves of fitness by using the selected LLMs within the LLM-DPEC framework for scenario 400.
The convergence curves for fitness (scenario profit) in scenario 400 across different LLMs reveal a rapid increase in fitness values during the initial generations, followed by gradual stabilization. The maximum, minimum, and average fitness values demonstrate significant growth early on, indicating a swift improvement in population fitness. This growth eventually levels off, showcasing the algorithm’s strong convergence and stability. Initially, the destroy operators promote diversity within the population, displaying a wide range of fitness values. As generations progress, the fitness values cluster more tightly, indicating convergence toward higher fitness solutions. This trend highlights the algorithm’s shift from exploration in the early stages to exploitation and optimization in later stages, effectively enhancing overall population fitness. Overall, the curves demonstrate the algorithm’s effectiveness in improving fitness in the given scenario.
4.2.2 Analysis of computational cost
A key consideration for our LLM-DPEC framework is the computational cost of evolving destroy operators offline. To quantify this, we estimate that evolving the operators for a single 200-task scenario takes approximately 1.34 h (4810 s), as shown in Table 5. This calculation includes the total time for LLM API calls (for initialization, crossover, and mutation) and the many ALNS fitness evaluations (100 iterations per evaluation) throughout the entire 20-generation process.
This one-time, offline investment is justified by the significant and recurring gains it delivers in online performance. The investment produces a qualitatively superior algorithm; unlike the expert-designed ALNS, which faces diminishing returns with extended runtimes, evALNS uses evolved operators to unlock more profitable solution spaces, a foundational improvement reflected in the 8.48% performance uplift. Crucially, our cross-validation study (Section 4.4) confirms that these operators are not overfitted and generalize effectively to unseen problems. Therefore, for practical applications such as daily satellite scheduling, the initial time investment is rapidly amortized. Each subsequent operational run benefits from the superior heuristics, translating the upfront training cost into consistent and substantial long-term profit gains.
4.3 Ablation study
To assess the impact of the dual population and FNLE strategies on the LLM-EC framework, we conduct several ablation experiments in this section. First, we remove the FNLE strategy from the LLM-EC framework, specifically eliminating the natural language embedding strategy during the evolution of the destroy operators. We then compare the gap in this scenario using data for the expert-designed ALNS consistent with Tables 3 and 4. It is important to note that when we remove the FNLE strategy for the ablation study, the crossover prompt C1 we designed also becomes ineffective. To ensure fairness, we employ the crossover prompt strategy outlined in (Liu et al., 2023a), as its effectiveness has already been validated in heuristic design. The details of this prompt are as follows:
New C1 prompt: Please identify the common elements of the ideas presented in the provided functions, and then, based on these common elements, create a different function that incorporates these ideas by adding some new components.
Table 6 presents the results of the ablation study by using DeepSeek-v3. The symbol “LLM-EC-1” refers to the evALNS algorithm evolved by the LLM-EC framework that incorporates the FNLE strategy, while the symbol “LLM-EC-0” refers to the evALNS algorithm evolved by the LLM-EC framework without the FNLE strategy. Similar to the previous data tables, the first two columns of the table represent the profit values for the corresponding scenarios, while the last two columns indicate the relative performance improvement gap percentages.
Figure 13 shows the optimization gap percentages between LLM-EC with and without the FNLE strategy in various scenarios. The size and color of the bubbles represent the magnitude and direction of the performance differences.
Data from Table 6 and Fig. 13 demonstrate that the FNLE strategy generally enhances evALNS algorithm performance across most scenarios, evidenced by higher profit values and gap percentages compared to the variant without FNLE. Although performance gains in scenarios 300 and 350 are less significant, Tables 3, 4, and 6 show consistently high performance improvements for scenarios 300 and 325. This indicates these scenarios are particularly responsive to destroy operator evolution in the LLM-EC framework, suggesting the new C1 prompt performs effectively in these cases.
4.4 Sensitivity analysis
In this section, we conduct a sensitivity analysis of the parameters within the LLM-DPEC framework. Our analysis focuses on two settings: first, we keep all other parameters constant while setting the number of evolutionary generations to 30; second, we keep the other parameters fixed and adjust the mutation rate to 0.2. We then re-evolve the destroy operators and apply them to ALNS to develop a new evALNS, use this modified algorithm to solve the AEOSSP, and calculate the corresponding profit for each scenario. Finally, we compare the resulting gaps. Table 7 presents the gap percentage data for evALNS across three configurations: the basic LLM-DPEC, the LLM-DPEC with the number of iterations set to 30 (referred to as iter30), and the LLM-DPEC with the mutation rate adjusted to 0.2 (referred to as mut0.2).
According to the data in Table 7, the average gap is 10.26% for the basic configuration, 10.30% for iter30, and 10.26% for mut0.2. These results show that while increasing iterations (iter30) slightly improves evALNS performance, the difference is very small. Similarly, adjusting the mutation rate to 0.2 (mut0.2) has almost no effect. This suggests that parameter tuning only marginally enhances the LLM-DPEC framework’s performance in solving AEOSSPs. Given the small differences, it is important to carefully select parameter settings based on specific scenarios to balance performance and computational costs, such as token usage.
4.5 Cross validation
In the LLM-DPEC framework, each scenario ultimately evolves a population of 10 destroy operators, which are interchangeable across different scenarios. While the previous sections demonstrated the effectiveness of our framework for specific instances, this section aims to verify the generalization performance of these operators across various scenarios. To this end, we conduct a cross-validation experiment by employing the K-fold cross validation method (Wong and Yeh, 2020). The specific cross-validation process is outlined in Algorithm 2 below.
In Algorithm 2, we first subdivide the set of scenarios into five folds: [100, 125, 150], [175, 200, 225], [250, 275, 300], [325, 350], and [375, 400]. We then iterate through these five folds to cover all scenarios. In each iteration, we designate the current fold as the target fold and randomly select five scenarios from the remaining scenarios to form the resource fold. Next, we extract the evolved destroy operators from each scenario within the resource fold and integrate these operators into evALNS to address the scenarios in the target fold. For instance, if the target fold is [100, 125, 150], we randomly select five scenarios from [175, 200, 225, 250, 275, 300, 325, 350, 375, 400], such as [200, 300, 325, 225, 400]. We then extract the most effective evolved destroy operators from these scenarios for use in evALNS and solve the three scenarios in the target fold [100, 125, 150] 50 times each to obtain the corresponding average profits. Repeating this for every fold completes the 5-fold cross-validation for all scenarios.
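The fold iteration of Algorithm 2 can be sketched as follows; `cross_validation_plan` is a hypothetical helper name, and only the fold bookkeeping (not the evALNS runs) is shown.

```python
import random

# The five folds of scenarios used in the cross-validation (Section 4.5).
FOLDS = [[100, 125, 150], [175, 200, 225], [250, 275, 300],
         [325, 350], [375, 400]]

def cross_validation_plan(folds=FOLDS, resource_size=5, rng=random):
    """For each target fold, pick `resource_size` scenarios from the
    remaining folds as the resource fold whose evolved destroy operators
    will be transferred to solve the target scenarios."""
    plan = []
    for target in folds:
        remaining = [s for fold in folds for s in fold if s not in target]
        resource = rng.sample(remaining, resource_size)
        plan.append((target, resource))
    return plan
```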
Table 8 shows the gap percentages data of the cross validation results for all scenarios under the LLM-DPEC framework by using DeepSeek-v3.
Figure 14 shows the gap percentages for evALNS under the original LLM-DPEC framework, the LLM-EC framework, and the LLM-DPEC framework under cross-validation.
The results presented in Tables 3 and 8, as well as in Fig. 14, show that the gap percentages obtained through cross-validation are generally comparable to those from the original LLM-DPEC framework, and often surpass those of the LLM-EC framework. In particular, for instances such as 225, 300, and 375, the cross-validation gaps achieved by the LLM-DPEC framework are slightly higher than those of the original framework. Furthermore, the average gap of evALNS under cross-validation is 8.29%, close to the original LLM-DPEC average of 8.48%, whereas the evALNS gap under the LLM-EC framework is noticeably lower at 6.75%.
These findings lead to two main insights. First, the evolved destroy operators within the LLM-DPEC framework exhibit strong generalization capabilities; using them across different scenarios yields consistently competitive results, although slightly below those of the original setting. Second, both the cross-validated LLM-DPEC framework and the original LLM-DPEC consistently outperform the LLM-EC framework. Taken together, these results confirm the effectiveness and versatility of the proposed LLM-DPEC framework.
4.6 Algorithm comparison
To address the need for a more comprehensive performance evaluation, we conducted an extensive comparative study to benchmark our LLM-assisted approach against several state-of-the-art (SOTA) and baseline algorithms. This section presents the setup, results, and analysis of this comparison, providing convincing evidence of our algorithm’s superiority.
4.6.1 Competitor algorithms and parameters setting
We selected three widely recognized algorithms to serve as competitors, representing different levels of heuristic sophistication. Furthermore, to demonstrate the power of our LLM-evolved operators, we tested them within both the standard ALNS framework and a more advanced hybrid ALNS framework. The algorithms are defined as follows:
Genetic Algorithm (GA) (Xhafa et al., 2012): A classic population-based metaheuristic. We implemented a standard GA with tournament selection, single-point crossover, and random mutation operators, which serves as a baseline to demonstrate the problem’s complexity for traditional evolutionary methods.
Hill climbing with Expert Operators (HC) (Yao et al., 2023): A simple but effective local search algorithm. This version uses the same set of five expert-designed destroy operators as our baseline ALNS.
ALNS with Tabu search (ALNS/TFP) (He et al., 2020): This represents a strong SOTA competitor, inspired by the successful hybrid approach in (He et al., 2020). It enhances the expert-designed ALNS with a Tabu Search mechanism to avoid cycling, making it a highly robust and powerful heuristic for the AEOSSP.
evALNS (Ours): This is our proposed method as described in the paper, which uses the standard ALNS framework but replaces the expert-designed destroy operators with the superior ones evolved by our LLM-DPEC framework.
evALNS-TFP (Ours): This is a hybrid algorithm designed to test the ultimate potential of our evolved operators. We replaced the human-designed operators in the strong ALNS/TFP framework with our LLM-evolved operators. A superior performance here would demonstrate that our framework can not only create a good standalone algorithm but also significantly enhance existing SOTA methods.
The parameters for the comparative experiments were set to ensure a fair evaluation. All local search-based algorithms were executed for a maximum of 1,000 iterations, with their other parameters configured as specified in their original references. The Genetic Algorithm (GA) was configured with a population size of 100 for 500 generations, with both the crossover and mutation rates set to 1.0.
4.6.2 Comparative results and analysis
All algorithms were run 30 times on the same 13 scenarios described in Section 4.1, and the average profit values are reported. The experimental results are summarized in Table 9 and visualized in Fig. 15.
The results presented in Table 9 and Fig. 15 lead to several key insights:
1) The Genetic Algorithm (GA) consistently yields the lowest profit, confirming that the complex, constrained nature of the AEOSSP is challenging for standard population-based methods without specialized operators. The Hill Climbing (HC) algorithm performs reasonably well, but is clearly outperformed by the more sophisticated ALNS-based methods, highlighting the benefit of an adaptive search framework.
2) Our proposed evALNS demonstrates competitive performance against the strong SOTA baseline, ALNS/TFP. In several smaller to mid-sized instances (e.g., 125, 150, 175, 325 tasks), evALNS achieves higher profits, validating that the heuristics discovered by our LLM-DPEC framework are highly effective on their own.
3) The most significant finding is the outstanding performance of the evALNS-TFP algorithm. It consistently outperforms all other competitors, including the strong ALNS/TFP baseline, across nearly all scenarios. This result is crucial because it proves that our LLM-evolved destroy operators are not just effective, but are demonstrably superior to the set of carefully crafted, expert-designed operators used in the SOTA framework.
4) The performance gap between evALNS-TFP and ALNS/TFP widens as the problem size increases (from task 225 onwards). For the largest 400-task scenario, evALNS-TFP achieves an average profit of 1289.8, which is a 12.0% improvement over the 1151.7 achieved by ALNS/TFP. This indicates that the heuristics generated by our framework are more adept at navigating the larger and more complex search spaces of difficult scenarios. This synergy—combining a SOTA framework with our superior, LLM-generated operators—unlocks a new level of performance that neither component could achieve alone.
In summary, this comprehensive comparison provides clear and convincing evidence of the value of our proposed approach. The LLM-DPEC framework is capable of automatically discovering destroy heuristics that are powerful enough to not only compete with but also significantly enhance state-of-the-art algorithms for the complex AEOSSP.
4.7 Demonstration of the evolved destroy operator
To crystallize the novelty of the evolved heuristics, we present a detailed breakdown of the seven distinct criteria employed by the optimal destroy operator evolved for Scenario 375. Unlike traditional expert-designed operators, which typically focus on a single metric (e.g., removing tasks with the lowest priority or longest transition time), our LLM-evolved operator performs a sophisticated, multi-criteria evaluation. As detailed in Table 10, these criteria span multiple dimensions of the problem space, from task value and temporal constraints to complex satellite attitude dynamics.
The true innovation lies not only in the identification of novel criteria like Attitude Stability and VTW Feasibility but in the operator’s ability to synthesize these seven distinct signals. It uses a dynamic weighting scheme (as detailed in Appendix A Fig. A2) to aggregate these scores, creating an adaptive policy that is far more powerful than any single, static rule. This demonstrates that our LLM-DPEC framework does not merely recombine existing ideas but discovers fundamentally new and more effective heuristic structures for complex optimization problems.
Additionally, we extracted the functional descriptions of all destroy operators generated by the LLM-DPEC framework using DeepSeek-v3 for the 13 scenarios and created a word cloud, as shown in Fig. 16. The figure highlights several key points. High-frequency vocabulary, including “attitude,” “duration,” and “priority,” indicates that the final evolved destroy operators effectively utilize data related to attitude angles, task execution duration, and task priority, thereby enhancing the effectiveness of scheduling solutions. The terms “dynamic,” “random,” “scoring,” and “weighted” suggest that the optimal destroy operators involve complex processes and mechanisms for dynamic adjustment and optimization. These operators incorporate elements of dynamism and randomness to adapt to changing scheduling requirements, while scoring and weighting are used to evaluate and optimize scheduling solutions based on specific criteria. Terms such as “balance,” “impact,” and “transition” indicate that the most effective destroy operators balance multiple factors. This reflects the role of the proposed prompt M3, which aims to introduce additional indicators. The term “transition” underscores that the operators ultimately incorporate the concept of transit time, which derives from the information introduced by the expert population.
5 Conclusions
Our research demonstrates that incorporating LLMs into ALNS has led to the development of innovative destroy operators specifically for the AEOSSP. In all experimental scenarios, the evALNS with LLM-generated destroy operators achieved an average profit increase of 8.48% compared to the ALNS with manually designed destroy operators. This integration underscores the transformative potential of LLMs in designing HAs, indicating that evALNS not only enhances traditional ALNS but also uses LLMs to innovate and advance HAs for the AEOSSP. The success of this methodology highlights the critical role of adaptive design in improving algorithmic performance and the potential that LLMs have in broadly revolutionizing heuristic algorithm applications.
Furthermore, developing a comprehensive framework like the proposed LLM-DPEC with FNLE, which demonstrates robust generalization capabilities across various scenarios, is a pivotal goal for future research. To guide this direction, it is important to consider its broader applicability and limitations. Our methodology is best suited for complex combinatorial optimization problems that are amenable to metaheuristic approaches with decomposable operators (e.g., VRP, JSS), possess a well-defined solution structure, and feature intricate constraints. The co-evolutionary nature of the DPEC framework performs optimally when a baseline of expert-designed heuristics is available to guide the LLM. This positions our knowledge-fusion approach as a distinct alternative to other emerging paradigms, such as the self-adaptive prompt evolution proposed by (Ye et al., 2025), particularly for domains where expert knowledge is crucial but not inherently present in the LLM’s pre-training data.
However, realizing this potential requires addressing inherent limitations, including the complexity of prompt engineering, the need for a robust sandbox to validate LLM-generated code, the computational overhead of the evolutionary discovery process, and the reliance on pre-existing domain knowledge to seed the expert population.
Our approach emphasizes the importance of incorporating advanced AI techniques to significantly improve optimization outcomes. By focusing on the creation of specialized destroy operators, we aim to extend the capabilities of ALNS, ultimately facilitating more effective solutions in complex scheduling problem domains. Future work could explore reducing the dependency on expert knowledge and automating prompt refinement to enhance the framework’s autonomy and broaden its applicability to a wider range of optimization challenges.
Appendix A
A.1 Details of the initialization prompt
Figure A1 displays the detailed prompt we designed for generating the initial population.
We selected one of the most effective destroy operators evolved by our LLM-DPEC framework for a detailed analysis. This heuristic demonstrates a sophisticated, multi-faceted logic that surpasses typical human-designed operators. The full Python code for this heuristic is provided in Fig. A2.
The procedure of this evolved destroy operator is formally outlined in Algorithm A1.
Definitions of the relevant symbols used in the algorithm and analysis are provided in Table A1.
The heuristic’s logic is structured into three main phases: multi-criteria evaluation, score aggregation with dynamic weighting, and probabilistic selection.
Phase 1: Multi-criteria evaluation
For each task $t$, the operator computes a vector of seven scores, each quantifying a specific aspect of the task:
1) Priority Score: a simple decaying function of the task’s priority.
2) VTW Overlap Score: measures how much the task’s VTWs conflict with those of other tasks.
3) Observation Duration: the task’s required observation duration.
4) Attitude Angle Score: the average attitude maneuver required to reach $t$ from the other tasks, measured as the norm of the attitude-angle difference.
5) VTW Duration Score: the total available window time for the task.
6) Attitude Stability Score: measures the variation of the attitude angles within each of the task’s VTWs.
7) VTW Feasibility Score: the ratio of the available VTW duration to the required observation time.
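To make Phase 1 concrete, the following minimal Python sketch computes a reduced criterion vector for one task. The `Task` fields and the three implemented criteria (priority, VTW overlap, and VTW feasibility) are simplified assumptions for illustration only; they are not the evolved operator of Fig. A2.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Task:
    """Illustrative task record; field names are assumptions, not the paper's API."""
    priority: int                    # task priority level (higher = more important)
    obs_duration: float              # required observation time (s)
    vtws: List[Tuple[float, float]]  # visible time windows as (start, end)

def vtw_total_duration(task: Task) -> float:
    """Total available visible-window time for a task (VTW Duration criterion)."""
    return sum(end - start for start, end in task.vtws)

def vtw_overlap(task: Task, others: List[Task]) -> float:
    """Summed overlap between this task's VTWs and all other tasks' VTWs."""
    overlap = 0.0
    for s1, e1 in task.vtws:
        for other in others:
            for s2, e2 in other.vtws:
                overlap += max(0.0, min(e1, e2) - max(s1, s2))
    return overlap

def score_vector(task: Task, others: List[Task]) -> List[float]:
    """A reduced criterion vector: three of the seven criteria, for illustration."""
    return [
        1.0 / task.priority,                                    # decaying priority score
        vtw_overlap(task, others),                              # VTW overlap score
        vtw_total_duration(task) / max(task.obs_duration, 1e-9) # VTW feasibility ratio
    ]
```

The full evolved operator evaluates all seven criteria, including the attitude-based ones, which require access to the satellite attitude model omitted here.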
Phase 2: Score aggregation with dynamic weighting
This phase combines the individual scores into a single composite score $S(t)$. A key feature is the dynamic weighting scheme, in which the weight of each criterion $k$ is a linear function of the current plan size $|P|$:
$$w_k = a_k\,|P| + b_k,$$
where the constants $a_k$ and $b_k$ are evolved by the LLM. The final score is a linear combination of the normalized criterion scores $\hat{s}_k(t)$:
$$S(t) = \sum_{k \in K^{+}} w_k\,\hat{s}_k(t) - \sum_{k \in K^{-}} w_k\,\hat{s}_k(t).$$
Here, criteria in $K^{-}$ (e.g., feasibility) increase a task’s destruction likelihood, while those in $K^{+}$ increase its preservation value.
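The aggregation step can be sketched as follows. The min-max normalization, the sign split between preservation- and destruction-oriented criteria, and the coefficient values are illustrative assumptions; in the framework, the weight coefficients are evolved by the LLM rather than fixed by hand.

```python
from typing import List, Sequence, Tuple

def normalize(scores: Sequence[float]) -> List[float]:
    """Min-max normalize one criterion's scores across all tasks."""
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [(s - lo) / span if span > 0 else 0.5 for s in scores]

def dynamic_weight(slope: float, intercept: float, plan_size: int) -> float:
    """Weight as a linear function of the current plan size."""
    return slope * plan_size + intercept

def composite_scores(
    score_matrix: Sequence[Sequence[float]],  # rows = tasks, columns = criteria
    signs: Sequence[int],                     # +1 preservation-oriented, -1 destruction-oriented
    coeffs: Sequence[Tuple[float, float]],    # (slope, intercept) per criterion
    plan_size: int,
) -> List[float]:
    """Combine per-criterion scores into one composite score per task."""
    n_crit = len(signs)
    # normalize each criterion column across tasks before weighting
    cols = [normalize([row[k] for row in score_matrix]) for k in range(n_crit)]
    out = []
    for i in range(len(score_matrix)):
        total = 0.0
        for k in range(n_crit):
            w = dynamic_weight(*coeffs[k], plan_size)
            total += signs[k] * w * cols[k][i]
        out.append(total)
    return out
```

Because the weights depend on `plan_size`, the same task can be scored differently as the partial schedule grows or shrinks during the search.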
Phase 3: Probabilistic selection
The composite score $S(t)$ is transformed into a destruction probability via an inverse relationship, ensuring that high-scoring tasks are less likely to be removed:
$$p(t) = \frac{1}{S(t) + \epsilon},$$
where $\epsilon$ is a small positive constant. After computing this for all tasks, the probabilities are normalized to form a valid probability distribution:
$$\pi(t) = \frac{p(t)}{\sum_{t'} p(t')}.$$
Finally, the tasks to be removed are sampled from the current plan according to the distribution defined by $\pi(t)$.
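A minimal sketch of the probabilistic-selection phase follows. The shift-and-invert form (which guards against non-positive composite scores) and the without-replacement sampler are illustrative assumptions, not the evolved operator's exact formulas.

```python
import random
from typing import List, Sequence

def destruction_probabilities(composite: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Inverse relationship: high composite score -> low removal probability."""
    shift = min(composite)
    # shift so every score is positive before inverting (an illustrative choice)
    raw = [1.0 / (s - shift + eps) for s in composite]
    total = sum(raw)
    return [r / total for r in raw]  # normalize to a valid distribution

def sample_tasks_to_remove(task_ids, probs, k, rng=None):
    """Sample k distinct tasks without replacement, weighted by removal probability."""
    rng = rng or random.Random()
    ids, p = list(task_ids), list(probs)
    removed = []
    for _ in range(min(k, len(ids))):
        total = sum(p)
        r = rng.random() * total
        acc = 0.0
        for j, w in enumerate(p):
            acc += w
            if r <= acc:
                removed.append(ids.pop(j))
                p.pop(j)
                break
    return removed
```

Renormalizing after each draw keeps the selection weighted even as tasks are removed, so the stochastic exploration described above is preserved throughout the destroy step.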
In summary, the mathematical properties of this heuristic are profound. It moves beyond simple rules by implementing:
1) High-Dimensional Feature Space: It operates in a seven-dimensional feature space to evaluate each task.
2) Adaptive Policy: The decision logic, parameterized by plan-size-dependent weights, is not static but adapts to the current state of the solution.
3) Sophisticated Trade-offs: The structure of the composite score embodies complex trade-offs, such as penalizing tasks with high feasibility to create more promising search opportunities.
4) Controlled Stochasticity: The final selection, governed by the normalized removal-probability distribution, balances deterministic evaluation with stochastic exploration.
A human engineer would be highly unlikely to manually derive this specific set of seven features, their complex interactions in the scoring formula, and the state-dependent nature of the weights. This operator exemplifies the power of our LLM-DPEC framework to discover non-obvious, mathematically sophisticated, and highly effective heuristics for complex combinatorial optimization problems like the AEOSSP.
Chatterjee A, Tharmarasa R (2024). Multi-stage optimization framework of satellite scheduling for large areas of interest. Advances in Space Research, 73(3): 2024–2039
Chen C, Li L, Du Y, Yao F, Xing L (2025a). A hybrid learning-assisted multi-parallel algorithm for a large-scale satellite-ground networking optimization problem. Frontiers of Engineering Management, 12(4): 1157–1174
Chen M, Du Y, Tang K, Xing L, Chen Y, Chen Y (2024a). Learning to construct a solution for the agile satellite scheduling problem with time-dependent transition times. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 54(10): 5949–5963
Chen X, Gao L, Zhang M, Chen C, Yan S (2024b). Spectral-spatial adversarial multi-domain synthesis network for cross-scene hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 62: 1–16
Chen X, Zhang M, Liu Y (2024c). Target detection with spectral graph contrast clustering assignment and spectral graph transformer in hyperspectral imagery. IEEE Transactions on Geoscience and Remote Sensing, 62: 1–16
Chen Y, Zhang H, Li C, Chi B, Chen X, Wu J (2025b). Large language model empowered smart city mobility. Frontiers of Engineering Management, 12(1): 201–207
Ding N, Qin Y, Yang G, Wei F, Yang Z, Su Y, Hu S, Chen Y, Chan C M, Chen W, Yi J, Zhao W, Wang X, Liu Z, Zheng H T, Chen J, Liu Y, Tang J, Li J, Sun M (2023). Parameter-efficient fine-tuning of large-scale pre-trained language models. Nature Machine Intelligence, 5(3): 220–235
Du Y, Wang L, Xing L, Yan J, Cai M (2019). Data-driven heuristic assisted memetic algorithm for efficient inter-satellite link scheduling in the BeiDou navigation satellite system. IEEE/CAA Journal of Automatica Sinica, 8(11): 1800–1816
Du Y, Wang T, Xin B, Wang L, Chen Y, Xing L (2020). A data-driven parallel scheduling approach for multiple agile earth observation satellites. IEEE Transactions on Evolutionary Computation, 24(4): 679–693
Du Y, Xing L, Chen Y (2022). Satellite scheduling engine: The intelligent solver for future multi-satellite management. Frontiers of Engineering Management, 9(4): 683–688
Fan W, Ding Y, Ning L, Wang S, Li H, Yin D, Chua T S, Li Q (2024). A survey on RAG meeting LLMs: Towards retrieval-augmented large language models. In: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 6491–6501
Gao Y, Xiong Y, Gao X, Jia K, Pan J, Bi Y, Dai Y, Sun J, Wang H (2023). Retrieval-augmented generation for large language models: A survey. Preprint at arXiv. arXiv:2312.10997
Han C, Gu Y, Wu G, Wang X (2023). Simulated annealing-based heuristic for multiple agile satellites scheduling under cloud coverage uncertainty. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 53(5): 2863–2874
He L, De Weerdt M, Yorke-Smith N (2020). Time/sequence-dependent scheduling: The design and evaluation of a general-purpose tabu-based adaptive large neighbourhood search algorithm. Journal of Intelligent Manufacturing, 31(4): 1051–1078
He L, Liu X, Laporte G, Chen Y, Chen Y (2018). An improved adaptive large neighborhood search algorithm for multiple agile satellites scheduling. Computers & Operations Research, 100: 12–25
Herrmann A, Schaub H (2023). Reinforcement learning for the agile earth-observing satellite scheduling problem. IEEE Transactions on Aerospace and Electronic Systems, 59(5): 5235–5247
Jia S, Deng L, Zhao Q, Chen Y (2023). An adaptive large neighborhood search heuristic for multi-commodity two-echelon vehicle routing problem with satellite synchronization. Journal of Industrial and Management Optimization, 19(2): 1187–1210
Kaddour J, Harris J, Mozes M, Bradley H, Raileanu R, McHardy R (2023). Challenges and applications of large language models. Preprint at arXiv. arXiv:2307.10169
Kasneci E, Sessler K, Küchemann S, Bannert M, Dementieva D, Fischer F, Gasser U, Groh G, Günnemann S, Hüllermeier E, Krusche S, Kutyniok G, Michaeli T, Nerdel C, Pfeffer J, Poquet O, Sailer M, Schmidt A, Seidel T, Stadler M, Weller J, Kuhn J, Kasneci G (2023). ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103: 102274
Li R, Gong W, Wang L, Lu C, Zhuang X (2023). Surprisingly popular-based adaptive memetic algorithm for energy-efficient distributed flexible job shop scheduling. IEEE Transactions on Cybernetics, 53(12): 8013–8023
Li R, Wang L, Gong W, Ming F (2025). An evolutionary multitasking memetic algorithm for multiobjective distributed heterogeneous welding flow shop scheduling. IEEE Transactions on Evolutionary Computation, 29(6): 2287–2298
Liu F, Liu Y, Shi L, Huang H, Wang R, Yang Z, Zhang L, Li Z, Ma Y (2024a). Exploring and evaluating hallucinations in LLM-powered code generation. Preprint at arXiv. arXiv:2404.00971
Liu F, Tong X, Yuan M, Lin X, Luo F, Wang Z, Lu Z, Zhang Q (2024b). Evolution of heuristics: Towards efficient automatic algorithm design using large language model. In: Forty-first International Conference on Machine Learning, PMLR 235: 32201–32223
Liu F, Tong X, Yuan M, Zhang Q (2023a). Algorithm evolution using large language model. Preprint at arXiv. arXiv:2311.15249
Liu S, Chen C, Qu X, Tang K, Ong Y S (2024c). Large language models as evolutionary optimizers. In: 2024 IEEE Congress on Evolutionary Computation (CEC), 1–8
Liu X, Laporte G, Chen Y, He R (2017). An adaptive large neighborhood search metaheuristic for agile satellite scheduling with time-dependent transition time. Computers & Operations Research, 86: 41–53
Liu Y, Roberto B, Zhou J, Yu Y, Zhang Y, Sun W (2023b). Efficient feasibility checks and an adaptive large neighborhood search algorithm for the time-dependent green vehicle routing problem with time windows. European Journal of Operational Research, 310(1): 133–155
Liu Z, Liu J, Liu X, Yang W, Wu J, Chen Y (2024d). Knowledge-assisted adaptive large neighbourhood search algorithm for the satellite–ground link scheduling problem. Computers & Industrial Engineering, 192: 110219
Ou J, Xing L, Yao F, Li M, Lv J, He Y, Song Y, Wu J, Zhang G (2023). Deep reinforcement learning method for satellite range scheduling problem. Swarm and Evolutionary Computation, 77: 101233
Pan Z, Wang L, Dong C, Chen J F (2024). A knowledge-guided end-to-end optimization framework based on reinforcement learning for flow shop scheduling. IEEE Transactions on Industrial Informatics, 20(2): 1853–1861
Pan Z, Wang L, Wang J, Zhang Q (2025). A bi-learning evolutionary algorithm for transportation-constrained and distributed energy-efficient flexible scheduling. IEEE Transactions on Evolutionary Computation, 29(1): 232–246
Peng G, Song G, Xing L, Gunawan A, Vansteenwegen P (2020). An exact algorithm for agile earth observation satellite scheduling with time-dependent profits. Computers & Operations Research, 120: 104946
Pluhacek M, Kazikova A, Kadavy T, Viktorin A, Senkerik R (2023). Leveraging large language models for the generation of novel metaheuristic optimization algorithms. In: Proceedings of the Companion Conference on Genetic and Evolutionary Computation, 1812–1820
Kandepi R, Saini H, George R K, Konduri S, Karidhal R (2024). Agile earth observation satellite constellations scheduling for large area target imaging using heuristic search. Acta Astronautica, 219: 670–677
Song Y, Ou J, Suganthan P N, Pedrycz W, Yang Q, Xing L (2023). Learning adaptive genetic algorithm for earth electromagnetic satellite scheduling. IEEE Transactions on Aerospace and Electronic Systems, 59(6): 9010–9025
Song Y, Suganthan P N, Pedrycz W, Yan R, Fan D, Zhang Y (2024). Energy-efficient satellite range scheduling using a reinforcement learning-based memetic algorithm. IEEE Transactions on Aerospace and Electronic Systems, 60(4): 4073–4087
Thirunavukarasu A J, Ting D S J, Elangovan K, Gutierrez L, Tan T F, Ting D S W (2023). Large language models in medicine. Nature Medicine, 29(8): 1930–1940
Vaithilingam P, Zhang T, Glassman E L (2022). Expectation vs. experience: Evaluating the usability of code generation tools powered by large language models. In: CHI Conference on Human Factors in Computing Systems, Extended Abstracts. New Orleans: ACM, 1–7
van Stein N, Bäck T (2025). LLaMEA: A large language model evolutionary algorithm for automatically generating metaheuristics. IEEE Transactions on Evolutionary Computation, 29(2): 331–345
Wang X, Wang L, Dong C, Ren H, Xing K (2024). Reinforcement learning-based dynamic order recommendation for on-demand food delivery. Tsinghua Science and Technology, 29(2): 356–367
Wang X, Wu G, Xing L, Pedrycz W (2021). Agile earth observation satellite scheduling over 20 years: Formulations, methods, and future directions. IEEE Systems Journal, 15(3): 3881–3892
Wong T T, Yeh P Y (2020). Reliable accuracy estimates from k-fold cross validation. IEEE Transactions on Knowledge and Data Engineering, 32(8): 1586–1594
Wu G, Xiang Z, Wang Y, Gu Y, Pedrycz W (2024a). Improved adaptive large neighborhood search algorithm based on the two-stage framework for scheduling multiple super-agile satellites. IEEE Transactions on Aerospace and Electronic Systems, 60(5): 7185–7200
Wu Y, Wang L, Chen J, Zheng J, Pan Z (2024b). A reinforcement learning driven two-stage evolutionary optimisation for hybrid seru system scheduling with worker transfer. International Journal of Production Research, 62(11): 3952–3971
Xhafa F, Sun J, Barolli A, Biberaj A, Barolli L (2012). Genetic algorithms for satellite scheduling problems. Mobile Information Systems, 8(4): 351–377
Yang C (2021). Innovation and development of BeiDou Navigation Satellite System (BDS) project management mode. Frontiers of Engineering Management, 8(2): 312–320
Yao F, Du Y, Li L, Xing L, Chen Y (2023). General modeling and optimization technique for real-world earth observation satellite scheduling. Frontiers of Engineering Management, 10(4): 695–709
Ye H, Xu H, Yan A, Cheng Y (2025). Large language model-driven large neighborhood search for large-scale MILP problems. In: Forty-second International Conference on Machine Learning
Zhang M R, Desai N, Bae J, Lorraine J, Ba J (2023). Using large language models for hyperparameter optimization. In: NeurIPS 2023 Foundation Models for Decision Making Workshop
Zhao W X, Zhou K, Li J, Tang T, Wang X, Hou Y, Min Y, Zhang B, Zhang J, Dong Z, Du Y, Yang C, Chen Y, Chen Z, Jiang J, Ren R, Li Y, Tang X, Liu Z, Liu P, Nie J Y, Wen J R (2023). A survey of large language models. Preprint at arXiv. arXiv:2303.18223