HarmonyOS is emerging as a popular distributed operating system for diverse mobile devices. One of its standout features is app-hopping, which allows users to switch apps seamlessly across different HarmonyOS devices. However, when apps play audio-stream-hop between different devices, they can easily trigger Hopping-related Audio-stream Conflict () scenarios. Improper resolution of will lead to significant issues, which are hard to detect comprehensively due to the unclear semantics of HarmonyOS’s app-hopping mechanism and the lack of effective multi-app hopping testing methods. To fill the gap, this paper introduces an automated and efficient approach to detecting issues. We formalize the operational semantics of HarmonyOS’s app-hopping mechanism for audio streams for the first time. Leveraging this formalization, we design an Audio-stream-aware State Transition Graph () to model the behaviors of audio-streams during window transitions and propose a model-based approach to detect issues automatically. Our techniques are implemented in a tool, , and evaluated on 20 real-world HarmonyOS apps. Experimental results reveal that 12 of the 20 apps exhibit issues. Among the 53 issues detected, a total of 18 unique issues are manually confirmed. Additionally, we summarize the detected issues into two typical types, namely and , and analyz their characteristics to assist and guide both app and OS developers.
Software testing is a crucial phase in the software life cycle, helping identify potential risks and reduce maintenance costs. With the advancement of Large Language Models (LLMs), researchers have proposed an increasing number of LLM-based software testing techniques, particularly in the area of test case generation. Despite the growing interest, limited efforts have been made to thoroughly evaluate the actual capabilities of LLMs in this task. In this paper, we introduce TestBench, a benchmark for class-level LLM-based test case generation. We construct a dataset of 108 Java programs from nine real-world, large-scale projects on GitHub, each representing a different thematic domain. We then design three distinct types of prompts based on context descriptions, including self-contained context, full context, and simple context. Besides, we propose a fine-grained evaluation framework that considers five aspects of test cases: syntactic correctness, compilation correctness, test correctness, code coverage rate, and defect detection rate. Furthermore, we propose a heuristic algorithm to repair erroneous test cases generated by LLMs. We evaluate CodeLlama-13b, GPT-3.5, and GPT-4 on the TestBench, and our experimental results indicate that larger models demonstrate a greater ability to effectively utilize contextual information, leading to generate higher-quality test cases. Smaller models may struggle with the noise introduced by the extensive information contained within the full context. However, when using the simplified version, namely the simple context, which is derived from the full context via abstract syntax tree analysis, the performance of these models improves significantly. Our analysis highlights the current progress and pinpoints future directions to further enhance the effectiveness of models by handling contextual information for test case generation.
Modern software often integrates assembly code into C programs for performance-critical tasks, such as big integer computation in the GMP library and cryptographic algorithms in the OpenSSL library. Though the formal semantics of C and assembly are well-studied, linking these languages for realistic compiler verification remains challenging due to their abstraction gap. In this article, we address the problem of linking C with assembly by introducing a denotation-based framework and we apply this framework to support verified compositional compilation (VCC) for C and x86 assembly in CompCert. Specifically, we propose a novel semantic transformation operator that systematically interprets the denotational semantics of assembly into that of C. This semantic transformation bridges the interaction differences between the two languages, enabling their semantic linking to be effectively implemented under a unified language interface. Furthermore, we demonstrate the soundness of such semantic linking by applying it to the context of VCC, where two key properties are proved: 1) the transformed C semantics is faithfully preserved by the original assembly semantics, and 2) this preservation is compositional with the compilation correctness of other open modules. As a result, our approach advances the use of denotational semantics for VCC when both C and assembly are involved. All results in this article are formally verified in the Coq proof assistant.
Model-based diagnosis (MBD) is a principled approach for identifying the possible causes of unexpected behavior caused by system malfunctions and is crucial for enhancing the reliability of modern intelligent systems. Previous MBD methods often generate numerous candidate diagnoses without ensuring their accuracy in identifying fault components. In this paper, we introduce a novel Enhanced Model-Based Diagnosis (EMBD) method, which iteratively refines candidate component sets through iterative applications of our fault localization model and our proposed local search MaxSAT algorithm, achieving precise identification of faulty components. Firstly, we establish a formal fault localization model using Weighted Conjunctive Normal Form (WCNF). The model encodes the fault localization problem into a MaxSAT problem by duplicating the original circuit and inserting XOR gates between corresponding gate pairs to enable output comparison and discrepancy detection. Secondly, building upon this model, we develop NuFPS, a local search MaxSAT algorithm that identifies the maximum number of components with output logic variations across different observations, effectively pruning fault-free components from the candidate diagnoses. Experimental evaluations demonstrate that EMBD achieves significantly more accurate candidate diagnoses. Compared with state-of-the-art methods such as HSD, CMMO, DiagDO, and DPDN, EMBD achieves significant improvements, with 673.6%, 254.2%, 211.5%, and 283.2% increase in diagnosis performance, respectively.