1 Introduction
On-device deep learning (DL) on mobile and embedded IoT devices drives various applications [1] like robotics image recognition [2] and drone swarm classification [3]. Efficient local data processing preserves privacy, enhances responsiveness, and saves bandwidth. However, current on-device DL relies on predefined patterns, leading to accuracy and efficiency bottlenecks. It is difficult to provide feedback on data processing performance during the data acquisition stage, as processing typically occurs after data acquisition.
Harnessing the potential of swarms formed by physically adjacent mobile and embedded devices in IoT scenarios, we advocate a paradigm shift towards swarm deep learning (Swarm DL), drawing inspiration from swarm intelligence [4]. This shift entails moving from reactive on-device DL, which responds to given input data, to proactive Swarm DL systems. Conceptually, swarm intelligence involves the collective intelligent behavior of multiple agents, each acting proactively based on simple patterns in a self-organized, self-adaptive, and self-evolving manner, leading to enhanced global performance. Building upon on-device DL as the foundation, Swarm DL proactively scales data acquisition and processing and provides bi-directional optimization feedback between them, forming a closed system loop. This paradigm aims to maximize implicit complementarity and minimize redundancy in both data acquisition and processing within the swarm, fostering a more efficient and scalable IoT system.
Specifically, Swarm DL achieves swarm intelligence by using IoT devices equipped with advanced sensors and DL computing capabilities as active agents, with two major visions: Proactive Data Acquisition for DL and Proactive Data Processing by DL. Achieving these two visions requires extending current research in two directions, each of which poses a challenge: (i) from independent optimization of data collection and data processing to bidirectional optimization; and (ii) from reactive data processing to proactive data collection and processing.
To handle these challenges in practical IoT environments, we propose a generic system framework, named DeepSwarm. We define a modular design for DeepSwarm and identify optimization opportunities and techniques for deploying Swarm DL on resource-limited and decentralized mobile and embedded platforms. DeepSwarm pinpoints a set of inter- and intra-device proactive strategies that contribute to a self-organized, self-adaptive, and self-evolving Swarm DL system.
2 Scope and framework
In this section, we introduce Swarm DL and a general framework, DeepSwarm, comprising two functional modules.
2.1 The concept of Swarm DL
As mentioned above, Swarm DL extends reactive on-device DL, which focuses on resource-efficient DL given input data on individual IoT devices, to a distributed setting inspired by proactive swarm intelligence. As shown in Fig.1, compared with related concepts, Swarm DL exhibits the unique characteristics of self-organization, self-adaptation, and self-evolution.
Fig.1 Comparison of Swarm DL and related concepts
● Self-organized. Swarm DL emphasizes the bottom-up emergence of collective behaviors among devices. Each agent dynamically optimizes global system performance and resource efficiency by proactively triggering local operations for data acquisition and processing.
● Self-adaptive. Each agent in Swarm DL exhibits greater proactivity, participating in both on-device DL inference and adaptation, with a higher proportion of agents dedicated to adaptation.
● Self-evolving. Swarm DL is real-time and human-independent, consisting of mobile agents that actively learn from data through bidirectional feedback, optimizing data collection (supply) and processing (demand) for each device.
2.2 The DeepSwarm framework
To realize the vision of Swarm DL, we present a generic system framework, DeepSwarm (see Fig.2). DeepSwarm takes a modular design and decomposes the requirements of Swarm DL into two modules: proactive swarm data acquisition for DL and proactive swarm data processing by DL. These two modules work in synergy with bi-directional feedback to optimize system performance (e.g., accuracy, latency) and resource efficiency (e.g., energy efficiency, memory fragmentation). DeepSwarm functions with heterogeneous AIoT hardware (e.g., CPU, GPU, or MCU-equipped embedded devices), adapts to dynamic application contexts (i.e., data distribution drifts and runtime resource availability), and generalizes to various AIoT applications (e.g., mobile health, smart homes, autonomous vehicles, industrial automation).
Fig.2 Illustration of DeepSwarm, a generic system framework to realize bi-directional optimization of Swarm DL
2.3 Proactive swarm data acquisition module
This module coordinates the self-organization of distributed agents and sensors, drawing inspiration from swarm intelligence. Each agent actively engages in data resampling, sensing parameter adjustment, and association with other agents by analyzing information extracted from cross-modal, cross-task, and cross-clock sensor data, aiming to maximize fusion quality and minimize redundancy of agent data. It also aims to achieve complementary enhancement as early as possible, addressing challenges such as modal information loss, clock asynchrony, and data shifts, which in turn reduces subsequent resource costs (e.g., sampling, computing, transmission bandwidth). Specifically, we emphasize assessing, simultaneously and at runtime, both the explicit and the implicit importance of data for the performance of subsequent data processing tasks. Explicit data importance estimation is data-driven and has been studied extensively, whereas implicit data complementarity profiling is system-driven and non-trivial, requiring comprehensive consideration of dynamic system factors.
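The goal of favoring important yet non-redundant data can be illustrated with a minimal sketch. It is not DeepSwarm's method: it assumes each sample comes with a precomputed importance score and a feature vector, and it uses cosine similarity to already selected samples as a stand-in for redundancy. The names `select_samples` and `redundancy_weight` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_samples(features, importance, k, redundancy_weight=1.0):
    """Greedily pick up to k sample indices.

    Each step picks the candidate with the highest
    importance - redundancy_weight * (max similarity to already selected samples).
    Both the scoring rule and the similarity measure are illustrative assumptions.
    """
    selected = []
    candidates = set(range(len(features)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine(features[i], features[j]) for j in selected),
                default=0.0,
            )
            return importance[i] - redundancy_weight * redundancy

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Samples 0 and 1 are near-duplicates; sample 2 is less important but complementary.
features = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
importance = [0.9, 0.85, 0.5]
picked = select_samples(features, importance, k=2)
```

In this toy input, the second pick skips the near-duplicate in favor of the complementary sample, which is the kind of trade-off the module aims for before any data is transmitted or processed.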
2.4 Proactive swarm data processing module
This module encompasses data processing tasks, including DL inference and adaptation. Traditional on-device DL systems primarily focus on inference, especially in task-specific embedded devices. In contrast, Swarm DL agents exhibit higher proactivity, engaging in both inference and adaptation, with an even greater emphasis on adaptation. Balancing resource allocation between DL inference and adaptation presents a novel challenge in this context. Specifically, the primary objective in Swarm DL is to perform data processing tasks in a self-adaptive and self-evolutionary manner, akin to general swarm intelligence. Technically, unlike approaches that solely optimize DL algorithms, DeepSwarm also addresses system asynchrony in resource competition and varied resource availability among agents, manages peak memory usage, and optimizes the tradeoff between system delay and accuracy.
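To make the inference/adaptation balance concrete, the following is a minimal sketch of one possible policy, not the scheduler used in DeepSwarm. It assumes a per-window compute budget in milliseconds and a drift score normalized to [0, 1]; the function `split_budget` and its parameters are hypothetical.

```python
def split_budget(total_ms, drift_score, min_inference_ms, max_adapt_fraction=0.8):
    """Split a per-window compute budget between inference and adaptation.

    Assumed policy: the adaptation share grows linearly with the drift score,
    is capped at max_adapt_fraction of the budget, and never eats into the
    minimum time reserved for inference.
    Returns (inference_ms, adaptation_ms).
    """
    if total_ms < min_inference_ms:
        raise ValueError("budget is smaller than the minimum inference time")
    drift = min(max(drift_score, 0.0), 1.0)  # clamp to [0, 1]
    adapt_ms = total_ms * max_adapt_fraction * drift
    adapt_ms = min(adapt_ms, total_ms - min_inference_ms)
    return total_ms - adapt_ms, adapt_ms


no_drift = split_budget(100.0, 0.0, min_inference_ms=30.0)
mild_drift = split_budget(100.0, 0.5, min_inference_ms=30.0)
heavy_drift = split_budget(100.0, 1.0, min_inference_ms=30.0)
```

With no drift the whole budget goes to inference; under heavy drift adaptation takes everything except the reserved inference time. A real system would also have to account for memory and for competition among agents, which this sketch ignores.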
3 Challenges and opportunities
Embracing the co-design principle, we identify the following challenges and opportunities:
● Self-adaptive Non-blocking DL Inference with misaligned, incomplete, and asynchronous data. Existing approaches relying on statistical analysis or divergence measures cannot work well in real time for local agents without access to global data. Challenges include identifying non-redundant data correlations, adapting the sampling rate at runtime, and building dynamically elastic DL models.
● Test-time Self-evolutionary DL Adaptation. The asynchronism of distributed multi-modal data streams poses challenges, leading to system delays (if waiting for slow devices) or a decrease in accuracy (if not waiting for slow data). Multi-task co-adaptation via data and computing reuse is an opportunity yet a challenge for agents lacking data and computational resources.
● Swarm DL-adaptive Operator/System Resource Scheduling. Tailoring the system scheduling to characteristics of swarm data and tasks, such as tensor/operator life cycles and dependencies, enhances parallelism, increases data reuse, and reduces memory fragmentation during swarm data processing.
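The tension between waiting for slow streams (delay) and proceeding without them (accuracy loss) can be sketched as a deadline-based, non-blocking fusion step. This is an illustrative assumption rather than the mechanism proposed here: streams that miss the deadline are replaced by their last cached value and flagged as stale. The function `fuse_non_blocking` and the stream names are hypothetical.

```python
def fuse_non_blocking(arrivals, cache, deadline_ms):
    """Assemble one fused input without blocking on slow streams.

    arrivals: dict mapping stream name -> (arrival_time_ms, value)
    cache:    dict mapping stream name -> last value used (updated in place)
    Returns (fused, stale): fused maps stream name -> value to use now, and
    stale lists the streams served from the cache. A stream that is late and
    has no cached value is simply left out of the fused input.
    """
    fused, stale = {}, []
    for name in sorted(set(arrivals) | set(cache)):
        item = arrivals.get(name)
        if item is not None and item[0] <= deadline_ms:
            fused[name] = item[1]   # fresh data arrived in time
            cache[name] = item[1]
        elif name in cache:
            fused[name] = cache[name]  # fall back to the previous value
            stale.append(name)
    return fused, stale


cache = {"lidar": "points_t0"}
arrivals = {"camera": (5, "frame_t1"), "lidar": (40, "points_t1")}
fused, stale = fuse_non_blocking(arrivals, cache, deadline_ms=20)
```

Here the camera frame arrives in time while the lidar data misses the 20 ms deadline, so inference proceeds with the fresh frame and the cached point cloud. The `stale` list gives a downstream model or scheduler a signal for deciding how much to trust the fused input.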
4 Experimental results and analysis
Mobile video applications have attracted significant attention. Compressed DL models are widely used to enable on-device video inference, which, however, is vulnerable to the non-stationary data drift of live video captured in dynamically changing mobile environments. To combat data drift, we present a Swarm DL adaptation system based on the DeepSwarm framework, which enables each agent to continuously update its DL model using newly collected sensor data from itself and from other agents. The system consists of three components: data drift-aware video frame sampling, feedback-aware DL adaptation trigger, and adaptive DL adaptation & resource scheduling. See the supplementary material for more details. We compare our system with four baselines, i.e., domain adaptation [5], DeepSwarm without data fusion from other agents, the original agent DL model, and single-agent adaptation [6]. Tab.1 demonstrates that DeepSwarm achieves the best overall accuracy, improving the average accuracy of the mobile models by over 40% relative to the original models and the accuracy of the global model by 9%.
Tab.1 Performance comparison of different DL model adaptation methods (all values at IoU = 0.50)
Method | Accuracy gain of global model | Accuracy after adaptation: mobile model A | Accuracy after adaptation: mobile model B | Accuracy after adaptation: mobile model C | Accuracy after adaptation: average | Accuracy gain: mobile model A | Accuracy gain: mobile model B | Accuracy gain: mobile model C | Accuracy gain: average
Domain adaptation | None | 0.504 | 0.469 | 0.497 | 0.49 | 14.3% | 48.9% | 13.7% | 23.4%
NestEvo without data generation | 1.3% | 0.504 | 0.475 | 0.501 | 0.505 | 14.3% | 50.8% | 15.5% | 27.2%
Original mobile model | None | 0.441 | 0.315 | 0.437 | 0.397 | None | None | None | None
Only mobile model adaptation | 0 | 0.501 | 0.478 | 0.493 | 0.491 | 13.6% | 51.7% | 12.8% | 23.7%
NestEvo | 9.13% | 0.571 | 0.543 | 0.584 | 0.566 | 29.5% | 72.4% | 33.6% | 42.6%
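The gain columns in Tab.1 appear to be relative improvements over the original mobile model, i.e., (adapted accuracy − original accuracy) / original accuracy. The short check below recomputes the gains in the last row of Tab.1 under that assumption; the helper `relative_gain` is introduced here only for illustration.

```python
def relative_gain(before, after):
    """Relative accuracy improvement in percent."""
    return (after - before) / before * 100.0


# Accuracies at IoU = 0.50, copied from Tab.1.
original = {"A": 0.441, "B": 0.315, "C": 0.437}  # "Original mobile model" row
adapted = {"A": 0.571, "B": 0.543, "C": 0.584}   # last row of Tab.1

gains = {m: round(relative_gain(original[m], adapted[m]), 1) for m in original}
average_gain = round(relative_gain(0.397, 0.566), 1)  # from the reported averages

print(gains)         # {'A': 29.5, 'B': 72.4, 'C': 33.6}
print(average_gain)  # 42.6
```

The recomputed values match the reported 29.5%, 72.4%, 33.6%, and 42.6%, which supports reading the "over 40%" figure as a relative gain rather than a gain in absolute accuracy points.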
5 Conclusion
Inspired by the collective intelligence observed in natural swarms, where individual proactive actions contribute to superior global performance, we advocate for a shift towards Swarm DL. By harnessing the potential of physically adjacent mobile devices in IoT scenarios, we present DeepSwarm, a closed-loop system framework. DeepSwarm facilitates bidirectional optimization between data acquisition and processing, aiming to push the performance boundaries of on-device DL. Specifically, DeepSwarm addresses the requirements of proactive Swarm DL by decomposing them into layers: self-organized swarm data acquisition and self-adaptive, self-evolutionary swarm data processing.