Training large-scale deep neural networks (DNNs) is prone to software and hardware failures, with critical failures often requiring full-machine reboots that substantially prolong training. Existing checkpoint-recovery solutions either cannot tolerate such critical failures or suffer from slow checkpointing and recovery due to constrained input/output bandwidth. In this paper, we propose FastCheck, a checkpoint-recovery framework that accelerates checkpointing and recovery through parallel transmission and tailored compression. First, FastCheck partitions checkpoints into shards and leverages multiple nodes for parallel checkpointing and recovery. Second, it further reduces checkpoint size and overhead with delta compression for weights and index compression for momentum. Third, FastCheck employs lightweight and consistent health status maintenance that accurately tracks node health, preventing checkpoint transmission to failed nodes. We implement FastCheck in PyTorch and evaluate it on multiple DNN models against two baselines. Experimental results show that FastCheck reduces the checkpointing time by up to 78.42% and the recovery time by up to 77.41%, while consistently improving efficiency across different training stages.
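The sharding and delta-compression ideas in this abstract can be illustrated with a minimal sketch. This is not FastCheck's implementation; the function names, the flat-array model of a checkpoint, and the change threshold are all illustrative assumptions.

```python
import numpy as np

def shard_state(state, num_nodes):
    """Partition a flat checkpoint array into roughly equal shards,
    one per peer node, for parallel transmission (illustrative only)."""
    return np.array_split(state, num_nodes)

def delta_compress(prev, curr, threshold=1e-6):
    """Keep only weights that changed by more than `threshold` since
    the last checkpoint: a simplified form of delta compression that
    stores (indices, values) instead of the full tensor."""
    mask = np.abs(curr - prev) > threshold
    return np.flatnonzero(mask), curr[mask]

def delta_decompress(prev, idx, vals):
    """Rebuild the current weights from the previous checkpoint plus
    the stored deltas (the recovery-side counterpart)."""
    restored = prev.copy()
    restored[idx] = vals
    return restored
```

In this toy model, recovery applies the stored deltas on top of the last full checkpoint, so only the changed entries ever cross the network.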
Humanoid robotics is a rapidly evolving research domain that integrates artificial intelligence and robotics. Despite significant advances, existing reviews have predominantly focused on narrow technical aspects and lack a comprehensive analysis spanning academic and industrial perspectives. This paper presents a systematic dual-perspective survey that extensively analyzes academic literature, commercial products, and industry reports. It establishes a comprehensive taxonomic framework and systematically reviews key enabling technologies, including ontological structures, perception systems, locomotion control, intelligent decision-making algorithms, foundation model integration, and human-robot interaction (HRI) technologies. From both academic and industrial perspectives, it examines research progress across diverse applications and provides a detailed comparative analysis of commercial products from leading companies, including Tesla, Boston Dynamics, and UBTECH. Six major challenge categories are identified: hardware design limitations, control system complexities, perception constraints, HRI difficulties, application-specific requirements, and ethical considerations. Particular attention is given to the transformative impact and integration challenges of large language models. Seven promising research directions are outlined, and a systematic academic-industrial gap analysis identifies significant disparities and technology-transfer bottlenecks while examining successful collaboration models. This comprehensive survey provides the first systematic examination combining academic research insights with industrial development analysis, offering valuable guidance for researchers, engineers, and policymakers working toward more capable, affordable, and socially integrated humanoid robots.
Accurate tool wear prediction is crucial for manufacturing efficiency, yet effectively exploiting multi-domain sensor features is difficult because of redundant noise. There is a critical need to strategically leverage both highly predictive strong features and potentially informative weak features. To address this issue, we propose CdualTAL, an improved Transformer-based encoder-attention-decoder algorithm whose name reflects its key components: a correlation-adaptive feature selection module, a dual-channel Transformer encoder, an attention mechanism, and a long short-term memory (LSTM) decoder. CdualTAL employs the dual-channel encoder to independently process the full set of multi-domain features and a subset of strong features selected by the designed correlation-adaptive feature selection algorithm. A custom cross-attention mechanism then fuses these representations, sharpening the focus on strong features while judiciously integrating information from weak ones. Finally, a hierarchical LSTM decoder captures deep temporal dependencies. Validated on tool wear datasets, CdualTAL outperforms 11 state-of-the-art methods, achieving superior prediction stability and accuracy with an average R² of 0.983 and a root mean square error (RMSE) of 4.373.
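The cross-attention fusion step described above can be sketched as scaled dot-product attention in which the strong-feature channel supplies the queries and the full-feature channel supplies the keys and values. This is a simplified stand-in for CdualTAL's custom mechanism: the learned projection matrices, multi-head structure, and feature dimensions are omitted or assumed.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(strong, full, d_k):
    """Queries from the strong-feature channel attend over the
    full-feature channel, so fused outputs are built from full-set
    information weighted by relevance to the strong features.
    strong: (n_strong, d_k), full: (n_full, d_k)."""
    scores = strong @ full.T / np.sqrt(d_k)   # (n_strong, n_full)
    weights = softmax(scores, axis=-1)        # rows sum to 1
    return weights @ full                     # (n_strong, d_k)
```

Each output row is a convex combination of full-channel features, which matches the abstract's intent of keeping strong features central while still drawing on weak ones.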
A dual-band filtering push-pull power amplifier (PA) with a large frequency ratio is presented in this paper. The proposed filtering power dividing/combining network is based on a hybrid-mode filtering balun using microstrip line (MSL) and substrate integrated waveguide (SIW). The MSL filtering balun operates in the S-band, with a frequency range of 2.6-2.86 GHz. Meanwhile, the SIW filtering balun is designed for Ku-band operation, covering a frequency range of 13-13.65 GHz. Under these conditions, the prototype attains a frequency ratio as high as approximately five between its two bands. Due to the inherently differential characteristic of the hybrid-mode filtering balun with a large frequency ratio, the proposed push-pull PA not only realizes filtering functionality but also achieves second-harmonic suppression. To validate the design concept, a prototype has been designed, fabricated, and measured. Measurement results demonstrate that the proposed PA achieves a 7 dB small-signal gain while maintaining out-of-band spurious rejection during active testing. The developed dual-band filtering push-pull PA delivers excellent performance, with a peak output power of 36.8 dBm at low frequencies and 36 dBm at high frequencies. Moreover, by employing dual-band filtering baluns, the PA inherently suppresses even-order harmonics while providing filtering characteristics in both operational bands, which effectively suppresses near-band spurious signals.
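As a quick sanity check on the stated bands (using band-center frequencies, an assumption since the abstract does not say how the ratio is defined), the frequency ratio works out to roughly five:

```latex
\frac{f_{c,\mathrm{Ku}}}{f_{c,\mathrm{S}}}
  = \frac{(13 + 13.65)/2~\text{GHz}}{(2.6 + 2.86)/2~\text{GHz}}
  = \frac{13.325~\text{GHz}}{2.73~\text{GHz}} \approx 4.9
```

Taken at the lower band edges, the ratio is exactly 13/2.6 = 5, consistent with the "as high as approximately five" claim.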
NAND flash-based solid-state drives (SSDs) have been adopted by many data centers due to their high performance and low power consumption. However, the physical characteristics of the underlying flash memory necessitate garbage collection (GC) operations. Valid page migration during GC contributes significantly to latency overhead while competing with user I/O requests for flash channel bandwidth and controller resources through shared physical paths, leading to path conflicts and elevated long-tail latency. The existing Venice scheme introduces a low-cost interconnected network with path reservation mechanisms to provide substantial path diversity for SSDs. Nevertheless, its fair scheduling policy lacks priority differentiation between I/O and GC requests. In this paper, we propose GC bypass, which leverages Venice's path diversity while enforcing GC request transmission through dedicated controllers. GC bypass decomposes GC requests into sub-requests and assigns low priority to valid page writes, enabling high-priority operations (user I/O, valid page reads, and block erases) to preempt paths reserved by low-priority requests. Valid pages that fail to secure reserved paths are temporarily buffered for retry. Experimental results demonstrate that GC bypass reduces the 99.99th percentile long-tail latency by up to 25% compared to Venice. GC bypass effectively mitigates interference between critical I/O operations and background maintenance tasks while maintaining the architectural benefits of path diversity.
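The priority split at the heart of this policy can be sketched with a two-level priority queue: user I/O, valid-page reads, and erases always drain before GC valid-page writes. This is an illustrative toy scheduler, not the SSD controller logic; the request tuples and priority levels are assumptions, and path reservation and retry buffering are omitted.

```python
import heapq

HIGH, LOW = 0, 1  # lower value = higher priority in a min-heap

def schedule(requests):
    """Return the service order for a batch of (priority, name)
    requests. The sequence number breaks ties so requests at the
    same priority keep their arrival order (a stable ordering)."""
    heap = [(prio, seq, name) for seq, (prio, name) in enumerate(requests)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order
```

Any high-priority request present in the batch is served before every low-priority GC write, which mirrors how GC bypass lets critical operations preempt paths reserved for background migration.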
Due to the complex and changeable marine environment, active sonar target recognition has long been a difficult problem in underwater acoustics. Deep learning-based fusion recognition technology provides an effective way to address it, but fusing multi-domain features through simple concatenation causes information redundancy and fails to effectively mine the correlations between domains. This paper therefore proposes an attention mechanism-based multi-domain feature fusion approach for active sonar target recognition. After preprocessing active sonar echo signals, the method constructs a multi-domain feature extraction and fusion network that uses a one-dimensional convolutional neural network with long short-term memory (1DCNN-LSTM) and a two-dimensional convolutional neural network (2DCNN) with channel attention to extract deep features from different domains. It then combines feature concatenation with multi-domain cross-attention to perform intra- and cross-domain feature fusion, which effectively eliminates redundant information and promotes inter-domain information interaction while maximally retaining target features. Experimental results show that, compared with single-domain methods, the attention-based multi-domain fusion network strengthens cross-domain information interaction and significantly improves feature representation capability. Compared with other methods, the proposed method shows clear performance advantages and maintains stable generalization in scenarios with low signal-to-clutter ratios.
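The channel attention introduced in the 2DCNN branch can be illustrated with a squeeze-and-excitation-style sketch: pooled per-channel descriptors pass through a small bottleneck and a sigmoid, and the resulting weights rescale each channel. This is a generic minimal sketch, not the paper's network; the reduction ratio is assumed and the bottleneck weights here are random placeholders rather than learned parameters.

```python
import numpy as np

def channel_attention(features, reduction=4, seed=0):
    """Reweight channels of a (channels, H, W) feature map by learned
    importance. Squeeze: global average pool per channel. Excite: a
    two-layer bottleneck (ReLU then sigmoid) produces one weight in
    (0, 1) per channel, applied multiplicatively."""
    c = features.shape[0]
    squeeze = features.mean(axis=(1, 2))            # (c,)
    rng = np.random.default_rng(seed)               # placeholder weights
    w1 = rng.standard_normal((c, c // reduction))
    w2 = rng.standard_normal((c // reduction, c))
    hidden = np.maximum(squeeze @ w1, 0.0)          # ReLU
    excite = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # sigmoid, (c,)
    return features * excite[:, None, None]
```

Because the attention weights lie strictly in (0, 1), uninformative channels are attenuated rather than hard-pruned, which suits the abstract's goal of suppressing redundancy while retaining weak but useful target cues.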