Large models based high-fidelity voice services over 6G narrowband non-terrestrial networks

Jingyuan Han, Chengxiao Yu, Gang Liu, Shijing Yuan, Zhongkai Tong

2025, Vol. 11, Issue (6): 1864-1873. DOI: 10.1016/j.dcan.2025.07.009
Regular Papers

Abstract

Non-Terrestrial Networks (NTN) can be used to provide emergency voice services in Sixth-Generation (6G) communication systems. However, Internet of Things (IoT) terminals have restricted bandwidth resources and weak computing power, which makes ensuring high-quality voice services over NTN challenging. Artificial Intelligence (AI) techniques have increasingly been applied to enhance audio quality and reduce the bit rate, but applying models with high computational complexity to IoT terminals is difficult. In this study, we propose a voice-services-over-NTN solution that includes a novel 6G framework integrating non-terrestrial and ground networks and a lightweight Large Models (LMs)-driven codec, ReCodec, operating at 450 bits per second. We also design a new voice packet header and deploy an agent at the ground gateway to reduce the bandwidth overhead. The non-standard Session Initiation Protocol header is converted to the standard format, and the Internet Protocol and User Datagram Protocol headers are re-encapsulated, replacing the conventional implementations. Additionally, an operational NTN satellite was used to evaluate the proposed ReCodec. The experimental results demonstrate that ReCodec decreases the computational complexity by 96.61% while increasing the voice quality by 17.55% compared with state-of-the-art mechanisms. Furthermore, the packet header design reduces the voice frame header to 50 bytes.
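
To give a feel for why the header size matters at such a low codec rate, the following minimal sketch computes how much of each packet is header. Only the 450 bit/s codec rate and the 50-byte header come from the abstract. The function name `header_overhead`, the one-header-per-packet model, and the 1-second packetization interval are illustrative assumptions, not values reported by the authors.

```python
def header_overhead(bitrate_bps: float, packet_interval_s: float, header_bytes: int) -> dict:
    """Estimate payload size and header share for one voice packet.

    Assumes exactly one header is sent per packet (a simplification).
    """
    # Codec bytes produced during one packet interval.
    payload_bytes = bitrate_bps * packet_interval_s / 8
    total_bytes = payload_bytes + header_bytes
    return {
        "payload_bytes": payload_bytes,
        "total_bytes": total_bytes,
        "header_share": header_bytes / total_bytes,
        # Effective bit rate on the link once the header is included.
        "on_air_bps": total_bytes * 8 / packet_interval_s,
    }


# 450 bit/s and 50 bytes are taken from the abstract;
# the 1.0 s packet interval is a hypothetical value chosen for illustration.
stats = header_overhead(bitrate_bps=450, packet_interval_s=1.0, header_bytes=50)
print(stats)
```

Under these assumed settings the header is comparable in size to the codec payload itself, which is why header design is treated alongside the codec rather than as an afterthought.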

Keywords

NTN / Voice services / LMs / Satellite

Cite this article

Jingyuan Han, Chengxiao Yu, Gang Liu, Shijing Yuan, Zhongkai Tong. Large models based high-fidelity voice services over 6G narrowband non-terrestrial networks. 2025, 11(6): 1864-1873. DOI: 10.1016/j.dcan.2025.07.009


