Diversity from human feedback

Ren-Jian WANG, Ke XUE, Yu-Tong WANG, Peng YANG, Hao-Bo FU, Qiang FU, Chao QIAN

Front. Comput. Sci. ›› 2026, Vol. 20 ›› Issue (2): 2002320. DOI: 10.1007/s11704-025-41167-w

Artificial Intelligence
RESEARCH ARTICLE

Abstract

Diversity plays a significant role in many problems, such as ensemble learning, reinforcement learning, and combinatorial optimization. Defining an appropriate diversity measure, however, is a longstanding challenge. Many methods rely on expert experience to define a proper behavior space and then derive the diversity measure from it, which is infeasible in many scenarios. In this paper, we propose the problem of learning a behavior space from human feedback and present a general method, Diversity from Human Feedback (DivHF), to solve it. DivHF learns a behavior descriptor consistent with human preferences by querying human feedback; the learned descriptor can then be combined with any distance measure to define a diversity measure. We demonstrate the effectiveness of DivHF by integrating it with the Quality-Diversity optimization algorithm MAP-Elites and conducting experiments on the QDax suite. The results show that the behavior descriptor learned by DivHF is much more consistent with human requirements than those learned by direct data-driven approaches without human feedback, and that it makes the final solutions more diverse under human preference. Our contributions include formulating the problem, proposing the DivHF method, and demonstrating its effectiveness through experiments.
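The abstract outlines a three-step pipeline: query a human for preferences over candidate behaviors, fit a behavior descriptor consistent with those preferences, and plug the learned descriptor (together with any distance measure) into a Quality-Diversity algorithm such as MAP-Elites. As a rough illustration of the descriptor-learning step only, the Python sketch below fits a small network so that descriptor distances agree with human similarity judgments, using a Bradley-Terry-style loss on triplet queries (anchor, A, B; the human indicates whether A or B behaves more like the anchor). The names (BehaviorNet, preference_loss), the triplet query format, and the exact objective are all our assumptions for illustration, not the paper's implementation.

# Hedged sketch: learning a behavior descriptor from pairwise human feedback.
# All names and the loss formulation are illustrative assumptions.
import torch
import torch.nn as nn

class BehaviorNet(nn.Module):
    """Maps a trajectory summary (a flat feature vector here) to a descriptor."""
    def __init__(self, in_dim: int, desc_dim: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, desc_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def preference_loss(model, anchor, a, b, label):
    """Bradley-Terry-style loss on descriptor distances.

    label = 1 means the human judged `a` to behave more similarly to
    `anchor` than `b` does; 0 means the opposite.
    """
    d_a = (model(anchor) - model(a)).pow(2).sum(-1)
    d_b = (model(anchor) - model(b)).pow(2).sum(-1)
    # Smaller distance means higher similarity, so the logit favoring `a` is d_b - d_a.
    return nn.functional.binary_cross_entropy_with_logits(d_b - d_a, label)

# One gradient step on random stand-in data (real queries would come from a human):
model = BehaviorNet(in_dim=16)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
anchor, a, b = (torch.randn(32, 16) for _ in range(3))
label = torch.randint(0, 2, (32,)).float()
loss = preference_loss(model, anchor, a, b, label)
opt.zero_grad(); loss.backward(); opt.step()

In an actual DivHF-style loop, the random stand-in tensors would be replaced by trajectory features of solutions sampled during Quality-Diversity optimization, the labels by human answers to the queries, and the trained network's outputs would serve as the behavior descriptors that index the MAP-Elites archive.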

Keywords

quality diversity / human feedback / behavior descriptor / diversity measure

Cite this article

Ren-Jian WANG, Ke XUE, Yu-Tong WANG, Peng YANG, Hao-Bo FU, Qiang FU, Chao QIAN. Diversity from human feedback. Front. Comput. Sci., 2026, 20(2): 2002320. DOI: 10.1007/s11704-025-41167-w



RIGHTS & PERMISSIONS

Higher Education Press
