Exploring materials data through collaboration: 2024 KRICT ChemDX Hackathon

Su-Hyun Yoo, Andre K. Y. Low, Jose Recatala-Gomez, Harikrishna Sahu, Chiho Kim, Joonyoung F. Joung, Hoje Chun, Katerina A. Christofidou, Joshua Berry, Michail Minotakis, Kisung Kang, Kwang-soo Kim, Gaheun Shin, Hyunwoo Jang, Sanghyuk Lee, Minkyu Park, Byung-Hyun Kim, Kihyun Shin, Jungho Shin, Aloysius Soon, Joshua Schrier, Woosun Jang

Journal of Materials Informatics, 2025, Vol. 5, Issue 4: 54. DOI: 10.20517/jmi.2025.65
Perspective


Abstract

Data-driven research is in the spotlight across many science and engineering fields, including materials science, with the expectation that effective use of data, supported by modern artificial intelligence techniques, can lead to breakthroughs on key scientific questions. The Korea Research Institute of Chemical Technology (KRICT) Chemical Data Explorer platform (ChemDX), our integrated web-based platform comprising various data explorer and artificial intelligence modules, aims to make chemical data more accessible for digital materials discovery. In this article, we highlight the results of the 2024 KRICT ChemDX Hackathon, an event held to support data-driven research in chemistry and materials science. Hackathon participants explored the ChemDX platform and developed projects ranging from machine learning models and data visualization tools to user interface improvements. These projects demonstrated the versatility of data-driven research aided by the ChemDX platform and its potential to bridge experimental and computational research. The feedback and outcomes from this hackathon demonstrate the potential of interdisciplinary data-driven research and guide further improvements to the platform, enhancing its usability and outreach.

Keywords

Machine learning / database / hackathon / materials discovery

Cite this article

Su-Hyun Yoo, Andre K. Y. Low, Jose Recatala-Gomez, Harikrishna Sahu, Chiho Kim, Joonyoung F. Joung, Hoje Chun, Katerina A. Christofidou, Joshua Berry, Michail Minotakis, Kisung Kang, Kwang-soo Kim, Gaheun Shin, Hyunwoo Jang, Sanghyuk Lee, Minkyu Park, Byung-Hyun Kim, Kihyun Shin, Jungho Shin, Aloysius Soon, Joshua Schrier, Woosun Jang. Exploring materials data through collaboration: 2024 KRICT ChemDX Hackathon. Journal of Materials Informatics, 2025, 5(4): 54 DOI:10.20517/jmi.2025.65


