1 Introduction
Computer simulation changes the traditional paradigm of painstaking trial-and-error experiments and greatly accelerates the development of molecules and materials; it usually relies, however, on high-throughput ab initio calculations with considerable computational cost. Machine learning techniques, especially deep learning (DL), have been introduced to further improve the efficiency and applicability of computational chemistry and computational materials science [1-3]. Inspired by the remarkable progress in computer vision and natural language processing, DL has been successfully used for transition state searching [4], chemical reactivity prediction [5, 6], thermal property prediction [7], catalyst design [8], spectroscopic prediction [9], solution of the many-electron Schrödinger equation [10], magnetic field estimation [11], molecular odor prediction [12], and so on.
Among various DL networks, the graph neural network [13], which has been widely used in social networks, the internet of things, knowledge graphs, and recommender systems, is particularly suitable for clusters and molecules. Compared to the commonly used artificial neural network (ANN) [14-21], which treats each atom independently, a graph network is a natural choice to describe the geometric characteristics of a cluster or molecule, since it contains the information of interconnection between atoms [22-32]. Moreover, a graph network is less sensitive to the choice of atomic descriptors. To date, several graph neural networks, namely SchNet [23], CGCNN [26], MEGNet [28], MGCN [29], DimNet [30], DGANN [31] and DeepMoleNet [32], have been developed for molecules and crystals. For instance, SchNet [23] introduces continuous-filter convolutions (cfconv) and utilizes element types and atomic coordinates as input. DimNet [30] uses spherical Bessel functions and spherical harmonics to replace the Gaussian radial basis representations. CGCNN [26] allows multiple edges in the crystal graph due to periodicity and can predict eight different properties of crystals. DGANN [31] does not require a complete graph and directly uses Cartesian coordinates instead of interatomic distances as input.
Compared to molecules with well-defined geometry, the structure of an atomic cluster evolves with size (i.e., the number of atoms), and searching for the ground-state structure requires global exploration of the potential energy surface (PES). Owing to the coexistence of numerous isomers on the PES, DL techniques, including graph networks, have rarely been applied to cluster science so far. For a cluster, not only can the tiny differences between similar isomers confuse a neural network, but the flexible bonding configurations also make it difficult to construct the network.
Different from many other fields that need node information only, application of a graph network to molecules, clusters, or crystals must consider information on both atoms and bonds. The attention mechanism, proposed for machine translation by Bengio, one of the three leading authorities in deep learning [33], is well suited to modeling the effect of neighboring bonds on a center atom. Qian et al. [31] devised a directed graph attention neural network (DGANN) that encodes the local chemical environment by a graph attention mechanism. The DeepMoleNet [32] developed by Ma's group adopts multilevel attention and can precisely predict 12 molecular properties within one network. However, graph networks with attention mechanisms have not yet been adopted for structural optimization of clusters.
In this paper, we develop a cluster graph attention network (CGANet) to reproduce the binding energies, forces, and molecular orbital energies of silver clusters from density functional theory (DFT) calculations. Combined with a home-made genetic algorithm code, we have performed a global search for the ground-state structures of Agn clusters with n = 14−26 and unveiled several previously unreported lowest-energy structures. This graph attention network, which is at least two orders of magnitude faster than DFT calculations, is a universal approach for structural optimization and property prediction of atomic clusters.
2 Network structure
2.1 Feature extraction
For an atomic cluster or molecule, the element types, bond lengths and bond angles are the essential information that determines the physical and chemical properties of the system. We first extract this information from the element types and the distance matrix as input. The element information should be transformed into a vector by one-hot encoding or an embedding layer. For simplicity, herein we consider elemental clusters of silver atoms; thus, we do not have to distinguish element types, and the initial feature of each atom is a one-dimensional vector of constant 1. The Cartesian coordinates of atoms are not directly utilized as the input of the network. Instead, we extract the structural information from the Cartesian coordinates and feed it into the network. The key operations of CGANet are schematically shown in Fig.1. We start from the distance matrix $D$ because it is invariant under translation and rotation. For each pair of atoms $i$ and $j$, the distance $d_{ij}$ in $D$ is transformed into a 36-dimensional vector $e_{ij}^{0}$ by a Gaussian function [14]:

$$e_{ij,k}^{0} = \exp\left[-\frac{(d_{ij}-\mu_k)^2}{\sigma_k^2}\right], \quad k = 1, \dots, 36.$$
Here $\mu_k$ and $\sigma_k$ are parameters to be trained. The bond angle $\theta_{ijk}$ can be simply calculated from $d_{ij}$, $d_{ik}$ and $d_{jk}$ using the law of cosines. Then we introduce a 24-dimensional angle vector $a_{ijk}^{0}$ in a similar way:

$$a_{ijk,l}^{0} = \exp\left[-\frac{(\theta_{ijk}-\mu_l)^2}{\sigma_l^2}\right], \quad l = 1, \dots, 24,$$

where $\mu_l$ and $\sigma_l$ are parameters to be trained.
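As a minimal NumPy sketch of this feature extraction (the center and width values below are illustrative assumptions, not the trained parameters):

```python
import numpy as np

def gaussian_expand(x, mu, sigma):
    """Expand a scalar (distance or angle) into a feature vector by
    evaluating len(mu) Gaussians; mu and sigma play the role of the
    trainable parameters mu_k and sigma_k in the text."""
    return np.exp(-((x - mu) ** 2) / sigma ** 2)

# Hypothetical initialization: 36 centers spanning typical Ag-Ag distances.
mu_k = np.linspace(2.0, 8.0, 36)        # Gaussian centers (Angstrom)
sigma_k = np.full(36, 0.25)             # Gaussian widths
e_ij0 = gaussian_expand(2.89, mu_k, sigma_k)   # 36-dim bond feature

# The 24-dim angle feature is built the same way, with theta_ijk obtained
# from d_ij, d_ik, d_jk via the law of cosines.
theta = np.arccos((2.89**2 + 2.89**2 - 4.2**2) / (2 * 2.89 * 2.89))
mu_l = np.linspace(0.0, np.pi, 24)
sigma_l = np.full(24, 0.15)
a_ijk0 = gaussian_expand(theta, mu_l, sigma_l)  # 24-dim angle feature
```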
2.2 Vertex convolutional layer
The classical graph convolutional operation is expressed as follows [34]:

$$X^{l+1} = \sigma\left(A X^{l} W\right),$$

where $X^{l}$ is the output of layer $l$, $A$ is the adjacency matrix of the graph, and $W$ is the weight matrix of this layer with size $n_l \times n_{l+1}$. The product $X^{l}W$ transforms the features from dimension $n_l$ to dimension $n_{l+1}$, and left-multiplying $X^{l}$ by $A$ sums all neighbor features onto the center atom. Finally, an activation function $\sigma$ yields the output of layer $l+1$.
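A minimal NumPy sketch of this operation (the tanh activation is an illustrative stand-in for $\sigma$):

```python
import numpy as np

def gcn_layer(X, A, W, act=np.tanh):
    """Classical graph convolution: X_next = act(A X W).
    X: (n_atoms, n_l) features; A: (n_atoms, n_atoms) adjacency matrix;
    W: (n_l, n_{l+1}) trainable weights."""
    return act(A @ X @ W)
```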
However, this graph convolutional operation has two deficiencies. First, different neighboring atoms have different effects on the center atom, so a simple summation over them is unreasonable. Second, it is unclear how to properly aggregate the feature of the center atom itself. Hence, we introduce an improved graph attention layer. Let $V^{l}$ be the matrix of atomic features at layer $l$, and $h = V^{l}W$ be a new feature matrix of another dimension. We then aggregate neighbor features with different weights, which are calculated from the center feature $h_i$, the neighbor feature $h_j$ and the vector $d_{ij}$ [35]:

$$\alpha_{ij} = \operatorname{Softmax}_{j}\left(\operatorname{LeakyReLU}\left(\boldsymbol{a}^{\top}\left[h_i \,\|\, h_j \,\|\, d_{ij}\right]\right)\right).$$

Here Softmax and LeakyReLU are activation functions, $\alpha_{ij}$ is the attention coefficient, and $\boldsymbol{a}$ is a vector to be trained. The Softmax function normalizes the attention coefficients. For each center atom $i$, the features of the neighboring atoms $j$ are aggregated by the following formula:

$$v_i^{l+1} = \operatorname{Elu}\left(h_i + \beta \sum_{j} \alpha_{ij}\, h_j\right).$$

The aggregation includes two parts: the feature of atom $i$ itself and the summation of the features of all neighboring atoms $j$, each scaled by its attention coefficient $\alpha_{ij}$ and weighted by the coefficient $\beta$. Finally, the aggregation passes through an Elu activation function to yield the output feature $V^{l+1}$ of the next layer. The attention mechanism is permutation invariant because the attention coefficient $\alpha_{ij}$ permutes with atom $j$.
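A minimal NumPy sketch of this vertex attention step, assuming a single attention head, a complete graph, and a scalar $\beta$ (shapes and details of the trained model may differ):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def elu(x):
    return np.where(x > 0, x, np.exp(x) - 1.0)

def vertex_attention(V, W, a, d, beta=1.0):
    """One vertex attention step on a complete graph of n atoms.
    V: (n, f_in) vertex features;  W: (f_in, f_out) weights;
    a: trainable score vector of length 2*f_out + d.shape[-1];
    d: (n, n, k) per-pair vectors entering the score."""
    h = V @ W                                    # h = V^l W
    n = h.shape[0]
    out = np.empty_like(h)
    for i in range(n):
        nbrs = [j for j in range(n) if j != i]
        # score_ij = LeakyReLU(a . [h_i || h_j || d_ij])
        scores = np.array([leaky_relu(a @ np.concatenate((h[i], h[j], d[i, j])))
                           for j in nbrs])
        alpha = softmax(scores)                  # normalized attention coefficients
        nbr_sum = np.zeros_like(h[i])
        for w, j in zip(alpha, nbrs):
            nbr_sum += w * h[j]
        out[i] = elu(h[i] + beta * nbr_sum)      # center feature + weighted neighbors
    return out
```

In CGANet the same pattern is repeated for edges and angles, with the score inputs swapped accordingly.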
2.3 Edge convolutional layer
The edge feature at layer $l+1$ is obtained from the vertex features of its two end atoms and the edge feature itself at the previous layer $l$.
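One plausible form of this update, assuming the three inputs are concatenated and passed through a trainable linear map $W_e$ and an activation (the precise combination is not spelled out here):

$$e_{ij}^{\,l+1} = \sigma\left(W_e\left[v_i^{\,l} \,\|\, v_j^{\,l} \,\|\, e_{ij}^{\,l}\right]\right).$$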
2.4 Angle convolutional layer
Similar to the vertex convolutional layer, the new feature of edge $e_{ij}$ is aggregated from the edge $e_{ij}$ itself and all angle features $a_{ijk}$ using the attention mechanism.
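By analogy with the vertex layer, a plausible attention-style form (our reading; $W_e$, $W_a$ and $\beta$ are assumed trainable parameters) is

$$e_{ij}^{\,l+1} = \operatorname{Elu}\left(W_e\, e_{ij}^{\,l} + \beta \sum_{k} \alpha_{ijk}\, W_a\, a_{ijk}^{\,l}\right),$$

with the attention coefficients $\alpha_{ijk}$ computed as in Section 2.2.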
2.5 Force convolutional layer
We combine the feature vectors $v_i$, $v_j$ and $e_{ij}$, and map them into a single value that serves as a combination coefficient $\omega_{ij}$. The force on atom $i$ is then expressed as a linear combination of the vectors $r_{ij}$ pointing from atom $i$ to atom $j$:

$$\boldsymbol{F}_i = \sum_{j \neq i} \omega_{ij}\, \boldsymbol{r}'_{ij}.$$

Here $\boldsymbol{r}'_{ij}$ is the unit vector of $\boldsymbol{r}_{ij}$.
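As a minimal NumPy sketch of this assembly (omega is a precomputed coefficient matrix standing in for the learned mapping from $v_i$, $v_j$, $e_{ij}$):

```python
import numpy as np

def predict_forces(coords, omega):
    """Assemble forces as a weighted sum of unit bond vectors.
    coords: (n, 3) Cartesian positions; omega: (n, n) combination
    coefficients (hypothetical stand-in for the learned values)."""
    n = coords.shape[0]
    forces = np.zeros((n, 3))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r_ij = coords[j] - coords[i]        # vector from atom i to atom j
            forces[i] += omega[i, j] * r_ij / np.linalg.norm(r_ij)
    return forces
```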
2.6 Energy prediction network
As shown in Fig.2(a), the energy prediction network takes the distance matrix $D$ as input. An edge convolutional layer integrates the element information (embedding layer) with the distance features extracted from $D$. The edge information and angle information are then integrated by an angle convolutional layer, whose output goes through batch normalization and combines with the element information to enter a vertex convolutional layer. The resulting vertex information flows through another batch normalization layer and combines with the previous edge information in a second edge convolutional layer. In the dense layers, the obtained vertex and edge features are mapped to single numbers $E_v$ and $E_e$, respectively. In the final global average pooling layer, which serves as the readout function, the average binding energy per atom is predicted from $E_v$ and $E_e$.
In fact, the binding energy of a cluster usually increases and gradually approaches the bulk limit as the size $N$ increases, a trend similar to the curve $1 + c_1/N + c_2/N^2$. We therefore introduce four empirical parameters $c_1$, $c_2$, $c_3$ and $c_4$ to reflect the overall size dependence of the binding energy, where $N$ is the cluster size. By definition, a positive binding energy means that formation of a cluster from individual atoms is exothermic. The parameters $c_1$−$c_4$ are all negative, which successfully reflects the ascending tendency of the binding energy with increasing cluster size. Our test calculations show that including these size-dependence parameters is beneficial for mixed-size training.
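One plausible form of this readout, combining the pooled vertex and edge energies with the size-dependence parameters (the exact placement of $c_1$−$c_4$ is our assumption, not spelled out above):

$$\bar{E}_b = \frac{1}{N}\sum_{i} E_{v,i}\left(1 + \frac{c_1}{N} + \frac{c_2}{N^2}\right) + \frac{1}{N_e}\sum_{ij} E_{e,ij}\left(1 + \frac{c_3}{N} + \frac{c_4}{N^2}\right),$$

where $N_e$ is the number of edges.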
2.7 Force prediction network
In a previous study, an energy-conserving force model was obtained by differentiating the energy model with respect to the atomic positions [23]. Because of this relation between energy and force, such a model has to be at least twice differentiable to allow gradient descent on the force loss; the resulting force model is twice as deep and hence requires about twice the computational time. Therefore, the energy and force networks are trained separately in this work, which avoids the cost of the differential operation. The force prediction network is depicted in Fig.2(b). It generally resembles the energy prediction network, with small differences: the two networks share the same architecture up to the vertex convolutional layer, after which the last edge convolutional layer is replaced by a second vertex convolutional layer, followed by a batch normalization layer and the force convolutional layer as output. The good performance of such similar CGANet architectures for energy, force and property prediction demonstrates the robustness of CGANet.
3 Train and test
3.1 Datasets
To demonstrate how to implement CGANet, we consider medium-sized Agn clusters (n = 15−27) to construct the datasets. First, a huge number of structures for Ag16, Ag20 and Ag24 clusters are obtained from an unbiased search on the cluster PES using a home-developed genetic algorithm program (namely CGA [36]). For each cluster, we conduct five independent CGA searches of up to 1000 iterations with symmetry constraints of the C1, C2, C3, Cs and Ci point groups, respectively. All structures in the CGA search are fully optimized with DFT, as implemented in the DMol3 program [37], which utilizes the double numerical basis set including d-polarization functions (DND) with a global orbital cutoff of 6.0 Å (see Fig. S1 of the Supporting Information), the Perdew−Burke−Ernzerhof (PBE) functional for exchange-correlation interaction [38], and an all-electron relativistic potential to treat the core electrons. All intermediate structures during geometry optimization are collected into the training set, unless the binding energy of a structure is negative (meaning that the structure is too unstable). The final datasets for Ag16, Ag20 and Ag24 contain 166 689, 136 165 and 145 014 structures, respectively. We have also collected the isomeric structures of Agn clusters (n = 15, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27) reported in the literature [21, 39-45] and re-optimized them using the same PBE-DND scheme. Again using the intermediate structures during geometry optimization, the resulting dataset contains 1825 cluster structures, which serves as the test dataset for the mixed-size training.
3.2 Train
First, we train CGANet on the datasets of Ag16, Ag20 and Ag24 individually. For each cluster size, the data are split into training and test sets in a ratio of 9:1. The loss function of the force prediction net is the mean absolute error (MAE) of the forces, while the loss function of the energy prediction net is the root mean square error (RMSE) of the energy, since RMSE is larger than MAE by definition and more sensitive to the deviation between the neural-network predictions and the DFT results. For comparison, we have trained different networks for energy prediction on the Ag20 dataset. An ANN with two fully connected layers after the bond and angle feature extraction layers has an RMSE larger than 20 meV, and a graph convolutional network (GCN) also leads to a test error larger than 20 meV. When the graph convolutional layer is replaced by a graph attention layer, the RMSE immediately decreases to 13 meV. We then improved the graph attention network through a series of tests. We introduced the edge convolutional layer and the angle convolutional layer to handle the original bond and angle information, and batch normalization layers to enhance the generalization ability and accelerate training. After testing various activation functions, the attention values are obtained with the LeakyReLU function, the outputs of the attention layer go through the Elu function, and the activation function after the fully connected layer is chosen to be LeakyReLU. The numbers of initial bond and angle features are 36 and 24, respectively. The feature dimensions of the first bond convolutional layer, the angle convolutional layer, the vertex convolutional layer and the second bond convolutional layer are 18, 26, 30 and 32, respectively. It should be noted that larger values of these parameters are not necessarily better. Finally, the test error of the energy prediction net decreases to 6.8 meV and the architecture of CGANet is fixed. Using the same CGANet framework, the test losses of energy for Ag16 and Ag24 also approach about 6 meV/atom [Fig.3(a)] after 100−150 epochs of training. Similarly, the force prediction network is constructed, and the test losses of force are 42.3, 50.1 and 45.2 meV/Å for n = 16, 20 and 24, respectively [Fig.3(b)].
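In plain terms, the two loss functions and the 9:1 split look as follows (a sketch; the placeholder dataset stands in for the actual structure lists):

```python
import numpy as np

def energy_loss(E_pred, E_dft):
    """RMSE loss used for the energy net (binding energy per atom)."""
    return np.sqrt(np.mean((np.asarray(E_pred) - np.asarray(E_dft)) ** 2))

def force_loss(F_pred, F_dft):
    """MAE loss used for the force net (all Cartesian force components)."""
    return np.mean(np.abs(np.asarray(F_pred) - np.asarray(F_dft)))

# 9:1 train/test split over a dataset of structures (placeholder indices).
structures = list(range(136165))           # e.g., the Ag20 dataset size
rng = np.random.default_rng(0)
idx = rng.permutation(len(structures))
n_train = int(0.9 * len(structures))
train_idx, test_idx = idx[:n_train], idx[n_train:]
```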
It is crucial for a network to be transferable across sizes. Therefore, we combine the datasets of Ag16, Ag20 and Ag24 as the training set and use the datasets of the other Agn clusters (n = 15−27, excluding 16, 20, 24) as the test set for a mixed-size training. In this case, the test error of the energy is 11.7 meV/atom (Fig.4), which is larger than those from the single-size trainings but still lower than those of ANI (26.0 meV) [16] and SchNet (23.8 meV) [23]. In contrast, the test loss of the force in the mixed-size training does not increase compared to the single-size trainings, but slightly decreases to 41.7 meV/Å (Fig.4). This value is comparable to the reported RMSE values of 39 meV/Å for Cu−Zn binary clusters by HDNNP [46] and 43.4 meV/Å for organic molecules by GDML [47]. Overall, the test errors of energy and force from the mixed-size training are satisfactory, and the trained CGANet should be able to describe the PES of Agn clusters of different sizes.
3.3 Structural prediction
Using the CGANet from the mixed-size training, we have performed global optimization of Agn clusters in the size range n = 14−26 within the framework of a genetic algorithm, as implemented in our CGA code. Briefly, 128 configurations are generated from scratch as the initial population for each cluster size. Then, any two individuals in this population can be chosen as parents to produce a child cluster via a "cut and splice" crossover operation, followed by an optional mutation operation with 35% probability. Each child cluster from a CGA iteration is locally relaxed by a combined technique of a quick molecular dynamics simulation of 20 steps at 300 K and numerical minimization with the BFGS algorithm. Our test calculations demonstrate that this technique is rather efficient at avoiding being trapped in shallow local minima on the PES. For each cluster size, the independent CGA search based on CGANet lasts for about 1000 iterations.
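A schematic of this search loop (a sketch under stated assumptions: the genetic operators and relaxation routine are placeholders, and the replacement rule is our simplification of CGA's actual selection strategy):

```python
import random

def cga_search(n_atoms, energy_fn, relax_fn, make_random, crossover, mutate,
               pop_size=128, iters=1000, p_mut=0.35):
    """Schematic CGA loop: energy_fn is the CGANet energy; relax_fn bundles
    the 20-step 300 K MD pre-relaxation plus BFGS minimization."""
    pop = [relax_fn(make_random(n_atoms)) for _ in range(pop_size)]
    for _ in range(iters):
        pa, pb = random.sample(pop, 2)          # choose two parents
        child = crossover(pa, pb)               # "cut and splice" operation
        if random.random() < p_mut:             # optional mutation (35%)
            child = mutate(child)
        child = relax_fn(child)                 # local relaxation on the PES
        worst = max(range(pop_size), key=lambda k: energy_fn(pop[k]))
        if energy_fn(child) < energy_fn(pop[worst]):
            pop[worst] = child                  # replace the worst individual
    return min(pop, key=energy_fn)              # putative global minimum
```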
The global minimum structures of Agn (n = 14−26) are given in Fig.5. To double-check our findings, we have re-performed structural relaxation using the Gaussian16 suite [48] with the meta-GGA TPSS exchange-correlation functional [49] and the SDD basis set. As demonstrated by our test calculations on spin multiplicity (Supporting Information, Fig. S2), we choose the lowest possible spin states for all Agn clusters, i.e., singlet for even-sized clusters and doublet for odd-sized ones. As presented in Table S1 of the Supporting Information, our test calculations demonstrate that the TPSS/SDD scheme describes the binding energy, bond length, ionization potential, and vibrational frequency of the Ag2 dimer well. First of all, we have reproduced the reported ground-state structures of silver clusters at n = 14−21 from previous DFT studies [23, 41-47]. The ground-state structure of Ag14 is prolate with C2 symmetry, in agreement with previous works [39, 41, 43]. The ground-state structure of Ag15 is similar to Ag14 but with D3 symmetry, and it can be viewed as the same structure reported by Tian et al. [39]. We find that the lowest-energy configuration of Ag16 also takes a prolate shape, with D2 symmetry. This structure is the same as Baishya's result [41] but different from the other reports [39, 40, 42-44]. The lowest-energy configuration of Ag17 is identical to that reported by Baishya et al. [41], which can be obtained by adding 4 atoms to the waist of the Ag13 icosahedron. The present lowest-energy structure of Ag18 takes a distorted star shape with Cs symmetry, and it has been reported by Baishya and McKee [44]. For Ag19, we find a C3 structure that coincides with Yin and Du's structure for the Ag19− anion [45]. It is 0.13 eV and 0.11 eV lower in energy than the structures reported by Chen [42] and Baishya [41], respectively, in DMol3 calculations. However, at the TPSS/SDD level, the structure reported by Chen [42] is energetically more favorable. Similar to Ag19, our lowest-energy structure of Ag20 is a twisted tetrahedron with C3 symmetry, which has been reported by Du [45] and McKee [44]. The ground-state structure of Ag21 found by us also takes a twisted tetrahedral configuration, and it can be considered the same as that in previous studies [39, 45].
More excitingly, previously unreported lowest-energy structures of Agn clusters at n = 22−26 are obtained from our CGA search with CGANet. For n = 22−26, the lowest-energy structures from the present work and those reported in recent studies [21, 39-45] are compared in Fig. S3 of the Supporting Information. The newly found structures are energetically favored over those in the literature by 43−960 meV and 70−702 meV at the PBE/DND and TPSS/SDD levels of theory, respectively. The ground-state structure of Ag22 found here has C3 symmetry, which differs from all previously reported structures but resembles the Ag23 structure reported by Chen [42]. It is also 0.35 eV lower than McKee's structure [44] at the TPSS/SDD level. All the lowest-energy structures of Ag23−25 found by us are three-layer frustums with a bottom face of 14 Ag atoms. Compared with McKee's structures [44] in TPSS/SDD calculations, all the present structures are 0.2−1.2 eV lower in energy. Our lowest-energy structure of Ag26 adopts the configuration of a triangular bipyramid with 4 atoms removed at the two vertices and the waist. This structure is 0.43 eV lower than the structure reported by Chen [42]. Based on the lowest-energy structures of Agn (n = 14−26), the vertical ionization potentials are calculated and compared with experimental data [50] in Fig. S4 of the Supporting Information. The satisfactory agreement demonstrates the validity of our theoretical scheme.
Compared to conventional DFT calculations using DMol3, CGANet is faster by at least two orders of magnitude with comparable accuracy (11.7 meV/atom RMSE in energy). Combined with a genetic algorithm or another global minimization approach, it provides a robust description of the PES and allows more efficient searches for ground-state structures. Our further investigation of larger Agn clusters up to n = 60 using CGANet is underway, and the results will be reported elsewhere.
3.4 Properties prediction
In addition to energy and force, it is desirable to predict the physical or chemical properties of a cluster using the developed graph neural network. As key electronic properties, here we consider the gap between the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO) as well as the HOMO energy (whose negative value approximates the vertical ionization energy according to Koopmans' theorem [51]). For either the HOMO−LUMO gap or the HOMO energy, the CGANet is the same as that for energy prediction described above, just with the objective function replaced by the desired property. Taking Ag20 as a representative, we train the HOMO energy and the HOMO−LUMO gap individually; the results are presented in Fig.6. Most intermediate structures during geometry optimization are kept in the dataset, while about 18% of them with unreasonable binding energies are discarded. After 90 epochs of training, the MAE of the HOMO−LUMO gap for the test set reaches 45.8 meV (Fig.6), while the HOMO−LUMO gap values range from 0.002 to 1.703 eV. For comparison, the MAE for the HOMO−LUMO gap on the QM9 dataset is 32.1 meV for DeepMoleNet and 34.8 meV for DimNet, while those of other models range between 56 and 69 meV. On the other hand, the MAE of the HOMO energy for the test set is 50.0 meV, which is only about 1% of the absolute HOMO value (−4.217 eV on average). It is worth mentioning that the present CGANet is not limited to orbital energies; it can be extended to describe other properties such as polarizability, spectral characteristics, NMR chemical shifts, magnetic moments, magnetic anisotropy energy, adsorption energy and chemical reactivity. The capability of directly predicting chemical or physical properties from the atomic structure is essential for rapid screening of functional clusters or molecules.
4 Conclusion
To summarize, a graph network for atomic clusters, namely CGANet, is built from aggregation layers of vertex, edge, bond angle and force information using the attention mechanism. It takes only element types and coordinates as input and is able to predict binding energies and forces with satisfactory precision. Combined with a genetic algorithm code for global search, CGANet can reproduce the DFT potential energy surface of Agn clusters with hundredfold acceleration. Although our training set only includes data at the selected sizes n = 16, 20 and 24, it succeeds over the extended size range n = 14−26. In particular, a number of previously unreported lowest-energy structures have been obtained for Agn clusters at n = 22−26. Furthermore, CGANet can predict the HOMO energy and HOMO−LUMO gap with MAEs of about 50 meV, showing its capability to describe other physical and chemical properties. Although the specific construction of the vertex, edge and bond-angle aggregation layers and the generation of the dataset may vary with the system, the idea of the present graph attention network is universal and extendable to arbitrary atomic clusters and perhaps other materials. We anticipate that the developed CGANet can be widely used for rapidly searching the ground-state structures of large clusters and for the inverse design of functional clusters with desired properties.