Description
Machine learning (ML) models are increasingly used in combination with electronic structure calculations to predict molecular properties at a much lower computational cost in high-throughput settings. Such ML models require representations that encode the molecular structure, and these are generally designed to respect the symmetries and invariances of the target property. However, size-extensivity is usually not guaranteed for so-called global representations. In this contribution, we show how extensivity can be built into global ML models using, e.g., the Many-Body Tensor Representation. The properties of extensive and non-extensive models of the atomization energy are systematically explored by training on small molecules and testing on small, medium, and large molecules.[1] Our results show that the non-extensive model is useful only within the size range of its training set, whereas the extensive models provide reasonable predictions across large size differences. Remaining sources of error for the extensive models are discussed.
[1] ChemSystemsChem 2 (4), e1900052 (2020)
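As a rough illustration of the underlying idea (a minimal sketch, not the implementation from [1]), the snippet below builds a toy, element-agnostic two-body descriptor in the spirit of the Many-Body Tensor Representation: Gaussian-broadened pair-distance histograms are summed, rather than averaged, over atom pairs, so the feature vector, and hence any linear model built on it, grows additively with system size. The function name `pairwise_mbtr_like`, the grid range, and `sigma` are illustrative assumptions; the actual MBTR additionally resolves element combinations and higher-order terms.

```python
import numpy as np

def pairwise_mbtr_like(coords, grid=None, sigma=0.1):
    """Toy 2-body, element-agnostic MBTR-style descriptor (illustrative only):
    a Gaussian-broadened histogram of all interatomic distances, summed
    (not normalized) over pairs, so the feature vector scales additively
    with system size."""
    if grid is None:
        grid = np.linspace(0.5, 6.0, 100)  # distance grid in Angstrom (assumed range)
    coords = np.asarray(coords, dtype=float)
    feat = np.zeros_like(grid)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = np.linalg.norm(coords[i] - coords[j])
            feat += np.exp(-0.5 * ((grid - d) / sigma) ** 2)
    return feat

# Extensivity check: two non-interacting copies of a molecule yield
# (numerically) twice the descriptor, because cross-copy distances fall
# far outside the grid range.
mol = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0]])        # toy diatomic
dimer = np.vstack([mol, mol + np.array([50.0, 0.0, 0.0])])  # two far-apart copies
assert np.allclose(pairwise_mbtr_like(dimer), 2 * pairwise_mbtr_like(mol))
```

With such an unnormalized descriptor, a linear model without an intercept, y = w·x, is extensive by construction: doubling the system doubles x and therefore the prediction. Normalizing the descriptor per atom, as is common for global representations, breaks this property and leads to the non-extensive behavior described above.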