How Graph Neural Networks Learn Molecular Representations from Quantum Mechanics

Graph neural networks compress the enormity of quantum chemistry into compact descriptions of molecules. The distances between these learned molecular descriptors define a fine-grained measure of molecular similarity while enabling precise downstream learning (see the full article [1]).

Physics gives us the fundamental laws governing how atoms interact, and in principle these laws can predict molecular behavior exactly via first-principles calculations. But even for small molecules this is a staggering computational challenge: solving the quantum-mechanical equations scales steeply with system size, quickly becomes infeasible, and demands enormous memory and computation.

Graph neural networks (GNNs) [2-6] sidestep this obstacle by training on systems that have already been solved with exact or approximate methods. Crucially, GNNs encode molecules with an inductive bias similar to that of quantum-chemical calculations: the electronic and nuclear components of a molecule are decoupled. The nuclei are represented as nodes (particles), with edges given by the natural interatomic distances between them.
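As a minimal sketch of this graph construction (the water geometry and the cutoff value below are illustrative assumptions, not taken from the original article):

```python
import numpy as np

# Nodes are the nuclei (atomic numbers Z); edges connect pairs of nuclei
# within a distance cutoff and are weighted by the interatomic distance.
Z = np.array([8, 1, 1])                      # water: O, H, H
R = np.array([[ 0.000, 0.000, 0.000],        # Cartesian coordinates in Angstrom
              [ 0.757, 0.586, 0.000],
              [-0.757, 0.586, 0.000]])

dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
cutoff = 5.0                                 # Angstrom; a typical modelling choice
edges = [(i, j, dist[i, j])
         for i in range(len(Z)) for j in range(len(Z))
         if i < j and dist[i, j] < cutoff]
print(edges)   # [(0, 1, ~0.957), (0, 2, ~0.957), (1, 2, ~1.514)]
```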

The training process enables a GNN to encode each node in a graph into a vector representation that reflects not only the atom’s identity but also the influence of neighboring nodes and their distances. In effect, by training on quantum-mechanical data, the model constructs an internal coordinate system that compresses the many-body solutions into a function of local environments.

Non-linear projection techniques are capable of revealing complex, high-dimensional structures that linear methods like PCA might miss, but they come with important caveats: non-linear projections often distort distances and relationships in ways that are difficult to predict, so care must be taken when interpreting the results.

One widely used method is t-distributed Stochastic Neighbor Embedding (t-SNE) [9] shown to the side for the oxygen AFVs of 10,000 QM9 molecules. t-SNE preserves local neighborhoods—points that are close in the 128-D space remain close in the 2-D projection—making it excellent for visualizing fine-scale structure. However, long-range distances are unreliable, and the global arrangement of clusters can sometimes appear arbitrary or even misleading.

Despite these limitations, t-SNE is extremely effective for revealing local organization in the AFV space, as seen in the sharp clustering of functional groups and the broader chemical families they form.

Fig 1 - A graph is formally defined as a mathematical object composed of 1) nodes and 2) pairwise edges.

These learned atomistic representations, often called learned descriptors, can be used to predict molecular properties and behaviors and to propose new molecular structures at a fraction of the computational cost of first-principles electronic-structure methods.

In this work, we examine what these learned descriptors encode, how chemical information is organized in the latent space, and why this organization is scientifically meaningful. Our goal is to interpret the latent space of graph neural networks through a rigorous and explainable framework, transforming them from opaque function approximators into models grounded in recognizable chemical and physical principles.

A key result is a direct connection between learned GNN descriptors and molecular similarity measures long used in cheminformatics. Unlike traditional approaches that rely on hand-crafted descriptors, graph neural networks learn these representations directly from data, with evidence that they capture molecular similarity more faithfully than pre-designed alternatives.

At the very first layer, the trained graph neural network encodes almost nothing about molecular structure. Each atom is identified only by its element type. Concretely, the atomic number Z_i is mapped to a predefined or learned initial atom feature vector (AFV).
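In symbols (a generic embedding-table notation; the exact parameterization is an architectural choice):

$$\mathbf{x}_i^{(0)} = \mathbf{a}_{Z_i}, \qquad \mathbf{a}_{Z} \in \mathbb{R}^{F},$$

where $F$ is the AFV dimension (128 in the model discussed below) and the vectors $\mathbf{a}_Z$, one per element, are either fixed or learned during training.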

At this stage, the AFV descriptor simply separates carbons from oxygens, nitrogens, fluorines, and so on. It is a learned periodic table: a set of vectors that lets the model distinguish elements but not yet their structural environment.

All information about bonding and geometry enters through the subsequent interaction layers. In a general interaction layer l, atom i receives messages from its neighbors j that depend on both the current neighbor representation and the interatomic distance. The distances are first expanded in a set of radial basis functions.
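Following the Gaussian expansion used in SchNet [2, 5] as a representative example (the centers $\mu_k$, the width $\gamma$, and the number of basis functions $K$ are hyperparameters, so read this as a sketch rather than the exact expansion used here):

$$e_k(r_{ij}) = \exp\!\big(-\gamma\,(r_{ij} - \mu_k)^2\big), \qquad k = 1, \dots, K.$$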

This expands each interatomic distance into a richer set of basis features that a neural network can process more effectively (rather than passing just a single number, the raw distance, through the network). The resulting feature vector is then passed through a small neural network that generates a distance-dependent filter.

This filter then modulates the neighbor node’s representation through elementwise multiplication. In practice, the radial-basis expansion is compressed by several dense layers until it matches the dimensionality of the nodal representation space, the AFV.
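In SchNet-style notation [2, 5] (a sketch: $\mathbf{x}_j^{(l)}$ is the AFV of neighbor $j$ at layer $l$, $W^{(l)}(r_{ij})$ is the distance-dependent filter generated from the radial-basis features, and $\odot$ is elementwise multiplication), the per-neighbor message reads

$$\mathbf{m}_{ij}^{(l)} = \mathbf{x}_j^{(l)} \odot W^{(l)}(r_{ij}),$$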

and the messages from all neighbors are summed to pool all contributions:
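$$\mathbf{m}_i^{(l)} = \sum_{j \in \mathcal{N}(i)} \mathbf{m}_{ij}^{(l)},$$

where $\mathcal{N}(i)$ denotes the set of neighbors of atom $i$ (same sketch notation as in the previous equation).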

Finally, atom i updates its own representation through a learned update network:
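$$\mathbf{x}_i^{(l+1)} = \mathbf{x}_i^{(l)} + \mathrm{MLP}^{(l)}\!\big(\mathbf{m}_i^{(l)}\big).$$

The residual (additive) form shown here follows SchNet [2, 5]; other architectures use different update functions, so this is a representative sketch rather than the only possibility.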

The interaction module is repeated a user-chosen number of times. Once the interaction layers produce the updated AFVs, the model needs a mechanism to map these vector representations onto the target property, a scalar. The output network performs this role. Applied identically to each AFV, it infers atomic contributions that sum to the global observable, and through backpropagation it continually shapes how the AFVs themselves are built.

In the end, the model ties the entire representation-learning process to the target physics through a final summation over the atomwise contributions produced by the output network.
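Schematically, with $\epsilon_i$ denoting the per-atom contribution inferred by the output network from the final AFV of atom $i$, and $\hat{E}$ the predicted target (here the internal energy):

$$\hat{E} = \sum_{i=1}^{N} \epsilon_i.$$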

At the start of training, the AFVs and the messages that update them are essentially random and encode nothing precise about the output property. After optimization on a physically meaningful target—such as exact or approximate electronic energies from first-principles calculations—the interaction and update networks have learned how to construct the messages and AFVs, and how to use them, so that the final property is reproduced precisely.

We trained such a GNN on the QM9 dataset [7]—~134k DFT-computed internal energy values at 0 K—and obtained a highly accurate model, reaching a cross-validation error of only 0.02 meV [1]. Yet despite this remarkable precision, the GNN’s latent space—encoded in the atom feature vectors (AFVs)—remains largely opaque. Given all these layers of processing and message-passing interactions, it is not entirely clear what description the GNN is building. What does the final atomistic representation look like? How does it support such precise downstream predictions?

We will dive in by applying several dimension-reduction techniques to the learned AFV descriptions of a set of organic molecules. These techniques will reveal the connection between GNN-learned descriptors and the molecular similarity measures of cheminformatics, while also revealing interesting trends in how chemistry is organized in the learned space.

What the Learned Representations Actually Are: Atom Feature Vectors and Their Organization

A central challenge with GNNs is that the latent model used to predict molecular energies is often difficult to interpret. It is not clear whether the message-passing steps that construct the atom feature vectors (AFVs) produce a chemically meaningful latent space, or whether they simply form a non-interpretable description that just happens to yield strong predictive performance.

The other problem is that the atom feature vectors themselves live in a very high-dimensional space—each AFV in our trained model is a 128-dimensional vector. Even if the message-passing process organizes these vectors in a meaningful way, such a high-dimensional space is essentially impossible to interpret directly. Thankfully, there are techniques that project these high-dimensional embeddings into lower-dimensional spaces while preserving their structural relationships. These methods—such as PCA, t-SNE, and other dimensionality-reduction tools—enable us to visualize and analyze the organization of the AFVs and assess whether they encode chemically meaningful patterns.
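As a concrete sketch of these projections (the file name and the array `afvs` of shape `(n_atoms, 128)` are assumptions standing in for the AFVs extracted from the trained model; any array of per-atom embeddings would do):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Hypothetical array of learned atom feature vectors, one 128-D row per atom
# (e.g., the oxygen atoms of 10,000 QM9 molecules), extracted from the trained GNN.
afvs = np.load("oxygen_afvs.npy")            # placeholder path

# Linear view: the 2-D plane capturing the largest variance of the 128-D cloud.
pca_coords = PCA(n_components=2).fit_transform(afvs)

# Non-linear view: preserves local neighborhoods, but long-range distances
# in the 2-D map should not be over-interpreted.
tsne_coords = TSNE(n_components=2, perplexity=30, init="pca",
                   random_state=0).fit_transform(afvs)
```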

Fig 2 - Projecting the Earth’s 3-D surface onto a 2-D map inevitably introduces distortions.

Each of these techniques comes with its own strengths and weaknesses. The fundamental challenge is the trade-off between information preservation and information loss. Any reduced-dimensional representation is, at best, a shadow of the original 128-dimensional space—an approximation that inevitably discards some detail. However, these “shadows” are still extremely useful: they can preserve important aspects of the organization of the data, such as distances between points, topology, and overall distribution. By projecting the AFVs into these lower-dimensional spaces, we gain a way to visually interpret how the representations are organized and assess whether they reflect meaningful chemical structure.

Linear vs Non-Linear Low Dimensional Projections of the Embedding Space

The first technique we consider is Principal Component Analysis (PCA) [8]. PCA is a linear dimensionality-reduction method that identifies the directions of greatest variance in the 128-dimensional AFV space and projects the data onto those axes. In other words, it finds the two-dimensional plane that captures the most variation in the high-dimensional data, providing the most informative linear “view” of the space. Think of it like photographing a 3-D house from the angle that reveals the most of its structure—the view that best preserves the shape and features of the house in a 2-D image, e.g., a landscape front view.

Because it is linear, PCA preserves global structure and provides a straightforward, interpretable mapping. However, its linearity is also its main limitation: PCA cannot “unfold” or separate complex non-linear relationships embedded in the high-dimensional space. If chemically meaningful clusters are arranged in a curved, intertwined, or otherwise non-linear fashion, PCA will fail to tease them apart—the 2-D linear projection is limited by its simplicity. 

How Graph Neural Networks Model Chemistry

Fig 2 - A simplified GNN diagram focused on the message-passing operations between atom feature vectors (AFVs), which represent the nodal elements of the molecular graph.

Fig 5 - t-SNE’s 2-D projection of the 128-D oxygen atom feature vectors (AFVs) of the SchNet [2-5] graph neural network, evaluated on the QM9 dataset [7].

Confirming High-Dimensional Separation with Linear Discriminant Analysis (LDA)

To verify whether the functional-group clusters suggested by t-SNE truly exist in the original 128-D AFV space, we use Linear Discriminant Analysis (LDA). [10] LDA is a supervised method that finds linear combinations of the input features—here, the 128-D AFVs—that maximize separation between predefined classes while minimizing variation within each class. Unlike PCA, which is unsupervised and focuses solely on overall variance, LDA explicitly leverages class labels to identify directions where groups are most distinguishable.

It is important to note that LDA does not preserve all original distances. By projecting the high-dimensional data into a lower-dimensional subspace, some geometric relationships are intentionally distorted. This distortion is purposeful: LDA prioritizes class separability over exact distance preservation, giving a view of data where clusters are as distinct as possible.

To apply this to our data, we labeled each AFV by its corresponding functional group (alcohols, carbonyls, amines, and so on) and trained LDA on the 128-D AFVs from 10,000 QM9 molecules. The resulting LDA projection reveals striking separation between functional groups, confirming that the clusters observed in t-SNE are not artifacts of the non-linear projection. Even under a purely linear treatment of the 128-D space, groups such as alcohols, hydroxyls near carbonyls, and carbonyl-containing species are clearly distinguishable, and their relative positions match the chemical similarities suggested by t-SNE.
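A minimal sketch of this step (the file names, the label array, and the use of scikit-learn are assumptions made for illustration; the original analysis may differ in detail):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

afvs = np.load("oxygen_afvs.npy")       # placeholder: (n_atoms, 128) AFVs
labels = np.load("fg_labels.npy")       # placeholder: one functional-group label per atom

# Supervised 2-D projection that maximizes between-class separation
# (assumes at least three functional-group classes).
lda_coords = LinearDiscriminantAnalysis(n_components=2).fit_transform(afvs, labels)

# How linearly separable are the functional groups in the full 128-D space?
acc = cross_val_score(LinearDiscriminantAnalysis(), afvs, labels, cv=5)
print(f"cross-validated LDA accuracy: {acc.mean():.3f}")
```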

Applying LDA to the AFVs, labeled by functional groups defined up to two bonds away, such as primary alcohols, esters, and carbonates, reveals clean separation between classes in the original 128-dimensional space. The near-perfect classification confirms that the clusters seen in t-SNE reflect genuine high-dimensional separability rather than an artifact of nonlinear embedding.

Although LDA stretches and rotates the space to maximize class separation, it is still a linear transformation. Like PCA, its axes are only coordinate mixtures of the original AFV features. Because of this linearity, LDA cannot invent well-defined boundaries if the original 128-dimensional representations do not already contain them. The resulting neat separation therefore demonstrates that the model has learned a chemically meaningful organization of functional groups.

Some exceptions do appear: because t-SNE stretches or compresses long-range distances to squeeze the 128-D geometry into 2-D, certain groups that are moderately similar in the original space may end up unexpectedly far apart or wrapped around other clusters. For example, α-hydroxyl groups—which sit between alcohols and carbonyls in both the 128-D distance distributions and in the linear PCA projection—are placed closer to carbonyls than to alcohols in the t-SNE map. Nevertheless, t-SNE still captures much of the global chemistry: N–O–N groups lie near C–O–C groups, and H–O–N clusters sit close to H–O–C clusters, reflecting meaningful chemical similarity.

The striking aspect of the t-SNE projection is the presence of clear “empty space’’ separating functional-group clusters. This raises an important question: does this separation truly exist in the original 128-D AFV space, or is it an artifact of the non-linear projection? Neither the PCA view nor the raw 128-D distance distributions make this immediately obvious. To answer this quantitatively, the next section applies a linear classifier—Linear Discriminant Analysis (LDA)—to the functional-group–labeled AFVs, confirming that the high-dimensional space is genuinely as well separated as t-SNE suggests, rather than as blended as the PCA projection implies.

Fig 6 - Top Left: 2-D PCA projection of the 128-D AFV space for 10,000 QM9 molecules, showing clear clustering by chemical group and a meaningful global arrangement of those groups. Top Right: Because PCA distorts true distances, we plot the actual 128-D Euclidean distances from the oxygen in propargyl alcohol to all other oxygen-containing molecules, confirming global organization (e.g., alcohol —> alcohol-carbonyl —> carbonyl). Bottom Right: A zoom-in highlights a smooth, fine-grained relationship between structure and 128-D distance for molecules closely related to propargyl alcohol. Bottom Left: A PCA zoom-in on straight-chain alcohols shows that the distance–structure trend is still visible in 2-D, but slightly distorted. In the true 128-D space, distances decrease smoothly with increasing chain length, whereas PCA “squishes” nonlinear geometry into a linear plane, bending these relationships slightly and motivating the need for non-linear projections.

Above is the PCA projection of the 128-dimensional AFV representations for 10,000 molecules in our test set. While the clusters are not immediately obvious without labeling the AFVs by their associated chemical groups, the projection still reveals a notable degree of organization. This suggests that the high-dimensional 128-D space itself is structured in a meaningful way.

The AFVs not only cluster according to chemical groups, but the functional groups themselves are organized in a way that reflects their relative similarity, showing global structure in the AFV space. For example, alcohol groups cluster together and near other hydroxyl-containing groups, while hydroxyl groups adjacent to carbonyls are positioned close to carbonyl-type groups—amides, esters, carbonates—forming a neatly organized map, much like a library with related sections placed near each other.

Zooming in on linear-chain alcohols—methanol, ethanol, propanol, butanol—we see that their AFVs draw progressively closer in the PCA projection. But this 2-D view masks a key fact: in the true 128-D space, the distances between successive alcohols decrease smoothly and monotonically with chain length. PCA, being linear, cannot follow the nonlinear geometry of the AFV manifold, so the 2-D distances become slightly distorted—for example, propanol-to-butanol appears farther apart in 2-D than ethanol-to-propanol, even though the 128-D distances decrease continuously. These distortions underscore the limits of linear projections and motivate using nonlinear methods like t-SNE to better reveal the underlying structure of the AFV space.
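A small sketch of this check (the archive of per-molecule oxygen AFVs and its keys are assumptions made for illustration):

```python
import numpy as np

# Hypothetical archive holding the 128-D hydroxyl-oxygen AFVs of the
# straight-chain alcohols, extracted from the trained model.
afvs = np.load("alcohol_oxygen_afvs.npz")    # placeholder file with named arrays

for a, b in [("methanol", "ethanol"), ("ethanol", "propanol"), ("propanol", "butanol")]:
    d = np.linalg.norm(afvs[a] - afvs[b])    # true Euclidean distance in 128-D
    print(f"{a} -> {b}: {d:.3f}")
# In the full 128-D space these consecutive distances shrink smoothly with
# chain length; measuring the same pairs in the 2-D PCA plane can reorder them.
```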

Fig 8 - LDA confusion matrix on the atom feature vectors classifying chemical environments.

Conclusions

Interpreting a 128-dimensional representation requires multiple lenses, each sensitive to different geometric truths. The linear methods, PCA and LDA, respect the global structure of the AFV space because they can only rotate, stretch, or compress along mixtures of the original coordinates. PCA reveals the broad scaffolding of chemical similarity by preserving large-scale distances, while LDA tests whether functional classes are truly separable without the freedom to invent nonlinear boundaries. Their agreement shows that the high-dimensional space already contains clean, chemically meaningful partitions.

Non-linear methods like t-SNE highlight a different aspect of the geometry. By focusing on preserving local neighborhoods rather than global distances, t-SNE exposes the fine texture of the AFV manifold: the subclusters within functional groups and the smooth transitions across related environments. Even though t-SNE may distort long-range relationships, its sensitivity to local structure reveals details that linear projections necessarily flatten.

When these perspectives are combined, a coherent picture appears. PCA outlines the global map, LDA confirms the true boundaries, and t-SNE fills in the local anatomy that makes the representation chemically expressive. Each method shows a different shadow of the same high-dimensional object, and their mutual consistency demonstrates that the network has learned a latent space whose organization mirrors the logic of chemistry itself.

References

For those interested in diving deeper into the details, all connecting citations and results discussed in this post can be found in our full article:

[1] A.M. El-Samman, I.A. Husain, M. Huynh, S. De Castro, B. Morton, and S. De Baerdemacker. 3(3):544–557, 2024

[2] K. T. Schütt, P.-J. Kindermans, H. E. Sauceda, S. Chmiela, A. Tkatchenko and K.-R. Müller, arXiv preprint arXiv:1706.08566, 2017.

[3] K. T. Schütt, F. Arbabzadah, S. Chmiela, K. R. Müller and A. Tkatchenko, Nature Communications, 2017, 8, 1–8.

[4] K. T. Schütt, P. Kessel, M. Gastegger, K. Nicoli, A. Tkatchenko and K.-R. Müller, Journal of Chemical Theory and Computation, 2018, 15, 448–455.

[5] K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko and K.-R. Müller, The Journal of Chemical Physics, 2018, 148, 241722.

[6] R. Zubatyuk, J. S. Smith, J. Leszczynski, and O. Isayev, Science Advances 5, eaav6490 (2019).

[7] R. Ramakrishnan, P. O. Dral, M. Rupp and O. A. Von Lilienfeld, Scientific Data, 2014, 1, 1–7.

[8] H. Abdi and L. J. Williams, Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2, 433–459.

[9] L. Van der Maaten and G. Hinton, Journal of Machine Learning Research, 2008, 9, 2579–2605.

[10] A. M. Martinez and A. C. Kak, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23, 228–233.