Reconstructing the Decision-Making Framework of Graph Neural Network Models: How GNNs Fine-Tune a Universal Chemistry Model

You are an archaeologist, standing at the site of an ancient civilization’s long-buried treasure. Among the artifacts, you uncover the core of their lost knowledge—a cryptic system of symbols and fragments. Scattered across the site are pieces of maps, along with intricate symbols and encoded messages that conveyed their understanding of the world. But your task goes beyond simply piecing these fragments together. You need to decipher how this civilization communicated, how they structured their knowledge, and how their knowledge-system organized the world around them.

This challenge is like reverse-engineering the decision-making framework of a "black box" model—a graph neural network (GNN). The GNN is our artifact, and it’s not enough to merely piece together weights, edges, and equations. We need to uncover the hidden logic and patterns that shaped its decisions about chemical data, much like deciphering the wisdom of an ancient civilization. So far, we’ve only glimpsed fragments of this hidden knowledge within GNNs. Let’s lay out our findings before we assemble the whole puzzle.

Fig 2 - Dimension reduction of the 128-dimensional oxygen nodes in the dataset. For all the labels and an in-depth explanation, see here.

In addition to the tidy clustering shown in Fig 2, which illustrates the functional-group-based decision-making, we also discovered that the atom vectors themselves are organized by chemical similarity. In other words, the distance between two atom vectors—representing any two atoms from arbitrary molecules—reflects the chemical similarity of their surrounding functional groups. Simply put, the less similar the functional groups around two atoms, the further apart their atom vectors lie. The clusters are organized meaningfully!

This concept is illustrated in Fig 3, where we take the oxygen atom vector from a reference alcohol molecule. All other oxygen atom vectors are plotted by their distance from this reference. Examining these distances, we find that greater distance corresponds to oxygen atoms in environments less similar to the reference. It makes sense that oxygen vectors from primary alcohols sit closest to our reference, since the reference is itself a primary alcohol. Beyond that, we see groups of secondary alcohols, tertiary alcohols, and hydroxylamines at progressively greater distances, reflecting progressively less similar environments.
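For concreteness, here is a minimal sketch of the distance ranking behind Fig 3; the array names and shapes are assumptions for illustration, not the code behind the original figure.

```python
import numpy as np

def rank_by_environment_similarity(reference: np.ndarray, oxygen_vectors: np.ndarray) -> np.ndarray:
    """Order oxygen atom vectors from most to least similar chemical environment,
    measured as Euclidean distance to a reference vector in the 128-dim space."""
    distances = np.linalg.norm(oxygen_vectors - reference, axis=1)
    return np.argsort(distances)  # indices of the nearest environments first

# Example: rank all dataset oxygens against a reference primary-alcohol oxygen
# order = rank_by_environment_similarity(oxygen_vectors[ref_idx], oxygen_vectors)
```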

Fig 5 - Illustration of the two phases of transfer learning.

There are several ways to present the results of our global reconstruction of the atom vectors. The first is purely quantitative: we measure how accurate our reconstructed atom vectors are compared with the original atom vectors as we go through the updates of our self-consistency procedure (as M increases). This is what Table 1 shows. Table 1 also introduces a new concept, the "depth," which corresponds to the size of the neighborhood chosen. Our choice that only direct neighbors influence each atom is arbitrary; we could just as well include the neighbors one "street" over, or all neighbors up to 5 "bonds" away from the atom. Expanding the neighborhood of influence for each atom helps a great deal in capturing long-range correlations, and the table confirms this.
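To make the notion of "depth" concrete, here is a small sketch, assuming the molecule is given as a plain adjacency list (an illustration, not the original code), of collecting all neighbors within a chosen number of bonds:

```python
from collections import deque

def k_hop_neighbors(adjacency: dict[int, list[int]], start: int, depth: int) -> set[int]:
    """Return all atoms within `depth` bonds of `start`, i.e. the 'neighborhood depth' of Table 1."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        atom, d = frontier.popleft()
        if d == depth:
            continue  # do not expand past the chosen depth
        for nbr in adjacency[atom]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, d + 1))
    return seen - {start}

# Example: ethanol heavy atoms as C0-C1-O2
# adjacency = {0: [1], 1: [0, 2], 2: [1]}
# k_hop_neighbors(adjacency, 2, 1) -> {1};  k_hop_neighbors(adjacency, 2, 2) -> {0, 1}
```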

Fig 4 - The Euclidean distance between a reference oxygen on a primary alcohol and the rest of the dataset's oxygen atom vectors.

In a language model, words are organized in a way that captures relationships. For example, the direction from “King” to “Queen” is the same as from “Rooster” to “Hen” or “Man” to “Woman.” This pattern shows that the model has identified gender differences by making a consistent shift in the word vectors. Similarly, other directions capture shifts like singular to plural (“King” to “Kings”) or past to present tense. These relationships appear because certain words are used in similar contexts, with the main difference being gender or plurality, represented as a single, consistent shift in direction.
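In vector form, this is the familiar word-analogy relation:

$$\vec{v}_{\text{Queen}} - \vec{v}_{\text{King}} \;\approx\; \vec{v}_{\text{Hen}} - \vec{v}_{\text{Rooster}} \;\approx\; \vec{v}_{\text{Woman}} - \vec{v}_{\text{Man}}$$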

The same concept applies to our atom vectors in chemistry's graph models—only here, the structure is even more organized! Just as a shift in a language model represents a change in gender or tense, a shift in atom vectors signifies a change in the atom's chemical environment, essentially a reaction. Each direction in the atom vector space represents a specific type of chemical change. For example, oxidation reactions push atom vectors in one direction, moving them from their reduced form to their oxidized form. This is shown in Fig 4's 2D projection of atom vectors "undergoing" oxidation: the shift appears as movement in one consistent direction, while reduction, naturally, appears as movement in the opposite direction. Any other reaction corresponds to some other direction, angled according to its similarity with oxidation.
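As a rough sketch of the "reactions as directions" idea, assuming we have atom vectors for matched reduced/oxidized pairs (an illustrative setup, not the analysis behind Fig 4), one could estimate and compare reaction directions like this:

```python
import numpy as np

def reaction_direction(before: np.ndarray, after: np.ndarray) -> np.ndarray:
    """Mean shift in atom-vector space for a reaction, e.g. reduced -> oxidized.
    `before` and `after` are (n_pairs, 128) arrays of matched atom vectors."""
    return (after - before).mean(axis=0)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """How aligned two reaction directions are: +1 means the same direction, -1 the opposite."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# oxidation_dir = reaction_direction(reduced_vectors, oxidized_vectors)
# cosine_similarity(oxidation_dir, -oxidation_dir)  # reduction points the opposite way: -1.0
```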

This insight is essential because it explains how clusters of atom vectors are arranged (shown in Fig 2), both in the simplified 2D view and in the original, high-dimensional space. These arrangements, in turn, define the decision-making framework of the graph model. In short, the graph model “speaks” the language of chemical reactions: each unique reaction corresponds to a consistent direction in the atom vector space.

Now that we’ve laid out our fragments, how do we assemble them to reveal the full picture of the graph model’s understanding of chemistry—to map out its entire “knowledge space”? Here’s the plan: Fragments 1 and 2 give us our essential clues, while Fragment 3 showcases the powerful advantages we’ll use along the way.

Fragments 1 and 2 show that each atom vector is shaped entirely by its chemical environment. But if we think a little deeper, we realize something interesting: while the atom vector depends on the functional group environment, that environment also depends on the atom vector itself. This creates a classic chicken-and-egg problem, where each part defines, and is in turn defined by, the other.

Solving this isn’t straightforward, as neither the atom nor its environment truly comes first—they determine each other in an interdependent cycle. The trick, then, is to start somewhere, anywhere, and let our initial guesses for the atom and its neighbors evolve together, like finding harmony between two interlocking dance partners.

We begin with a rough initial guess of the atom vectors across the entire molecule—our opening move in this dance. At first, the atoms and their neighbors aren't perfectly aligned, but that's about to change. After this initial move, each atom is updated based on the vectors of its direct neighbors. This update brings the atom vectors closer to harmony with their neighborhood. We can express this mathematically; the pieces of that update equation are laid out below, in the reconstruction section.

Fragment #3 — Atom Representations (May) Offer a Complete Description of Chemistry

Lastly, we found that the graph's atom vectors can be used to transfer-learn a whole range of chemical properties. We showed this at the atomistic level (pKa, NMR, electron occupancy) and at the molecular level (solubility).

Think of these atom vectors as "universal keys" crafted from our initial data, which can unlock doors to all kinds of new chemical properties. As Fig 5 shows, in the first phase we molded these keys by training on energy data, making them finely tuned to fit certain molecular "locks." Once shaped, we found that these keys could open many other doors—unlocking predictions for properties like acidity (pKa), magnetic resonance (NMR shifts), and even solubility—using just simple learning methods and far less data.
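A minimal sketch of that second phase, assuming the energy-trained atom vectors are kept frozen and only a simple readout is fitted on the new property (ridge regression is our illustrative stand-in for a "simple learning method"):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def fit_property_readout(atom_vectors: np.ndarray, targets: np.ndarray):
    """Fit a simple linear readout from frozen 128-dim atom vectors to a new
    chemical property (e.g. pKa or an NMR shift) and report held-out R^2."""
    X_tr, X_te, y_tr, y_te = train_test_split(atom_vectors, targets,
                                              test_size=0.2, random_state=0)
    readout = Ridge(alpha=1.0).fit(X_tr, y_tr)
    return readout, readout.score(X_te, y_te)
```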

This discovery shows that our graph model doesn’t just capture energy; it creates a master key that translates well across different properties, hinting that it could be a universal tool for understanding chemistry. Just as one well-crafted key can unlock many doors, our atom vectors could serve as a foundation to access a wide array of chemical insights.

Fig 1 - Archaeological site uncovering the core of an ancient civilization’s knowledge.

In a previous post, we introduced our primary artifact: the graph model, composed of nodes representing atoms and edges signifying atomic interactions. Both the nodes and the edges are fine-tuned in the model (using a neural net) so as to make the best predictions on energy-labelled training data. By extracting the nodes (atoms) of the graph and analyzing them, we uncovered how the model makes its decisions at the atomic level—a framework based on functional groups, as can be seen from Fig 2's t-SNE dimension reduction of the original 128-dimensional nodal vectors, which for simplicity will be called atom vectors throughout.
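For reference, a projection like the one in Fig 2 can be produced with an off-the-shelf t-SNE; the parameters below are illustrative defaults, not necessarily those used for the original figure.

```python
import numpy as np
from sklearn.manifold import TSNE

def project_to_2d(atom_vectors: np.ndarray) -> np.ndarray:
    """t-SNE projection of the 128-dimensional atom (nodal) vectors down to 2D."""
    return TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(atom_vectors)
```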

Fragment #1 — Atom Representations of a Graph Molecular Model Are Based on the Chemical Environment Around the Atom

This observation is our first hint that the clusters in Fig 2 aren't random but are quantitatively organized based on molecular geometry. Even within a cluster of atom vectors representing the same functional group—for instance, all primary alcohols, shown in light green in Fig 2—the closer two atom vectors are, the more similar the whole molecular environment around those atoms. If this all sounds new or unfamiliar, you might want to revisit our previous post for more context!

Fig 3 - The Euclidean distance between a reference oxygen on a primary alcohol and the rest of the dataset's oxygen atom vectors.

Fragment #2 — Atom Representations of a Graph Model Are Organized Based on the Language of Chemical Reactions!

We then discovered a crucial aspect of how our graph model organizes the functional groups of Fig 2—more specifically, the 128-dimensional atom vectors from which the 2-dimensional projection of Fig 2 is derived. They are organized according to the language of chemical reactions. To understand this, it helps to consider an analogous phenomenon in natural language models like ChatGPT. Just as our atom vectors represent atoms in the context of a molecule, language models use word vectors that represent words in the context of a sentence. It turns out that in those models, word vectors are meaningfully arranged according to relational semantic meaning.

Replicating the Graph Model Atom’s Decision-Making Framework—Reconstructing Atom Vectors

  • The starting-point atom vector for atom i (initial guess).

  • Neighbor j's starting-point vector (initial guess).

  • Weighting coefficients that determine the effect neighbor j has on target atom i; these are fitted to data (supervised learning).

  • The sum over all neighbors' contributions in the molecule influencing atom i. In other words, this is the environment nudging atom i's crude initial guess toward harmony with it.

  • Our atom vector, updated based on the identities of its initial neighboring vectors.
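Putting these pieces together, one plausible way to write the first update (a sketch consistent with the terms listed above; the exact functional form in the original figure may differ) is:

$$\mathbf{v}_i^{(1)} \;=\; \mathbf{v}_i^{(0)} \;+\; \sum_{j \in \mathcal{N}(i)} w_{ij}\, \mathbf{v}_j^{(0)}$$

Here $\mathbf{v}_i^{(0)}$ is atom $i$'s initial guess, $\mathbf{v}_j^{(0)}$ is neighbor $j$'s initial guess, $w_{ij}$ are the weighting coefficients fitted with supervised learning, $\mathcal{N}(i)$ is the chosen neighborhood of atom $i$, and $\mathbf{v}_i^{(1)}$ is the updated atom vector.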

With this first update complete, we repeat the process. Now that all atoms have been adjusted based on their immediate neighbors—which were initially just estimates—we have a refreshed set of neighborhoods. Using this updated information, we refine each atom vector again. This back-and-forth dance continues, where each round (the "Mth update") is based on the previous neighborhood (M-1), gradually refining both the atom and its environment. The equation below follows the same form as before, simply replacing (1) and (0) with (M) and (M-1) to represent each successive refinement.
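In the same assumed notation, the generalized update reads:

$$\mathbf{v}_i^{(M)} \;=\; \mathbf{v}_i^{(M-1)} \;+\; \sum_{j \in \mathcal{N}(i)} w_{ij}\, \mathbf{v}_j^{(M-1)}$$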

Fortunately, this process doesn’t go on indefinitely. After a few rounds of updates—usually around three to four (M=4)—we reach a point where further adjustments make no meaningful difference. At this stage, each atom vector is in perfect harmony with its neighbors, fully defined by them and defining them in turn. Scientists call this “self-consistency,” but in the dance world, it’s simply a flawless 10/10! We’ll show in the results below how this convergence typically occurs by the fourth round of updates.
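Putting the whole dance into code, here is a minimal sketch of the self-consistent loop; the array names, the additive update form, and the convergence test are all assumptions for illustration rather than the original implementation.

```python
import numpy as np

def self_consistent_reconstruction(v0: np.ndarray, neighbors: list[list[int]],
                                   weights: dict, max_updates: int = 10,
                                   tol: float = 1e-3) -> tuple[np.ndarray, int]:
    """Refine atom vectors until each one is in harmony with its neighborhood.

    v0        : (n_atoms, 128) initial guesses for every atom vector
    neighbors : neighbors[i] lists the atoms in atom i's neighborhood
    weights   : weights[(i, j)] is the fitted coefficient of neighbor j on atom i
    """
    v = v0.copy()
    for m in range(1, max_updates + 1):
        v_prev = v.copy()
        for i, nbrs in enumerate(neighbors):
            # the (M-1)-th neighborhood nudges atom i toward its M-th estimate
            v[i] = v_prev[i] + sum(weights[(i, j)] * v_prev[j] for j in nbrs)
        # self-consistency: further updates make no meaningful difference
        if np.linalg.norm(v - v_prev) < tol * np.linalg.norm(v_prev):
            break
    return v, m
```

In practice the text above reports that this settles down by roughly the third or fourth update (M = 4).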

Reconstructing the Atom Representation of a Graph Neural Network Model

Table 1 - Root mean square error (RMSE) of our reconstructed atom vectors vs. the original atom vectors, over several iterations of our self-consistent scheme (rows) and with increasing neighborhood depth (columns). To give a sense of scale, the magnitude of an atom vector is on average around 12.8 units, so the final error is within 1.5% of that.

Table 1 shows that our reconstruction procedure immediately achieves good accuracy, even at the first refinement (M = 1). To see this, consider that the average magnitude of an atom vector is around 12.8 units; an error of 0.594 is a small fraction of that (about 4.6%). The agreement improves significantly with greater M and neighborhood depth, ultimately reaching 0.170, which is within 1.5% of the average magnitude of an atom vector.

Rather than only assessing our reconstruction in bulk quantitative terms, we can also check how the reconstructed atom vectors perform on real chemical properties through transfer learning. Recall from a previous post, recapped here as "Fragment 3," that the atom vectors act like "master keys" for predicting many chemical properties. If a reconstructed atom vector is to be judged by how close it is to the true vector, we should check how well it performs at predicting these chemical properties, at acting like that "master key." We start with carbon NMR as an example. Below is the performance of our reconstructions from M = 1 to M = 3, compared with the performance of the original atom vectors (fourth panel, bottom right) on carbon NMR predictions. As with most of these performance figures, the closer the data points lie to the diagonal line, the more accurate the model, because it means the predicted values equal the true values from the dataset.
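As an aside, a parity plot of this kind is straightforward to produce; here is a small sketch with illustrative variable names, not the plotting code behind the figures below.

```python
import numpy as np
import matplotlib.pyplot as plt

def parity_plot(y_true: np.ndarray, y_pred: np.ndarray, label: str) -> None:
    """Scatter predicted vs. true values; points on the diagonal are perfect predictions."""
    lims = [min(y_true.min(), y_pred.min()), max(y_true.max(), y_pred.max())]
    plt.plot(lims, lims, color="black", linewidth=1)  # the diagonal reference line
    plt.scatter(y_true, y_pred, s=10, alpha=0.5, label=label)
    plt.xlabel("True value")
    plt.ylabel("Predicted value")
    plt.legend()
    plt.show()
```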

Fig 6 - The performance of our reconstructed atom vectors using the self-consistent scheme described above, over successive refinements (M) of the procedure. The zeroth update, i.e. the initial guess of our procedure, is the average carbon vector, which fittingly gives the average carbon NMR prediction, shown by the black line. The first, second, and third updates are shown in separate panels alongside the performance of the original (non-reconstructed) atom vectors on carbon NMR. Over the successive refinements (as M increases), the performance profile comes to resemble that of the original atom vectors (bottom right).

The figure also shows a black line representing the zeroth update—our initial crude guess. This initial guess was chosen to be the atom vector averaged over all carbon atoms, which fittingly corresponds to the average carbon NMR value. This choice is particularly useful because it provides a clear starting point and lets us observe how the reconstruction unfolds.

After the first refinement (M = 1), this single average begins to evolve. The model incorporates information from the carbon atom's neighbors, capturing details about its functional group through interactions with these neighbors. However, at this stage, the neighbors themselves do not interact with one another. As a result, the model recognizes the carbon's functional group but lacks information about the functional groups of its neighbors.

It is only after the second refinement and beyond (M > 1) that the neighbors also begin to refine their own functional group information. This refinement is then shared with the central carbon, creating a richer and more accurate prediction. During this process, NMR values are “smeared” across the profile, evolving from simple averages to detailed representations. They now capture the subtle differences that arise from long-range structural effects and interactions within the molecule.

This step-by-step approach reveals how the model evolves from simple average vectors to intricate representations of molecular structures. It demonstrates a transparent progression, breaking down the complex prediction process into understandable and interpretable stages—no longer a black box.

Likewise, as noted, these atom vectors act as "master keys" to many chemical properties. So far, we have only shown carbon NMR. Below we show the same analysis for another property: the total electron density around the carbon atom. Exactly the same analysis as above applies here. The only difference to note in the diagram below is that the electron-density groups (colors) are not arranged in the same way as the NMR groups; there is only a weak correlation between these two properties, and yet the model can reconstruct both of them!

Fig 7 - The prediction performance of our reconstructed atom vectors on carbon's total electron density, over successive refinements (M) of the reconstruction procedure. The zeroth update, i.e. the initial guess of our procedure, is the average carbon vector, which fittingly gives the average total electron density prediction, shown by the black line. The first, second, and third updates are shown in separate panels alongside the performance of the original (non-reconstructed) atom vectors on carbon's total electron density. Over the successive refinements (as M increases), the performance profile comes to resemble that of the original atom vectors (bottom right).

Rather than showing the performance of all the atom vectors we reconstructed, we can also illustrate how a single atom vector's prediction is built up through our procedure. We show this for pKa—yet another property that can be unlocked with our atom vector "master keys"—for the trifluoroacetic acid molecule shown below.

Fig 8 - Reconstruction of the atom vector for oxygen and the subsequent prediction of pKa. With each refinement, a deeper neighborhood is integrated into the reconstructed atom vector, and the prediction converges toward trifluoroacetic acid's actual pKa.

Using a neighborhood depth of 1, we see that at the first refinement (M = 1), when only the direct neighbors are recognized, the prediction fittingly lands close to the average pKa of an alcohol. At the second refinement, the carbon neighboring the oxygen (from which the prediction is obtained) recognizes that it is part of a carboxylic acid group, which brings the prediction of our reconstructed atom vector close to the pKa of a carboxylic acid. Only after the third refinement is the trifluorinated carbon recognized; this group, if you remember your chemistry lessons, pulls electron density away from the acid and makes it more acidic, driving the predicted pKa down to 0.85, close to the actual pKa of trifluoroacetic acid.

We can repeat this for other atom vectors. Below is the same illustration, but for the hydrogen and carbon NMR of 6H-cyclopenta[b]furan. Here we do not show the averages of each subgroup reached at each refinement, only how the prediction evolves to lie very close to the true value by the third refinement (M = 3); we expect the same behavior already shown in the NMR performance figures.

Fig 9 - Reconstruction of the atom vectors for carbon and hydrogen and the subsequent prediction of carbon and hydrogen NMR. With each refinement, a deeper neighborhood is integrated into the reconstructed atom vector, and the prediction converges toward the atom's true NMR value.

Conclusions

We have shown how the information inside a graph neural network can be deciphered and reconstructed, much like excavating a hidden world of knowledge from a buried treasure. The key fact is that the atom vectors are engaged in a little dance with their environment—the surrounding atom vectors—such that neither the atom vector nor the environment vectors can be determined independently; they "go together" and have to be determined together. We devised a scheme whereby, starting from any guess, the environment and atom vectors update each other until the atom vector fits its environment and the environment fits the atom vector. This allowed us to reconstruct the whole map of atom vector possibilities across the various molecular structures in our data, revealing the intricate atom-environment dependencies. Finally, we showed that, just as expected, these reconstructed atom vectors act like "master keys" for predicting many chemical properties, providing a well-rounded understanding of chemistry.