Loci (2008)
Chemical Graph Theory, Kimberly Jordan Burch

## 1. An Introduction to Chemical Graph Theory

Chemical graph theory is a branch of mathematics which combines graph theory and chemistry. Graph theory is used to mathematically model molecules in order to gain insight into the physical properties of these chemical compounds. Some physical properties, such as the boiling point, are related to the geometric structure of the compound. This is especially true in the case of chemical compounds known as alkanes. Alkanes are organic compounds composed exclusively of carbon and hydrogen atoms. One example of an alkane is ethane, shown in Figure 1. Each carbon atom has four chemical bonds and each hydrogen atom has one chemical bond. Therefore, the hydrogen atoms can be removed without losing information about the molecule. The resulting representation of ethane is the carbon tree shown in Figure 2. This carbon tree can be represented as a graph by replacing the carbon atoms with vertices. Each carbon-carbon bond is then represented as an edge in the graph. Figure 3 shows the graphical representation of ethane, composed of two vertices connected by a single edge.

Figure 1. The ethane molecule

Figure 2. The carbon tree of ethane

Figure 3. The carbon tree of ethane as a graph

The structure of an alkane determines its physical properties, and these properties can be modeled using topological indices. Some of these indices, such as the relative molecular mass (Mr) of a compound, are well known outside of the chemical and mathematical communities. For alkanes, the relative molecular mass is a function of the number of carbon atoms, denoted by n, and is given by Mr(n) = 12.01115n + 1.00797(2n + 2) atomic mass units (amu). Using this formula with n = 2, the relative molecular mass of ethane in Figure 3 is 30.0701 amu.

As previously mentioned, the boiling point of an alkane is determined by its geometric structure. Boiling points are a measure of the forces of attraction between like molecules. For essentially nonpolar compounds such as alkanes, these forces are London dispersion forces due to instantaneous dipole-induced dipole attractions. Dispersion forces are very short-range forces that increase with the number of electrons, which for the alkanes is approximately proportional to the relative molecular mass. The alkane boiling point should therefore depend on the relative molecular mass and on how well the molecules pack together, which is related to the geometry of the molecule. The dependence on the geometry is complex, but for a fixed relative molecular mass the boiling point should generally decrease as the compactness of the molecule increases. Balaban noted that for the same relative molecular mass, the boiling point decreased with increased branching (Balaban, 2001).
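The relative molecular mass formula quoted above is easy to check numerically. The following is a minimal sketch; the function name and constant names are illustrative choices, and the atomic masses are the ones used in the formula in the text.

```python
# Atomic masses (amu) as used in the formula Mr(n) = 12.01115n + 1.00797(2n + 2).
CARBON_MASS = 12.01115
HYDROGEN_MASS = 1.00797


def relative_molecular_mass(n: int) -> float:
    """Relative molecular mass of the alkane with n carbon atoms (C_n H_(2n+2))."""
    if n < 1:
        raise ValueError("an alkane needs at least one carbon atom")
    return CARBON_MASS * n + HYDROGEN_MASS * (2 * n + 2)


# Ethane has n = 2 carbon atoms.
print(f"{relative_molecular_mass(2):.4f} amu")  # 30.0701 amu
```

Because the formula depends only on n, every alkane with the same number of carbon atoms gets the same value, which is why octane and 2,2,4-trimethylpentane below cannot be told apart by this index alone.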

We examine the structures and boiling points of octane and 2,2,4-trimethylpentane to illustrate this result. Both are composed of 8 carbon atoms so they have the same molecular weight. Octane is composed of a "straight" chain of 8 carbons and is known as a normal alkane. A 3D representation is given in Figure 4 and the graphical representation is given in Figure 5.

Figure 4. The octane molecule

Figure 5. The carbon tree of octane

2,2,4-trimethylpentane is a more compact alkane and is sometimes called isooctane. A 3D representation is given in Figure 6 and the graphical representation is given in Figure 7.

Figure 6. The isooctane molecule

Figure 7. The carbon tree of isooctane

From the above discussion, we expect the boiling point of 2,2,4-trimethylpentane to be lower than that of octane. This is indeed the case. The boiling point of octane is 398.7 K while the boiling point of 2,2,4-trimethylpentane is 372.4 K. It is possible to model the boiling point of families of alkanes having similar geometric structure using molecular weight as the only index in the model (Burch, Wakefield, Whitehead, 2003). In modeling the alkanes in general, more topological indices are needed to reduce the error in the model. Some examples include the Hosoya index, the Wiener number, the Wiener path numbers, the Mean Wiener index, and the Methyl index.

The Hosoya index (denoted Z) is the sum of the coefficients of the simple matching polynomial for a graph. This is equivalent to the number of nonempty matchings the graph contains, plus 1 to account for the matching consisting of no edges. A matching of a graph G is a (possibly empty) set of edges of G in which no two edges share a common vertex. The edges in a matching are said to be independent. An algorithm for computing the simple matching polynomial of a graph is given by Farrell (Farrell, 1979). The Hosoya index for ethane is 2, since its single edge yields one matching with zero edges and one matching with one edge. Using this algorithm we determine that the Hosoya index for 2,2,4-trimethylpentane is 19. You can verify this index by carefully inspecting all the sets of independent edges. There is 1 way to choose zero edges, there are 7 ways to choose one edge, there are 11 ways to choose two edges, and there is no way to choose three or more edges for a matching. This gives 1 + 7 + 11 = 19 simple matchings of 2,2,4-trimethylpentane, verifying the Hosoya index for this alkane.
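The count of matchings described above can be reproduced by brute force. The sketch below is not Farrell's algorithm; it simply enumerates every subset of edges and keeps the ones in which no vertex is repeated, which is practical only for small graphs. The edge list assumes the vertex labeling implied by the methyl edges listed later in the text (main chain 1-2-3-4-5, vertices 6 and 7 attached to vertex 2, vertex 8 attached to vertex 4); the function names are illustrative.

```python
from itertools import combinations

# Carbon tree of 2,2,4-trimethylpentane (assumed labeling, see lead-in).
ISOOCTANE_EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (2, 7), (4, 8)]
ETHANE_EDGES = [(1, 2)]


def matching_counts(edges):
    """Return a list c where c[k] is the number of matchings with exactly k edges."""
    counts = []
    for k in range(len(edges) + 1):
        count = 0
        for subset in combinations(edges, k):
            endpoints = [v for edge in subset for v in edge]
            # A set of edges is a matching when no endpoint occurs twice.
            if len(endpoints) == len(set(endpoints)):
                count += 1
        if count == 0:
            break  # no k-edge matching means no larger matching either
        counts.append(count)
    return counts


def hosoya_index(edges):
    """Hosoya index Z: total number of matchings, including the empty one."""
    return sum(matching_counts(edges))


print(matching_counts(ISOOCTANE_EDGES))  # [1, 7, 11]
print(hosoya_index(ISOOCTANE_EDGES))     # 19
print(hosoya_index(ETHANE_EDGES))        # 2
```

Enumerating subsets takes time exponential in the number of edges, so for larger alkanes a recursive scheme on the tree structure would be the better choice.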

The Wiener number (denoted W) is the sum of the distances between all pairs of vertices in a graph. It can be computed by adding the entries in the upper (or lower) triangular part of the distance matrix of a graph. Ethane has a Wiener number of 1 since it has only one pair of vertices separated by an edge. For 2,2,4-trimethylpentane, we use the distance matrix in Figure 9 computed from the labeled graph in Figure 8.

Figure 8. The distance graph for isooctane

Figure 9. The distance matrix for isooctane

The Wiener number of 2,2,4-trimethylpentane is therefore 66.

The Wiener path numbers (denoted 1P, 2P, 3P, 4P, ...) are defined by letting iP be the number of pairs of vertices in the graph separated by i edges. iP can be computed by counting the number of times i appears in the upper triangular part of the distance matrix of the graph. Using the distance matrix for 2,2,4-trimethylpentane given in Figure 9, we find that 1P = 7, 2P = 10, 3P = 5, and 4P = 6.

The Mean Wiener index (denoted $\overline{W}$) is the average of the distances between all pairs of vertices in a graph. For a graph having n vertices,

$$\overline{W} = \frac{W}{\binom{n}{2}} = \frac{2W}{n(n-1)}.$$

We previously showed that the Wiener number for 2,2,4-trimethylpentane is 66. Since n = 8 gives 28 pairs of vertices, we calculate that $\overline{W} = 66/28 = 2.35714$.
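The three Wiener-type indices all come from the same distance matrix, so they can be computed together. The sketch below builds the matrix with a breadth-first search from each vertex and then reads the indices off the upper triangle. The edge list assumes the labeling implied by the methyl edges listed in the text (chain 1-2-3-4-5, vertices 6 and 7 on vertex 2, vertex 8 on vertex 4), and the function names are illustrative.

```python
from collections import Counter, deque

# Carbon tree of 2,2,4-trimethylpentane (assumed labeling, see lead-in).
ISOOCTANE_EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (2, 7), (4, 8)]


def distance_matrix(n, edges):
    """Distance matrix of a connected graph on vertices 1..n, via BFS from each vertex."""
    adjacency = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    dist = [[0] * n for _ in range(n)]
    for source in range(1, n + 1):
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        for target, d in seen.items():
            dist[source - 1][target - 1] = d
    return dist


def wiener_indices(n, edges):
    """Return (Wiener number, {i: iP}, Mean Wiener index) for a graph with n >= 2 vertices."""
    dist = distance_matrix(n, edges)
    upper = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    wiener = sum(upper)
    path_numbers = dict(sorted(Counter(upper).items()))
    return wiener, path_numbers, wiener / len(upper)


W, P, W_mean = wiener_indices(8, ISOOCTANE_EDGES)
print(W)                 # 66
print(P)                 # {1: 7, 2: 10, 3: 5, 4: 6}
print(f"{W_mean:.5f}")   # 2.35714
```

Note that the Wiener number is also the weighted sum of the path numbers: 1(7) + 2(10) + 3(5) + 4(6) = 66.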

The methyl index was introduced in (Burch, Wakefield, Whitehead, 2003) to help graphically represent the branching of the alkanes. The methyl index (denoted Mth) is defined to be the number of degree one vertices which are adjacent to a vertex of degree three or greater. This is not equivalent to the number of methyls in the IUPAC name. For example, the methyl index for 2,2,4-trimethylpentane is 5. This differs from the 3 methyls present in the IUPAC name. The five methyl edges as seen in Figure 8 are (1, 2), (2, 6), (2, 7), (4, 5), and (4, 8). All of the indices described can be used to construct models of the various physical properties of alkanes.
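The definition of the methyl index translates directly into a degree check on the carbon tree. The following is a minimal sketch under the same assumed labeling of 2,2,4-trimethylpentane as the methyl edges listed above (chain 1-2-3-4-5, vertices 6 and 7 on vertex 2, vertex 8 on vertex 4); the function name is illustrative.

```python
from collections import defaultdict

# Carbon tree of 2,2,4-trimethylpentane (assumed labeling, see lead-in).
ISOOCTANE_EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (2, 7), (4, 8)]
# Normal octane: a path on 8 vertices.
OCTANE_EDGES = [(i, i + 1) for i in range(1, 8)]


def methyl_edges(edges):
    """Edges joining a degree-one vertex to a vertex of degree three or greater."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    found = []
    for leaf, adjacent in neighbors.items():
        if len(adjacent) == 1:
            (branch,) = adjacent  # the only neighbor of a degree-one vertex
            if len(neighbors[branch]) >= 3:
                found.append(tuple(sorted((leaf, branch))))
    return sorted(found)


print(methyl_edges(ISOOCTANE_EDGES))       # [(1, 2), (2, 6), (2, 7), (4, 5), (4, 8)]
print(len(methyl_edges(ISOOCTANE_EDGES)))  # 5
print(len(methyl_edges(OCTANE_EDGES)))     # 0
```

The normal alkane has methyl index 0 because its two end vertices are adjacent to vertices of degree two, which is what lets the index separate branched from unbranched structures.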

Several of these indices were used in (Burch, Wakefield, Whitehead, 2003) to model the boiling points of alkanes having six to twelve carbon atoms. The normal alkanes with thirteen through twenty-two carbons were also included to facilitate the prediction of test data having thirteen to twenty-two carbons. The total number of alkanes modeled was 187. The nonlinear models were formed using the NEOS solver FILTER available online from Argonne National Laboratory (Fletcher, 2002). One such model is

$$
\begin{aligned}
f(1P, 2P, \ldots, 6P, \mathrm{Mth}, Z) ={}& 847.41474 + 221.61698\,(1P)^{0.49420} - 1182.20853\,(2P)^{0.03689} + 0.00125\,(3P)^{3.39724} \\
& - 3.02445\,(4P)^{0.93751} - 2.16070\,(5P)^{1.01631} - 0.56366\,(6P)^{1.38233} - 2.10575\,\mathrm{Mth}^{0.5695} - 9.61075\,Z^{0.19907}
\end{aligned}
$$
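To see how the model is applied, it can be evaluated for the two alkanes discussed earlier. The sketch below rests on two assumptions that are not stated explicitly in the text: the 4P term is subtracted, and the model output is in degrees Celsius. Both are consistent with the reported boiling points of octane and 2,2,4-trimethylpentane, which the sketch reproduces to within a few degrees. The octane indices (path numbers 7, 6, 5, 4, 3, 2, methyl index 0, Hosoya index 34) were worked out by hand for the path on eight vertices, and all names are illustrative.

```python
# Coefficient and exponent pairs copied from the boiling point model in the text.
# Assumptions: the 4P term is subtracted, and the output is in degrees Celsius.
INTERCEPT = 847.41474
PATH_TERMS = {
    1: (221.61698, 0.49420),
    2: (-1182.20853, 0.03689),
    3: (0.00125, 3.39724),
    4: (-3.02445, 0.93751),
    5: (-2.16070, 1.01631),
    6: (-0.56366, 1.38233),
}
METHYL_TERM = (-2.10575, 0.5695)
HOSOYA_TERM = (-9.61075, 0.19907)


def boiling_point_celsius(path_numbers, methyl_index, hosoya_index):
    """Evaluate the model; path_numbers maps i to iP, missing entries count as 0."""
    value = INTERCEPT
    for i, (coefficient, exponent) in PATH_TERMS.items():
        value += coefficient * path_numbers.get(i, 0) ** exponent
    value += METHYL_TERM[0] * methyl_index ** METHYL_TERM[1]
    value += HOSOYA_TERM[0] * hosoya_index ** HOSOYA_TERM[1]
    return value


isooctane_c = boiling_point_celsius({1: 7, 2: 10, 3: 5, 4: 6}, 5, 19)
octane_c = boiling_point_celsius({1: 7, 2: 6, 3: 5, 4: 4, 5: 3, 6: 2}, 0, 34)

# Convert to kelvins for comparison with the reported 372.4 K and 398.7 K.
print(f"isooctane: {isooctane_c + 273.15:.1f} K (reported 372.4 K)")
print(f"octane:    {octane_c + 273.15:.1f} K (reported 398.7 K)")
```

The model also preserves the expected ordering: the more branched isomer is assigned the lower boiling point.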

The coefficient of determination for this model is 0.997068 and the standard deviation is 2.1 °C. Table 1 gives the number and percentage of alkanes with the specified absolute boiling point deviations given by this model.

Table 1. Boiling points of alkanes

| Absolute BP deviation (°C) | Number of alkanes | % of alkanes |
|---|---|---|
| 0 - 1 | 84 | 44.9 |
| 1 - 2 | 48 | 25.7 |
| 2 - 4 | 46 | 24.6 |
| 4 - 6 | 5 | 2.7 |
| 6 - 9 | 3 | 1.6 |
| > 9 | 1 | 0.5 |

Figure 10 shows a plot of the experimental boiling points versus the boiling points determined by this model. The straight line represents an exact prediction. An Excel spreadsheet containing the indices and boiling point data for the modeled alkanes is available in bpdata.

Figure 10. Boiling points of alkanes

Burch, Wakefield, and Whitehead also gathered boiling point data for 52 additional alkanes having between thirteen and twenty-two carbons (Burch, Wakefield, Whitehead, 2003). This data is available in the spreadsheet predictdata. Figure 11 shows the experimental boiling points of these 52 alkanes versus the boiling points determined by the above model. The straight line represents an exact prediction.

Figure 11. Boiling points of alkanes

I have also used many of these indices to model the melting points of alkanes, although this task proves to be more difficult than modeling the boiling points of the same set. Boiling points are mainly determined by the London dispersion forces between molecules in the liquid form. Many more factors are involved in determining the melting point, since the alkanes are then solids with a rigid three-dimensional structure. A more restricted set of alkanes, such as the single methyl alkanes studied in a previous work (Burch, Whitehead, 2004), allows for the construction of models similar to those found for the boiling point of alkanes.

Melting point data was available for 69 of the 80 alkanes having between 10 and 20 carbon atoms and a single methyl group. The data is available in the spreadsheet mpdata. The melting points are given in kelvins (K). A nonlinear model was computed using 62 of the 69 alkanes, with the remaining seven chosen at random and withheld for use as predictive data. The NEOS solver FilterSQP from Argonne National Laboratory was employed to form this model. The model used the first four Wiener path numbers and the Mean Wiener index:

$$f(1P, 2P, 3P, 4P, \overline{W}) = 1625 + 2971\,(1P)^{0.1428} - 4460\,(2P)^{0.05842} - 236.7\,(3P)^{0.3554} - 0.01762\,(4P)^{2.727} + 16.11\,\overline{W}^{1.047}$$
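As a rough check of how this model is used, it can be evaluated for 4-methylnonane, one of the withheld alkanes. The sketch below assumes that the last argument of the model is the Mean Wiener index and that the output is in kelvins. The indices for 4-methylnonane were worked out by hand from its carbon tree (a chain of nine vertices with a tenth vertex attached to vertex 4), giving path numbers 9, 9, 8, 7 and Wiener number 150 over 45 vertex pairs; the function name is illustrative.

```python
def melting_point_kelvin(p1, p2, p3, p4, mean_wiener):
    """Evaluate the melting point model with coefficients copied from the text.

    Assumption: the final argument is the Mean Wiener index and the result is in K.
    """
    return (
        1625
        + 2971 * p1 ** 0.1428
        - 4460 * p2 ** 0.05842
        - 236.7 * p3 ** 0.3554
        - 0.01762 * p4 ** 2.727
        + 16.11 * mean_wiener ** 1.047
    )


# 4-methylnonane: indices computed by hand (see lead-in), not taken from the article.
P1, P2, P3, P4 = 9, 9, 8, 7
MEAN_WIENER = 150 / 45

predicted = melting_point_kelvin(P1, P2, P3, P4, MEAN_WIENER)
print(f"4-methylnonane: {predicted:.1f} K")
```

The result should land close to the calculated value of 178.0 K that the article reports for this alkane; small differences are expected because the coefficients are quoted to only four significant figures.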

The coefficient of determination for this model was 0.97659 and the standard deviation was 5.1 K. Figure 12 shows a plot of the experimental melting points of the 62 alkanes versus the melting points determined by the given model. The straight line represents an exact prediction.

Figure 12. Melting points of alkanes

Table 2 illustrates the predictive ability of this melting point model on the set of alkanes that was withheld from the original data set. Two of these alkanes have a high error in their modeled melting points. This prompted further study into how the melting points of the alkanes differ depending on whether the alkane has an even or odd number of carbon atoms. We therefore divided our data into two sets, alkanes with an even number of carbon atoms and alkanes with an odd number, to account for this difference. Subsequent models had much smaller fitting and predictive errors; the majority of the withheld data had modeled errors of less than 2 K (Burch, Whitehead, 2004).

Table 2. Melting points of alkanes

| Alkane | Tfus / K | Tfus,cal / K | δT / K |
|---|---|---|---|
| 4-methylnonane | 174.5 | 178.0 | 3.5 |
| 3-methylundecane | 216.9 | 207.6 | 9.3 |
| 5-methyldodecane | 203.9 | 214.9 | 11.1 |