Loci (2008)

Chemical Graph Theory, Kimberly Jordan Burch

I wanted to share my research with my students, which can be a challenging task for a professor. How does one present research in a comprehensible and interesting manner? My approach was to get the students involved. After sharing the highlights of my research in a PowerPoint presentation, I asked the students to construct models using techniques similar to the ones I employed in my research. The students used Microsoft Excel to find a regression line of best fit for a given set of melting point data for alkanes. They then used the regression formula to predict the melting points of withheld data, examined their models, and evaluated how well they predicted the missing values.

This project was intended for a single lecture, so I chose the normal alkanes, which can be modeled with a single index. If more time were available, students could model various sets of alkanes using any of the above indices. The Regression tool in Microsoft Excel, found under Data Analysis in the Tools menu, could be used to determine the coefficients of such a model. Students could also create graphs like those described above, plotting the experimental boiling (melting) point against the modeled boiling (melting) point, to show how well their models fit the data.
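The Regression tool fits a multiple linear regression by ordinary least squares. The Python function below is a rough sketch of that computation, not Excel's own implementation. It solves the normal equations (X^T X) b = X^T y for a model y ≈ b0 + b1·x1 + ... + bk·xk, where each x is one index value per alkane. The function name `fit_multiple_regression` is hypothetical, and the numbers in the usage example are made up for illustration.

```python
def fit_multiple_regression(rows, y):
    """Ordinary least squares for y ~ b0 + b1*x1 + ... + bk*xk.

    rows: one list of index values [x1, ..., xk] per alkane
    y:    the observed property for each alkane (e.g. boiling point in K)
    Returns [b0, b1, ..., bk], found by solving (X^T X) b = X^T y.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading 1s give the intercept b0
    n, p = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y] of the normal equations.
    A = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][a] * y[i] for i in range(n))]
         for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("index columns are linearly dependent")
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [u - factor * v for u, v in zip(A[r], A[col])]
    # A is now diagonal in its first p columns; divide through to get b.
    return [A[i][p] / A[i][i] for i in range(p)]
```

For example, `fit_multiple_regression([[1, 0], [0, 1], [1, 1], [2, 1], [3, 5]], [12, 13, 15, 17, 31])` should recover coefficients close to `[10, 2, 3]`, because those made-up values lie exactly on y = 10 + 2x1 + 3x2. Solving the normal equations directly is adequate for a handful of indices. For larger or nearly collinear index sets, a QR-based least-squares solver is numerically safer.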

Students developed melting point models of the normal alkanes based on their diameters. The objective of these models was to predict the melting points of normal alkanes for which no melting point data exist. The students were given melting point data for a set of normal alkanes in a Microsoft Excel file; these data are available in the spreadsheet mpstudentdata. A random number generator was used to select a subset of these data points to withhold as predictive data. The students then used their models to predict the withheld data and examined the error by comparing the predicted and actual melting points. Finally, students answered a set of questions about the project. Both their models and their answers were due in one week.
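The comparison step can be sketched in a few lines of Python. This is one assumed way of scoring the predictions (signed error, percent error, and mean absolute error) and is not necessarily the measure the students reported. The function name `prediction_errors` and the sample numbers are hypothetical.

```python
def prediction_errors(actual, predicted):
    """Score predictions for the withheld alkanes.

    actual:    experimental melting points of the withheld alkanes (K)
    predicted: the model's melting points for the same alkanes (K)
    Returns (pairs, mae): pairs holds (signed error in K, absolute percent
    error) for each alkane, and mae is the mean absolute error in K.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    pairs = []
    for a, p in zip(actual, predicted):
        err = p - a                                      # positive means over-prediction
        pairs.append((err, abs(err) / abs(a) * 100.0))   # error as a percent of the true value
    mae = sum(abs(err) for err, _ in pairs) / len(pairs)
    return pairs, mae
```

With made-up values, `prediction_errors([200.0, 250.0], [210.0, 245.0])` gives signed errors of 10 K and -5 K, percent errors of 5% and 2%, and a mean absolute error of 7.5 K.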

Alkanes whose carbon atoms form a single unbranched chain, such as ethane, are called normal alkanes. A large amount of physical data is available for them. The graphical representation of a normal alkane is a path, often one of the first graphs studied in graph theory. Because physical data for normal alkanes are plentiful and their graphs are so simple, I chose the normal alkanes as the molecular data set for the student models.
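To make the path structure concrete, the sketch below builds the hydrogen-suppressed graph of a normal alkane, with carbon atoms as vertices and carbon-carbon bonds as edges, and computes its diameter by breadth-first search. The helper names are hypothetical. The loop at the end checks that, for five through twenty-five carbon atoms, the diameter equals the number of edges, n - 1.

```python
from collections import deque

def path_graph(n):
    """Adjacency lists for the carbon skeleton of the n-carbon normal alkane:
    carbon i is bonded to carbon i + 1, giving a path on n vertices."""
    adj = {v: [] for v in range(n)}
    for v in range(n - 1):
        adj[v].append(v + 1)
        adj[v + 1].append(v)
    return adj

def eccentricity(adj, source):
    """Largest breadth-first-search distance from source to any vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return max(dist.values())

def diameter(adj):
    """Maximum eccentricity over all vertices (the graph is assumed connected)."""
    return max(eccentricity(adj, v) for v in adj)

def edge_count(adj):
    """Each edge appears in two adjacency lists, so halve the total."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2

# For a path, diameter = number of edges = n - 1.
for n in range(5, 26):
    g = path_graph(n)
    assert diameter(g) == edge_count(g) == n - 1
```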

Students constructed two diameter-based melting point models of the normal alkanes using Microsoft Excel. By inserting a chart in their spreadsheets, they could use Excel's Trendline option to obtain the regression formula that best fit each alkane's melting point as a function of its diameter. For normal alkanes, the diameter equals the number of edges in the alkane's graphical representation, since that graph is a path. Melting point data from the NIST Chemistry WebBook were provided for the normal alkanes having five through twenty-five carbon atoms, and all melting points in this project were given in kelvin (K). The objective of the project was to use the models to predict the melting points of normal alkanes having more than twenty-five carbon atoms, for which experimental data may not exist. To measure the predictive ability of their models, students needed a predictive data set, which they obtained by randomly selecting five data points from the given set of alkanes. Students used the RAND function in Microsoft Excel to generate random numbers within the desired range.
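In Excel the random choice came from RAND. The Python sketch below plays the same role using `random.Random.sample`: it picks five distinct carbon counts from 5 through 25 to withhold and keeps the remaining 16 for fitting. The function name `split_withheld` and the seed value are illustrative; the seed only makes a particular split reproducible.

```python
import random

def split_withheld(carbon_counts, k=5, seed=None):
    """Randomly choose k alkanes to withhold as the predictive set.

    Returns (training, withheld), each sorted by carbon count.
    """
    rng = random.Random(seed)                  # separate generator, optionally seeded
    withheld = sorted(rng.sample(carbon_counts, k))
    training = [n for n in carbon_counts if n not in withheld]
    return training, withheld

carbons = list(range(5, 26))                   # C5 through C25: 21 normal alkanes
train, test = split_withheld(carbons, k=5, seed=1)
diameters = [n - 1 for n in train]             # diameter of a path on n vertices
```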

Laptops were distributed in the classroom the day the project was presented. A detailed set of instructions on how to use Microsoft Excel to create the models was also given to each student. Most students were familiar with Excel and felt comfortable trying out new commands they encountered on the instruction sheet. For example, many students had never used the Chart and Trendline options but had little trouble following the instruction sheet. I also walked around the classroom as they began their projects, providing assistance as needed.

Students used Excel's random number generator to choose five alkanes to withhold as predictive data. The remaining 16 data points were plotted using the Chart command, as described in the instructions. The students then chose the type of regression analysis they wished to use for their Trendline. Using the classroom projector, I fit my own data with linear regression as an example model and demonstrated how to enter the Trendline's formula into the spreadsheet to calculate both the modeled melting points and the melting points of the predictive set.
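For readers who want to see what a Trendline computes, the sketch below fits the linear form y = mx + b and the logarithmic form y = c·ln(x) + b by ordinary least squares. The logarithmic case is a straight-line fit against ln(x). This is a re-implementation of the textbook least-squares formulas, not Excel's code, and the function names are hypothetical. Each returned function can then be applied to the diameters of the withheld alkanes, much as the Trendline formula was entered into the spreadsheet.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, my - m * mx

def linear_trendline(xs, ys):
    """Return the fitted function x -> m*x + b."""
    m, b = fit_line(xs, ys)
    return lambda x: m * x + b

def log_trendline(xs, ys):
    """Return the fitted function x -> c*ln(x) + b (requires every x > 0)."""
    c, b = fit_line([math.log(x) for x in xs], ys)
    return lambda x: c * math.log(x) + b
```

For example, `log_trendline(train_diameters, train_mps)(30)` would give the logarithmic model's prediction for a diameter of 30, that is, a 31-carbon alkane. Here `train_diameters` and `train_mps` are placeholder names for the 16 fitted points.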

One student's model is included in the studentexample spreadsheet. This student chose to fit the data with logarithmic regression. I was pleased by the creativity that went into the charts displaying the models' fits; this student, for example, chose a very colorful marble background for his chart.

After completing this project, students were asked to answer the following questions:

- We know a 15th-degree polynomial exists that fits the given data perfectly. Why don't we use this as our model?
- What method of regression analysis did you choose (linear, polynomial, etc.)? Explain why you chose your method.
- Did one of your two models do a better job of predicting the five withheld melting points? Why do you think this was the case?
- What other situations could you model using graph theory?

After reviewing their responses, I realized that many students did not understand the first question. In fact, many stated that no such polynomial existed! I reviewed the technique for finding this polynomial in class. I also explained the concept of overfitting: the 15th-degree polynomial that fits their 16 data points exactly tends to positive or negative infinity as the alkanes grow larger, so it would be a very poor predictor of the melting points of larger normal alkanes. In fact, any nonconstant polynomial tends to positive or negative infinity, making it impossible to predict reasonable melting points for larger alkanes.
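The overfitting point can be demonstrated numerically. The sketch below is illustrative only. Instead of the NIST data, it uses a made-up smooth curve that rises and levels off, samples it at 16 diameters, and evaluates the degree-15 interpolating polynomial in Lagrange form. The polynomial reproduces all 16 points exactly, yet just beyond the data it departs from the curve, and by diameter 30 it is off by thousands of kelvin. The function names are hypothetical.

```python
def lagrange_eval(xs, ys, x):
    """Value at x of the unique polynomial of degree <= len(xs) - 1 through (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total

def smooth_curve(d):
    # Synthetic stand-in for melting point vs. diameter that increases and levels off.
    # Illustrative only: these are NOT NIST values.
    return 420.0 - 1100.0 / (d + 3.0)

nodes = list(range(4, 20))                    # 16 diameters, so the interpolant has degree 15
values = [smooth_curve(d) for d in nodes]

for d, v in zip(nodes, values):              # a perfect fit at every data point...
    assert abs(lagrange_eval(nodes, values, d) - v) < 1e-6

for d in (25, 30, 40):                        # ...but values far from the curve beyond the data
    print(d, round(smooth_curve(d), 1), round(lagrange_eval(nodes, values, d), 1))
```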

Many accurate and insightful answers were given as to why one of the models did a better job of predicting the missing melting points. Students correctly observed that the model yielding more accurate predictions had a coefficient of determination (R²) closer to 1. They also examined the predictive sets that had been randomly chosen for each model. One student conjectured that her second model did a better job of predicting the missing data because the average uncertainty factor was slightly higher for the points withheld from that model.
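The coefficient of determination can be computed directly from the data. The sketch below uses the standard definition R² = 1 - SS_res/SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares about the mean. For a least-squares Trendline, this should agree with the R² value Excel displays, though that correspondence is an assumption worth checking in the spreadsheet. R² describes the fit to the points used to build the model, so the errors on the withheld points remain the more direct test of prediction. The function name `r_squared` is hypothetical.

```python
def r_squared(actual, fitted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))  # unexplained variation
    ss_tot = sum((a - mean) ** 2 for a in actual)               # total variation about the mean
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot
```

With made-up values, `r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])` is 1 - 0.10/5 = 0.98.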

Students were creative in providing examples that could be modeled using graph theory. Such examples included modeling resistance networks and chemical reactions, scheduling resources such as exams and operating rooms, and examples involving law enforcement and GPS tracking systems.

Although laptops were used, this modeling exercise could also be implemented with graphing calculators. Most graphing calculators have regression capabilities that offer a choice of linear, quadratic, or logarithmic fits for given data. Once the type of regression is chosen, the calculator outputs the equation and the coefficient of determination. It can then store this function and graph it along with the data points.

By constructing these models, students gained an appreciation for some of the research techniques I apply in my work. In my research, I used the graphical representation of molecules to define indices for models of the boiling and melting points of alkanes. I showed the students several examples of these indices and how they are computed from the carbon tree representing the alkane. The students were excited to see the relationship between graph theory and chemistry. Many of their upper-level classes are completely self-contained, so it was enlightening for them to see a practical application of mathematics to chemistry.