Data mining EPA's Green Vehicle Guide: Profiling and prediction using k-means clustering and neural networks
This thesis studies data mining techniques and explores the predictive value of data from the EPA's Green Vehicle Guide, which supplies information on the environmental performance of each vehicle sold in the United States from 2000 to 2010. IBM® SPSS® Modeler is used to discover patterns in the data set: each vehicle's variables and scores relating to emissions, air quality, and SmartWay status are modeled with two techniques, k-means clustering and artificial neural networks. The results are as expected, with all models identifying the greenhouse gas score as the most important predictor of SmartWay status. Engineers and companies will therefore need to focus on technology that improves greenhouse gas scores if SmartWay status and environmental consciousness are goals.
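
As a rough illustration of the clustering technique named above, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the SPSS Modeler workflow used in the thesis. The function name `kmeans`, the choice of two features, the initial centroids, and all score values are assumptions invented for illustration; none of the numbers come from the Green Vehicle Guide.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, centroids, max_iter=100):
    """Toy k-means: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical (air pollution score, greenhouse gas score) pairs, made up
# for this sketch only.
vehicles = [(3, 2), (4, 3), (3, 3), (8, 8), (9, 7), (8, 9)]

# Assumed starting centroids: the first and last vehicle in the list.
labels, centroids = kmeans(vehicles, centroids=[vehicles[0], vehicles[-1]])
print(labels)
print(centroids)
```

In a sketch like this, the resulting cluster labels could then be compared with a known class such as SmartWay status to see which scores separate the groups. Which variables matter most in the real data is a finding of the thesis, not of this example.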