Statistical package utilizing the general linear model for R
The purpose of this thesis is to develop a statistical package for R. The package will use matrix computations to fit the general linear model and report meaningful results. With this package, I hope to create a refined, simple, and powerful tool to aid mathematicians, statisticians, and researchers alike. To create the package, I will be using both R and RStudio. RStudio is a development environment for the R language, so the package will be fully usable in both. The package will be able to read in both Excel and CSV spreadsheets, and its output will be presented in a clear and concise manner. The general linear model used in this thesis can be represented with the following equation, as found in “The General Linear Model: A "New" Trend in Analysis of Variance” by Maurice Tatsuoka.
Y = XΘ + ε
The matrices of the equation are as follows: Y is the observation matrix, X is the design matrix, Θ is the parameter matrix, and ε is the error matrix (Tatsuoka, 1975). With these matrices, meaningful statistical results will be calculated, including but not necessarily limited to parameter estimates, tests of statistical significance, and hypothesis tests. To calculate these values, I will start by invoking the principle of least squares, which chooses Θ to minimize the sum of squared errors given by the following equation.
Q = ε'ε = (Y - XΘ)'(Y - XΘ)
To re-parameterize the general linear model and apply the principle of least squares, I will use the following matrices to estimate the parameters of the general linear model.
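As a minimal sketch of the least-squares step, the R code below minimizes Q for a small made-up dataset. It assumes that X has full column rank, so the estimate can be written as Θ* = (X'X)⁻¹X'Y. This is the standard normal-equations solution and is not necessarily the re-parameterized form the package will use. The function name glm_estimate and the data are hypothetical.

```r
# Hypothetical helper: least-squares estimate of Theta in Y = X Theta + e.
# Assumes X has full column rank, so X'X is invertible.
glm_estimate <- function(X, Y) {
  XtX <- crossprod(X)                 # X'X
  XtY <- crossprod(X, Y)              # X'Y
  theta_hat <- solve(XtX, XtY)        # solves (X'X) Theta = X'Y
  residuals <- Y - X %*% theta_hat    # estimated error matrix
  Q <- crossprod(residuals)           # e'e evaluated at the estimate
  list(theta = theta_hat, residuals = residuals, Q = Q)
}

# Made-up data: an intercept column plus one predictor
X <- cbind(1, c(1, 2, 3, 4, 5))
Y <- matrix(c(2.1, 3.9, 6.2, 7.8, 10.1), ncol = 1)

fit <- glm_estimate(X, Y)
print(fit$theta)
print(fit$Q)
```

Solving the normal equations with solve(XtX, XtY) avoids forming the inverse explicitly, which is usually the more stable choice when only the estimates are needed.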
With the parameters and error estimated for a given dataset, I will then carry out a few intermediate calculations to arrive at the variance-covariance matrix, V(Θ*), where Θ* denotes the estimate of Θ, in the following manner.
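The intermediate calculations can be sketched in R as follows. This sketch assumes the usual full-rank results, namely an error variance estimate s² = Q/(n - p), where n is the number of observations and p the number of columns of X, and V(Θ*) = s²(X'X)⁻¹. The variable names and data are made up for illustration, and the package may arrive at V(Θ*) through a different parameterization.

```r
# Made-up data: an intercept column plus one predictor
X <- cbind(1, c(1, 2, 3, 4, 5))
Y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

n <- nrow(X)                       # number of observations
p <- ncol(X)                       # number of parameters (X assumed full rank)

XtX_inv <- solve(crossprod(X))     # (X'X)^{-1}
theta_hat <- XtX_inv %*% crossprod(X, Y)
residuals <- Y - X %*% theta_hat

Q <- sum(residuals^2)              # minimized value of e'e
s2 <- Q / (n - p)                  # estimated error variance
V_theta <- s2 * XtX_inv            # estimated variance-covariance matrix
std_errors <- sqrt(diag(V_theta))  # standard errors of the estimates

print(V_theta)
print(std_errors)
```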
Using these equations, I will work through an example from the text that compares two groups. These are the same equations that will be programmed into the R package. The following two matrices will be used, in part, to calculate the t-statistic or an analogous value.
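To illustrate how a two-group comparison fits into this framework, the following R sketch codes group membership into the design matrix and forms a t-statistic from the estimate and its variance. The data are made up and are not the example from the text. The intercept-plus-indicator coding is one common choice, assumed here for illustration.

```r
# Made-up observations for two groups
group_a <- c(5.1, 4.8, 6.0, 5.5)
group_b <- c(6.9, 7.2, 6.4, 7.7, 7.0)
Y <- c(group_a, group_b)

# Design matrix: column 1 is the intercept (mean of group A),
# column 2 is an indicator for group B (difference B minus A)
indicator <- c(rep(0, length(group_a)), rep(1, length(group_b)))
X <- cbind(1, indicator)

XtX_inv <- solve(crossprod(X))
theta_hat <- XtX_inv %*% crossprod(X, Y)
residuals <- Y - X %*% theta_hat

df_error <- nrow(X) - ncol(X)          # error degrees of freedom
s2 <- sum(residuals^2) / df_error      # estimated error variance
V_theta <- s2 * XtX_inv                # variance-covariance of the estimates

# t-statistic for the group difference and its two-sided p-value
t_stat <- theta_hat[2] / sqrt(V_theta[2, 2])
p_value <- 2 * pt(abs(t_stat), df = df_error, lower.tail = FALSE)

print(t_stat)
print(p_value)
```

Under this coding the result should agree with R's built-in pooled-variance test, t.test(group_b, group_a, var.equal = TRUE), which gives a convenient check on the matrix calculations.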