Linear Models of Process-Structure Behavior

Posted by Almambet Iskakov on November 1, 2015

Cross-validation & Optimization

This post relies heavily on cross-validation and parameter optimization. We have implemented a pipeline that performs both using Scikit-Learn, pyMKS and NumPy.

Our cross-validation module supports different types of cross-validators (for example, k-fold splitting).
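A minimal NumPy-only sketch of k-fold cross-validation combined with a grid search over one parameter is shown below. It is not the post's pipeline (which uses Scikit-Learn and pyMKS); the helper names, the synthetic data, and the choice of polynomial degree as the tuned parameter are all illustrative assumptions. In scikit-learn, `KFold`, `cross_val_score` and `GridSearchCV` play these roles.

```python
import numpy as np


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle the sample indices and split them into n_folds near-equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)


def cross_val_scores(fit_predict, x, y, n_folds=5, seed=0):
    """Return the mean held-out MSE and R^2 over k folds.

    fit_predict(x_train, y_train, x_test) must return predictions for x_test.
    """
    folds = k_fold_indices(len(y), n_folds, seed)
    mse, r2 = [], []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, then score on the i-th.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        y_test = y[test_idx]
        y_pred = fit_predict(x[train_idx], y[train_idx], x[test_idx])
        ss_res = np.sum((y_test - y_pred) ** 2)
        ss_tot = np.sum((y_test - y_test.mean()) ** 2)
        mse.append(ss_res / len(y_test))
        r2.append(1.0 - ss_res / ss_tot)
    return float(np.mean(mse)), float(np.mean(r2))


def polynomial_model(degree):
    """Hypothetical model family: 1-D least-squares polynomial of a fixed degree."""
    def fit_predict(x_train, y_train, x_test):
        coeffs = np.polyfit(x_train, y_train, degree)
        return np.polyval(coeffs, x_test)
    return fit_predict


# Synthetic data (not the post's data): a noisy quadratic.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 60)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.1 * rng.standard_normal(60)

# Parameter optimization: keep the degree with the lowest mean 5-fold CV MSE.
results = {d: cross_val_scores(polynomial_model(d), x, y) for d in range(1, 6)}
best_degree = min(results, key=lambda d: results[d][0])
print(best_degree, results[best_degree])
```

Passing the model in as a `fit_predict` callable keeps the fold logic independent of the model, so the same loop can score OLS, ridge or lasso fits.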

Linear Models

Our analysis of the different linear models is still in progress; the preliminary results are presented below.

Linear Regression

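Model form:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$$

More concisely:

$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where $X$ is the design matrix whose first column is all ones, so that $\beta_0$ acts as the intercept.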

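This is the ordinary least squares (OLS) approach discussed in class: $\boldsymbol{\beta}$ is chosen to minimize the residual sum of squares $\lVert \mathbf{y} - X\boldsymbol{\beta} \rVert_2^2$.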
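As a small illustration, the OLS estimate can be computed with NumPy's least-squares solver and checked against the normal equations $(X^{\mathsf T}X)\boldsymbol{\beta} = X^{\mathsf T}\mathbf{y}$. The data below are synthetic and the variable names are hypothetical; `X` includes the leading column of ones for the intercept, as in the model form.

```python
import numpy as np

# Synthetic example data (hypothetical): 50 samples, 3 features.
rng = np.random.default_rng(0)
features = rng.standard_normal((50, 3))
beta_true = np.array([0.5, 1.0, -2.0, 3.0])  # intercept followed by 3 slopes

# Design matrix: a column of ones so that beta_0 acts as the intercept.
X = np.column_stack([np.ones(len(features)), features])
y = X @ beta_true + 0.01 * rng.standard_normal(50)

# OLS: beta minimizes ||y - X beta||^2.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# The same solution from the normal equations (X^T X) beta = X^T y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

print(np.round(beta_lstsq, 2))
```

`lstsq` is generally preferred in practice because it avoids forming $X^{\mathsf T}X$ explicitly, which squares the condition number.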

Polynomial Fits

We used polynomial regression to test different candidate representations of our data. To do this, we created a new set of features consisting of the polynomial combinations of our original features, then performed OLS linear regression in this expanded feature space to find the best fit there. We tested only modest degrees, since high-degree polynomials are likely to overfit our data heavily.
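The feature expansion described above can be sketched as follows: every product of input columns up to a given total degree becomes a new column, and OLS is then run on the expanded matrix. The helper `polynomial_features` is hypothetical (similar in spirit to scikit-learn's `PolynomialFeatures`), and the noiseless target with an interaction term is made up so that the recovered coefficients are easy to check.

```python
import itertools

import numpy as np


def polynomial_features(X, degree):
    """Constant column plus all products of input columns up to `degree`."""
    n_samples, n_features = X.shape
    columns = [np.ones(n_samples)]
    for d in range(1, degree + 1):
        # e.g. for 2 features and d = 2: (0, 0), (0, 1), (1, 1)
        for combo in itertools.combinations_with_replacement(range(n_features), d):
            columns.append(np.prod(X[:, combo], axis=1))
    return np.column_stack(columns)


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (40, 2))
# Hypothetical target containing an interaction term x0 * x1.
y = 1.0 + X[:, 0] - 2.0 * X[:, 0] * X[:, 1]

# Columns: 1, x0, x1, x0^2, x0*x1, x1^2
Phi = polynomial_features(X, degree=2)
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.round(coef, 3))
```

The number of columns grows combinatorially with the degree, which is one more reason to keep the degree modest.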

Ridge Regression

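A ridge model can be better suited to making predictions, because it has less variance than OLS at the cost of a small bias. In OLS, we look for the parameter vector $\boldsymbol{\beta}$ that minimizes the residual sum of squares $\lVert \mathbf{y} - X\boldsymbol{\beta} \rVert_2^2$. In the ridge model, a tuning parameter $\alpha \ge 0$ introduces a bias, and we instead minimize $\lVert \mathbf{y} - X\boldsymbol{\beta} \rVert_2^2 + \alpha \lVert \boldsymbol{\beta} \rVert_2^2$. With this penalty term, the values of $\boldsymbol{\beta}$ depend on $\alpha$: as $\alpha$ increases, $\lVert \boldsymbol{\beta} \rVert_2$ is forced to decrease. This shrinks large parameters and can help the model avoid overfitting.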

The parameters have a closed-form solution:

$$\hat{\boldsymbol{\beta}} = (X^{\mathsf T} X + \alpha I)^{-1} X^{\mathsf T} \mathbf{y}$$

If $\alpha$ is zero, this reduces to the OLS solution.
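The closed form can be checked numerically with a short sketch. It assumes centered `X` and `y` so that no unpenalized intercept is needed (a simplification, not necessarily what the post's pipeline does), and the data are synthetic. It confirms that $\alpha = 0$ recovers OLS and that $\lVert \hat{\boldsymbol{\beta}} \rVert_2$ shrinks as $\alpha$ grows.

```python
import numpy as np


def ridge_coefficients(X, y, alpha):
    """Closed-form ridge solution beta = (X^T X + alpha I)^{-1} X^T y.

    Assumes X and y are centered, so no separate intercept is fitted.
    """
    n_features = X.shape[1]
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


# Synthetic data (hypothetical), centered column-wise.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 4))
y = X @ np.array([3.0, -1.0, 0.5, 2.0]) + 0.1 * rng.standard_normal(30)
X = X - X.mean(axis=0)
y = y - y.mean()

ols, *_ = np.linalg.lstsq(X, y, rcond=None)

alphas = [0.0, 1.0, 10.0, 100.0]
norms = [np.linalg.norm(ridge_coefficients(X, y, a)) for a in alphas]
for a, n in zip(alphas, norms):
    print(a, round(float(n), 3))
```

Increasing `alpha` trades a little training-set fit for smaller, more stable coefficients; the value itself would be chosen by cross-validation.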

Lasso Regression

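Lasso is similar to ridge in that it also has a tuning parameter $\alpha$. However, the goal is now to minimize $\lVert \mathbf{y} - X\boldsymbol{\beta} \rVert_2^2 + \alpha \lVert \boldsymbol{\beta} \rVert_1$, where $\lVert \boldsymbol{\beta} \rVert_1 = \sum_j |\beta_j|$. The primary difference is that lasso can set some parameters exactly to zero when the corresponding features are not relevant to the model, whereas ridge shrinks parameters toward zero but generally keeps them all non-zero.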
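To see why lasso produces exact zeros, here is a minimal cyclic coordinate-descent sketch for the objective above. Minimizing over a single $\beta_j$ gives a soft-thresholding step, which returns exactly zero whenever a feature's correlation with the partial residual is below $\alpha/2$. The solver, its names, and the synthetic data (two relevant and two irrelevant features) are illustrative assumptions, not the post's implementation; scikit-learn's `Lasso` uses a rescaled objective, so its `alpha` is not directly comparable.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize ||y - X beta||^2 + alpha * ||beta||_1 by cyclic coordinate descent.

    Assumes centered X and y (no intercept).
    """
    n_features = X.shape[1]
    beta = np.zeros(n_features)
    col_sq = np.sum(X**2, axis=0)
    for _ in range(n_iter):
        for j in range(n_features):
            # Partial residual: add back feature j's current contribution.
            r_j = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r_j, alpha / 2.0) / col_sq[j]
    return beta


# Synthetic data: only features 0 and 1 influence y.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))
X = X - X.mean(axis=0)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(100)
y = y - y.mean()

alpha = 40.0
beta = lasso_coordinate_descent(X, y, alpha)
# Ridge with the same alpha, for contrast: small but non-zero coefficients.
ridge = np.linalg.solve(X.T @ X + alpha * np.eye(4), X.T @ y)
print(np.round(beta, 3), np.round(ridge, 3))
```

The irrelevant features end up with coefficients of exactly zero under lasso, while ridge leaves them small but non-zero.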

Performance

Figure: comparison of linear models (panels: R², MSE).

Comparison of different linear models as preliminary results for the linkage between PC space and solidification velocity (5-fold CV). The results show that linear regression and ridge perform better than lasso.
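For reference, the two scores in the figure are assumed here to follow their standard definitions. Here $\hat{y}_i$ is the prediction for held-out sample $i$, $\bar{y}$ is the mean of the held-out targets, and each score is averaged over the 5 folds:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

A lower MSE and an $R^2$ closer to 1 both indicate a better fit.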