MY MEMO

[MACHINE LEARNING] Feature Scaling and Normal Equation

l_j_yeon 2017. 3. 29. 20:57

+) This post is based on the lectures and content in the Coursera (https://www.coursera.org/) Machine Learning class, taught by the professor.

+) You can derive each new algorithm from the original gradient descent update rule, so you will find that the resulting functions are equivalent.

Feature Scaling

We can speed up gradient descent by having each of our input values in roughly the same range.

−1 ≤ x(i) ≤ 1

or

−0.5 ≤ x(i) ≤ 0.5

These aren't exact requirements; we are only trying to speed things up. The goal is to get all input variables into roughly one of these ranges, give or take a few.

Two techniques to help with this are feature scaling and mean normalization. Feature scaling involves dividing the input values by the range (i.e. the maximum value minus the minimum value) of the input variable, resulting in a new range of just 1. Mean normalization involves subtracting the average value of an input variable from its values, resulting in a new average of just 0.

Formula:

xi := (xi − μi) / si

where μi is the average of all the values for feature (i) and si is the range of values (max − min); si can also be the standard deviation.

For example, if xi represents price with mean μi = 1000 and range si = 1900, then xi := (price − 1000) / 1900.
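The formula above can be sketched in a few lines of NumPy; the price values below are made-up illustrative data, not from the lecture:

```python
import numpy as np

# Hypothetical feature column: prices (illustrative values only)
x = np.array([100.0, 400.0, 1000.0, 1600.0, 2000.0])

mu = x.mean()          # average of the feature (the μi above)
s = x.max() - x.min()  # range, max - min (the si above)

# Mean normalization with range scaling: values land roughly in [-0.5, 0.5]
x_scaled = (x - mu) / s
print(x_scaled)
```

After scaling, the new feature has mean 0 and a total range of 1, which is within the rough target ranges mentioned above.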




Learning Rate

To summarize:

If α is too small: slow convergence.

If α is too large: J(θ) may not decrease on every iteration and thus may not converge.
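Both failure modes can be seen on a toy objective J(θ) = θ²; the α values and step count below are arbitrary choices for illustration:

```python
def gradient_descent(alpha, theta=10.0, steps=50):
    """Minimize J(theta) = theta**2 with a fixed learning rate alpha."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta  # gradient of theta**2 is 2*theta
    return theta

small = gradient_descent(alpha=0.01)  # converges, but slowly
good = gradient_descent(alpha=0.1)    # converges much faster
large = gradient_descent(alpha=1.5)   # overshoots: |theta| grows every step
print(small, good, large)
```

With α = 1.5 each update multiplies θ by (1 − 2α) = −2, so the iterates diverge instead of settling at the minimum θ = 0.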



Features and Polynomial Regression


Polynomial Regression

Our hypothesis function need not be linear (a straight line) if that does not fit the data well.

We can change the behavior or curve of our hypothesis function by making it a quadratic, cubic or square root function (or any other form).

One important thing to keep in mind is that if you choose your features this way, then feature scaling becomes very important: e.g. if x1 has range 1–1000, then the range of x1² becomes 1–1,000,000 and that of x1³ becomes 1–1,000,000,000.
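A minimal sketch of building polynomial features from a single input and then rescaling them (the x1 values are hypothetical):

```python
import numpy as np

x1 = np.array([1.0, 10.0, 100.0, 1000.0])  # original feature, range 1-1000

# Polynomial features: the column ranges explode to 1e6 and 1e9
X = np.column_stack([x1, x1**2, x1**3])

# Divide each column by its range (after subtracting its mean)
# so all features end up on a comparable scale for gradient descent
X_scaled = (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_scaled.round(3))
```

After scaling, every column spans a total range of 1, regardless of the original powers.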



Normal Equation


The normal equation solves for θ analytically in one step:

θ = (XᵀX)⁻¹ Xᵀy

There is no need to do feature scaling with the normal equation.

The following is a comparison of gradient descent and the normal equation:

| Gradient Descent          | Normal Equation                              |
| ------------------------- | -------------------------------------------- |
| Need to choose alpha      | No need to choose alpha                      |
| Needs many iterations     | No need to iterate                           |
| O(kn²)                    | O(n³), need to calculate the inverse of XᵀX  |
| Works well when n is large| Slow if n is very large                      |


If XᵀX is noninvertible, the common causes might be:
  • Redundant features, where two features are very closely related (i.e. they are linearly dependent)
  • Too many features (e.g. m ≤ n). In this case, delete some features or use "regularization" (to be explained in a later lesson).

Even then, the 'pinv' (pseudo-inverse) function will still give you a value.
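The normal equation maps directly onto NumPy's `np.linalg.pinv`; the design matrix and targets below are a made-up toy example:

```python
import numpy as np

# Toy design matrix with an intercept column of ones (hypothetical data)
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])  # exactly y = 2 * x, zero intercept

# Normal equation: theta = pinv(X'X) X'y
# pinv returns a sensible answer even when X'X is noninvertible
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)  # approximately [0., 2.]
```

Because pinv computes the Moore–Penrose pseudo-inverse, this line keeps working when redundant features make XᵀX singular, which is why the lecture recommends it over a plain inverse.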

