If you ask yourself what's the most important thing in machine learning, what's your answer? Every data scientist would give a different answer.
Among the many possible answers, I believe feature engineering is one of the most important parts of machine learning. Sometimes it's a more critical step than model selection and training, because a model cannot improve itself no matter how much effort we put into hyperparameter tuning. Well-selected or well-extracted features, however, can be applied to many different models and improve their performance.
Feature engineering includes feature selection and feature extraction. Feature selection is a trial-and-error process of selecting relevant features from the existing ones. Since the chosen features are simply a subset of the original features, it's easy to interpret what they mean. However, it's difficult to capture relationships among the selected features.
Feature extraction, on the other hand, is more of a functional process that derives relevant features from the existing ones. It requires some form of function that enables an algorithm to create a new set of features. Relationships between features are taken into account, and the number of features can be significantly reduced. Yet interpreting the extracted features is not easy.
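As a toy illustration of the difference (a minimal sketch on made-up data, with arbitrary column indices and weights chosen purely for demonstration): selection keeps a subset of the original columns, while extraction computes new columns as functions of the originals.

```python
import numpy as np

# Toy data: 5 samples, 3 original features
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 0.0],
              [0.5, 3.0, 1.0],
              [4.0, 0.0, 2.0],
              [3.0, 2.0, 1.0]])

# Feature selection: keep a subset of the original columns
# (here, columns 0 and 2). Each kept feature is still directly
# interpretable as one of the original measurements.
X_selected = X[:, [0, 2]]

# Feature extraction: build a new feature as a function of the
# originals (here, a simple linear combination). Relationships
# between features are captured, but the new feature no longer
# maps to a single original measurement.
w = np.array([0.5, 0.3, 0.2])   # arbitrary weights for illustration
X_extracted = (X @ w).reshape(-1, 1)
```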
We should use different feature engineering methods depending on the machine learning algorithm we want to use.
In supervised learning, we can select features with information gain, stepwise regression, LASSO, genetic algorithms, etc. If we want to extract features, Partial Least Squares (PLS) is an option.
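Here is a minimal sketch of two of these supervised options using scikit-learn. The synthetic dataset and the hyperparameters (`alpha`, `n_components`) are arbitrary illustrations, not tuned choices: LASSO's L1 penalty zeroes out coefficients of irrelevant features, which we can use for selection, while PLS projects the features onto components chosen for their covariance with the target.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel
from sklearn.cross_decomposition import PLSRegression

X, y = make_regression(n_samples=200, n_features=20,
                       n_informative=5, random_state=0)

# Feature selection with LASSO: the L1 penalty drives the
# coefficients of irrelevant features to exactly zero, and
# SelectFromModel keeps only the features with nonzero weights.
selector = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)
X_selected = selector.transform(X)
print("selected features:", np.where(selector.get_support())[0])

# Feature extraction with PLS: project X onto a few components
# chosen to have maximal covariance with the target y.
pls = PLSRegression(n_components=3).fit(X, y)
X_extracted = pls.transform(X)
print("extracted shape:", X_extracted.shape)
```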
In unsupervised learning, we can do feature selection with PCA loadings; for feature extraction, we can use Principal Component Analysis (PCA), wavelet transforms, autoencoders, etc.
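And a sketch of the unsupervised case, again with scikit-learn on synthetic data (the choice of 2 components and top-3 features is arbitrary): the PCA loadings tell us how much each original feature contributes to each component, which can guide a selection, while the PCA projection itself is the extraction.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X, _ = make_blobs(n_samples=200, n_features=10, random_state=0)

pca = PCA(n_components=2).fit(X)

# Feature extraction: the projected data are the new features.
X_extracted = pca.transform(X)        # shape (200, 2)

# Feature selection via loadings: pca.components_ holds the weight
# of each original feature in each component; features with large
# absolute loadings on the leading components are candidates to keep.
loadings = np.abs(pca.components_)    # shape (2, 10)
top_features = np.argsort(loadings[0])[::-1][:3]
print("top features by first-component loading:", top_features)
X_selected = X[:, top_features]
```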