Week 4

Hyperparameter Tuning

Ensembling

Validation in presence of time component

f) KFold scheme in time series

In a time-series task we are usually asked to predict a fixed period of time, such as a day, a week, a month, or an arbitrary period of duration T.

  1. Split the train data into chunks of duration T. Select the first M chunks.
  2. Fit N diverse models on those M chunks and predict for chunk M+1. Then fit those models on the first M+1 chunks and predict for chunk M+2, and so on, until you reach the end of the train data. After that, fit the models on all train data and get predictions for test. We now have meta-features for every chunk from M+1 onward, as well as meta-features for test.
  3. Now we can use the meta-features from the first K chunks [M+1, M+2, ..., M+K] to fit level 2 models and validate them on chunk M+K+1. Essentially we are back at step 1, with fewer chunks and with meta-features instead of features.
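The three steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the helper `time_series_meta_features`, the toy `MeanModel` and `LinearModel` classes, and the synthetic data are all hypothetical names introduced here, and chunks are indexed from 0, so chunk index `M` corresponds to chunk M+1 in the text.

```python
import numpy as np

class MeanModel:
    """Toy level 1 model (hypothetical): predicts the training-target mean."""
    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self
    def predict(self, X):
        return np.full(len(X), self.mean_)

class LinearModel:
    """Toy model (hypothetical): least squares with an intercept."""
    def fit(self, X, y):
        A = np.column_stack([X, np.ones(len(X))])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self
    def predict(self, X):
        return np.column_stack([X, np.ones(len(X))]) @ self.coef_

def time_series_meta_features(X, y, X_test, chunk_ids, M, make_models):
    """Steps 1-2: expanding-window predictions used as meta-features.

    chunk_ids   -- 0-based chunk index of every train row, in time order
    M           -- number of initial chunks used only for fitting
    make_models -- zero-argument callables, each returning a fresh model
    """
    n_chunks = int(chunk_ids.max()) + 1
    meta_chunk_ids = chunk_ids[chunk_ids >= M]
    meta_train = np.zeros((len(meta_chunk_ids), len(make_models)))
    for k in range(M, n_chunks):
        past, current = chunk_ids < k, chunk_ids == k
        for j, make in enumerate(make_models):
            # Fit only on chunks that precede chunk k, then predict chunk k.
            model = make().fit(X[past], y[past])
            meta_train[meta_chunk_ids == k, j] = model.predict(X[current])
    # Refit every model on all train data to get meta-features for test.
    meta_test = np.column_stack(
        [make().fit(X, y).predict(X_test) for make in make_models]
    )
    return meta_train, meta_chunk_ids, meta_test

# Synthetic example: 6 chunks of duration T = 10 rows, M = 2.
rng = np.random.default_rng(0)
n, T, M = 60, 10, 2
X = rng.normal(size=(n, 2))
y = X @ np.array([1.0, -2.0]) + rng.normal(scale=0.1, size=n)
X_test = rng.normal(size=(T, 2))
chunk_ids = np.arange(n) // T

meta_train, meta_chunk_ids, meta_test = time_series_meta_features(
    X, y, X_test, chunk_ids, M, [MeanModel, LinearModel]
)

# Step 3: fit a level 2 model on the meta-features of all but the last
# chunk and validate it on the last chunk.
y_meta = y[chunk_ids >= M]
last = int(meta_chunk_ids.max())
fit_rows, val_rows = meta_chunk_ids < last, meta_chunk_ids == last
level2 = LinearModel().fit(meta_train[fit_rows], y_meta[fit_rows])
val_pred = level2.predict(meta_train[val_rows])
val_mse = float(np.mean((val_pred - y_meta[val_rows]) ** 2))
```

Because each meta-feature row is produced by models that saw only earlier chunks, the level 2 model is trained on predictions of the same kind it will receive for test.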

g) KFold scheme in time series with a limited amount of data

We often encounter situations where scheme f) is not applicable, especially with a limited amount of data. For example, we may have only the years 2014, 2015 and 2016 in train and need to predict the whole of 2017 in test. In such cases scheme c) can help, with one constraint: the KFold split should be done with respect to the time component. For example, with data spanning several years we would treat each year as a fold.
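A minimal sketch of the year-as-fold split might look like this. It assumes that scheme c) refers to ordinary out-of-fold KFold meta-features; the helper names `year_fold_meta_features` and `least_squares_fit_predict` and the synthetic monthly data are hypothetical, introduced only for illustration.

```python
import numpy as np

def least_squares_fit_predict(X_fit, y_fit, X_new):
    """Toy level 1 model (hypothetical): least squares with an intercept."""
    A = np.column_stack([X_fit, np.ones(len(X_fit))])
    coef, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
    return np.column_stack([X_new, np.ones(len(X_new))]) @ coef

def year_fold_meta_features(X, y, years, fit_predict):
    """Out-of-fold predictions where each calendar year is one fold."""
    oof = np.zeros(len(y))
    for year in np.unique(years):
        holdout = years == year
        # Fit on every other year, predict the held-out year.
        oof[holdout] = fit_predict(X[~holdout], y[~holdout], X[holdout])
    return oof

# Synthetic example: 12 monthly rows for each of 2014, 2015 and 2016.
rng = np.random.default_rng(0)
years = np.repeat([2014, 2015, 2016], 12)
X = rng.normal(size=(len(years), 3))
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(scale=0.1, size=len(years))

meta_feature = year_fold_meta_features(X, y, years, least_squares_fit_predict)
```

Splitting by year keeps all rows of one period in the same fold, so no fold is predicted by a model that was fitted on rows from that same year.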

Example Notebooks

Hyperparameters_tuning_video2_RF_n_estimators.ipynb

Macros.ipynb

compute_KNN_features.ipynb

Additional Material

Far0n's framework for Kaggle competitions "kaggletils"

28 Jupyter Notebook tips, tricks and shortcuts

Matrix Factorization:

t-SNE:

Interactions: