bdranalytics.pdlearn package
The bdranalytics.pdlearn package contains adapters that allow you
to pass pandas.DataFrame instances through sklearn without
losing the column names.
sklearn already accepts pandas.DataFrame instances as input,
but because it works internally with numpy arrays, the column names are lost during transformation.
The adapters in this package re-attach the column names after the sklearn transformation.
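The name loss, and the idea of an adapter that undoes it, can be sketched with plain sklearn and pandas. `ColumnRestoringAdapter` below is a hypothetical minimal wrapper written for illustration, not one of this package's classes, and it assumes the wrapped transformer keeps the number and order of the columns (as `StandardScaler` does).

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler


class ColumnRestoringAdapter(BaseEstimator, TransformerMixin):
    """Hypothetical adapter: runs a sklearn transformer, then re-attaches
    the input's column names and index to the numpy output.

    Assumes the wrapped transformer preserves the number and order of columns.
    """

    def __init__(self, transformer):
        self.transformer = transformer

    def fit(self, X, y=None):
        self.transformer.fit(X, y)
        return self

    def transform(self, X):
        out = self.transformer.transform(X)  # plain numpy array
        return pd.DataFrame(out, columns=X.columns, index=X.index)


df = pd.DataFrame({"temp": [10.0, 20.0, 30.0], "humidity": [0.3, 0.5, 0.7]})

# Plain sklearn: the result is a numpy array, so the column names are gone.
raw = StandardScaler().fit_transform(df)
print(type(raw).__name__)

# Wrapped: same values, but returned as a DataFrame with the original names.
wrapped = ColumnRestoringAdapter(StandardScaler()).fit_transform(df)
print(list(wrapped.columns))
```

A wrapper that changes the number of columns (for example a one-hot encoder) cannot reuse the input names this way; it has to generate new ones, which is what the date encoders below do with their prefixes.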
Submodules
bdranalytics.pdlearn.pipeline module
class bdranalytics.pdlearn.pipeline.PdFeatureChain(steps)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Passes a data set through a pipeline (chain) of transformers: the output of each transformer is fed into the next transformer.
Similar to sklearn Pipeline, but the final step cannot be a predictor; every step must be a transformer.
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: - X : numpy array of shape [n_samples, n_features] – Training set.
- y : numpy array of shape [n_samples] – Target values.
Returns: - X_new : numpy array of shape [n_samples, n_features_new] – Transformed array.
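The chaining described for PdFeatureChain can be sketched as follows. `SimpleFeatureChain`, `AddTotal` and `DivideByMax` are hypothetical names written for illustration, and the sketch assumes `steps` is a plain list of transformer objects; PdFeatureChain may expect a different `steps` format (sklearn Pipeline, for instance, uses (name, transformer) pairs). The sketch also drops the BaseEstimator/TransformerMixin bases to stay dependency-light.

```python
import pandas as pd


class SimpleFeatureChain:
    """Hypothetical sketch: feed the output of each transformer into the next."""

    def __init__(self, steps):
        self.steps = steps  # assumption: a plain list of transformer objects

    def fit(self, X, y=None):
        # Each step must be fitted on the output of the previous step,
        # so fitting the chain means running fit_transform step by step.
        self.fit_transform(X, y)
        return self

    def transform(self, X):
        for step in self.steps:
            X = step.transform(X)
        return X

    def fit_transform(self, X, y=None):
        for step in self.steps:
            X = step.fit(X, y).transform(X)
        return X


class AddTotal:
    """Toy transformer: appends the row sum as a new 'total' column."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X.assign(total=X.sum(axis=1))


class DivideByMax:
    """Toy transformer: divides each column by its maximum seen during fit."""

    def fit(self, X, y=None):
        self.max_ = X.max()
        return self

    def transform(self, X):
        return X / self.max_


df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
out = SimpleFeatureChain([AddTotal(), DivideByMax()]).fit_transform(df)
print(out)
```

Because the second step is fitted on the output of the first, it sees the new `total` column; unlike sklearn Pipeline, no step is treated as a final predictor, so every step must implement `transform`.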
bdranalytics.pdlearn.preprocessing module
class bdranalytics.pdlearn.preprocessing.DateCyclicalEncoding(date_columns, parts=['DAY', 'DAY_OF_WEEK', 'HOUR', 'MINUTE', 'MONTH', 'SECOND'], new_column_names=None, drop=True)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Feature-engineering class that transforms date columns into cyclical numerical columns. With the default drop=True, the original date column is removed. Intended for use in sklearn pipelines.
Parameters: - date_columns – the names of the date columns to encode as cyclical numerical columns
- new_column_names – the names to use as prefixes for the generated column names
- drop – whether to drop the original column
- parts – the parts to extract from the date columns and then transform into cyclical numerical columns
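A common way to build cyclical features, stated here as an assumption rather than as what DateCyclicalEncoding is documented to compute, is to map each date-part value v with period P onto the unit circle, so that the end of one cycle lands next to the start of the next (hour 23 next to hour 0). The periods (for example P = 24 for HOUR, P = 7 for DAY_OF_WEEK, P = 12 for MONTH) are likewise assumed.

```latex
x_{\sin} = \sin\left(\frac{2\pi v}{P}\right), \qquad
x_{\cos} = \cos\left(\frac{2\pi v}{P}\right)

\text{e.g. HOUR } v = 6,\ P = 24:\quad
(x_{\sin}, x_{\cos}) = \left(\sin\tfrac{\pi}{2},\ \cos\tfrac{\pi}{2}\right) = (1, 0)
```

Both components are needed: the sine alone would give hours 3 and 9 the same value, since sin(π/4) = sin(3π/4).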
class bdranalytics.pdlearn.preprocessing.DateOneHotEncoding(date_columns, parts=['DAY', 'DAY_OF_WEEK', 'HOUR', 'MINUTE', 'MONTH', 'SECOND'], new_column_names=None, drop=True)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Feature-engineering class that transforms date columns into one-hot encodings of their parts (day, hour, ...). With the default drop=True, the original date column is removed. Intended for use in sklearn pipelines.
Parameters: - date_columns – the names of the date columns to expand into one-hot encodings
- new_column_names – the names to use as prefixes for the generated column names
- drop – whether to drop the original column
- parts – the parts to extract from the date columns and then transform into one-hot encodings
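A one-hot encoding of date parts can be sketched with pandas alone. `one_hot_date_parts`, the part-to-accessor mapping and the `<column>_<PART>_<value>` naming scheme are hypothetical; only the part names (DAY, DAY_OF_WEEK, HOUR, MINUTE, MONTH, SECOND) and the `drop` behaviour come from the signature above.

```python
import pandas as pd

# Hypothetical mapping from the part names in the signature to pandas .dt accessors.
PART_ACCESSORS = {
    "DAY": "day",
    "DAY_OF_WEEK": "dayofweek",
    "HOUR": "hour",
    "MINUTE": "minute",
    "MONTH": "month",
    "SECOND": "second",
}


def one_hot_date_parts(df, date_columns, parts, drop=True):
    """Hypothetical sketch: one-hot encode selected parts of each date column."""
    out = df.copy()
    for col in date_columns:
        dates = pd.to_datetime(out[col])
        for part in parts:
            values = getattr(dates.dt, PART_ACCESSORS[part])
            # One 0/1 column per distinct value, named e.g. "ts_HOUR_8".
            dummies = pd.get_dummies(values, prefix=f"{col}_{part}", dtype=int)
            out = pd.concat([out, dummies], axis=1)
        if drop:
            out = out.drop(columns=col)
    return out


df = pd.DataFrame({"ts": pd.to_datetime(["2024-01-01 08:00", "2024-01-02 17:00"])})
res = one_hot_date_parts(df, ["ts"], ["HOUR", "DAY_OF_WEEK"])
print(res)
```

This sketch only creates columns for values present in the data; a fitted transformer would normally record the categories during `fit`, so that `transform` produces the same columns on new data.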
class bdranalytics.pdlearn.preprocessing.PdLagTransformer(lag)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: - X : numpy array of shape [n_samples, n_features] – Training set.
- y : numpy array of shape [n_samples] – Target values.
Returns: - X_new : numpy array of shape [n_samples, n_features_new] – Transformed array.
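PdLagTransformer has no description in this reference. Assuming, based only on the class and parameter names, that it shifts every column down by `lag` rows while keeping the column names, its core operation corresponds to pandas `DataFrame.shift`. `LagSketch` is a hypothetical stand-in, not the package's implementation; the real class may, for example, rename the lagged columns.

```python
import pandas as pd


class LagSketch:
    """Hypothetical stand-in for a lag transformer: shifts all columns by `lag` rows."""

    def __init__(self, lag):
        self.lag = lag

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        # Row i receives the values of row i - lag; the first `lag` rows become NaN.
        return X.shift(self.lag)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


sales = pd.DataFrame({"sales": [10, 12, 15, 11]})
out = LagSketch(lag=1).fit_transform(sales)
print(out)
```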
class bdranalytics.pdlearn.preprocessing.PdWindowTransformer(func, **rolling_params)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: - X : numpy array of shape [n_samples, n_features] – Training set.
- y : numpy array of shape [n_samples] – Target values.
Returns: - X_new : numpy array of shape [n_samples, n_features_new] – Transformed array.
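PdWindowTransformer is also undocumented here. The parameter names suggest that `rolling_params` are forwarded to pandas `DataFrame.rolling` and that `func` is applied to each window; the sketch below rests on that assumption only. `WindowSketch` is a hypothetical stand-in, not the package's implementation.

```python
import numpy as np
import pandas as pd


class WindowSketch:
    """Hypothetical stand-in: rolling-window aggregate over every column."""

    def __init__(self, func, **rolling_params):
        self.func = func
        self.rolling_params = rolling_params  # forwarded to DataFrame.rolling

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        # Apply `func` to each window; raw=True passes each window as a numpy array.
        return X.rolling(**self.rolling_params).apply(self.func, raw=True)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


sales = pd.DataFrame({"sales": [10.0, 12.0, 15.0, 11.0]})
out = WindowSketch(np.mean, window=2).fit_transform(sales)
print(out)
```

With `window=2` the first row has no complete window and becomes NaN; passing `min_periods=1` as an extra rolling parameter would fill it from the partial window instead.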
bdranalytics.pdlearn.preprocessing.date_to_cyclical(df, col_name, parts=['MONTH', 'DAY', 'DAY_OF_WEEK', 'HOUR', 'MINUTE', 'SECOND'], new_col_name_prefix=None)
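date_to_cyclical has no description here. A hypothetical function with the same parameters might append a sine and a cosine column for each requested part of `df[col_name]`. `date_to_cyclical_sketch`, the periods in `PERIODS` and the `<prefix>_<PART>_sin` / `_cos` naming are assumptions; whether the original modifies `df` in place or drops the source column is not known from this reference.

```python
import numpy as np
import pandas as pd

# Assumed (accessor, period) per part; the real function may use other periods,
# e.g. the actual number of days in the month for DAY.
PERIODS = {
    "MONTH": ("month", 12),
    "DAY": ("day", 31),
    "DAY_OF_WEEK": ("dayofweek", 7),
    "HOUR": ("hour", 24),
    "MINUTE": ("minute", 60),
    "SECOND": ("second", 60),
}


def date_to_cyclical_sketch(df, col_name,
                            parts=("MONTH", "DAY", "DAY_OF_WEEK", "HOUR", "MINUTE", "SECOND"),
                            new_col_name_prefix=None):
    """Hypothetical sketch: add sin/cos columns for each date part of df[col_name]."""
    prefix = new_col_name_prefix or col_name
    dates = pd.to_datetime(df[col_name])
    out = df.copy()  # leave the caller's DataFrame untouched
    for part in parts:
        accessor, period = PERIODS[part]
        angle = 2 * np.pi * getattr(dates.dt, accessor) / period
        out[f"{prefix}_{part}_sin"] = np.sin(angle)
        out[f"{prefix}_{part}_cos"] = np.cos(angle)
    return out


df = pd.DataFrame({"ts": pd.to_datetime(["2024-03-01 06:00", "2024-03-01 18:00"])})
res = date_to_cyclical_sketch(df, "ts", parts=["HOUR"])
print(res[["ts_HOUR_sin", "ts_HOUR_cos"]].round(6))
```

Hours 6 and 18 land on opposite sides of the circle (sine 1 and -1, cosine approximately 0), which is the property a linear hour feature lacks.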