bdranalytics.pdlearn package
The bdranalytics.pdlearn package contains adapters that allow you to put pandas.DataFrame instances into sklearn without losing the column names.
sklearn already accepts pandas.DataFrame instances as input, but because it works with numpy.array internally, the column names are lost during transformation.
The adapters provided here re-add the column names after the sklearn transformations.
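The core idea can be sketched in a few lines: call the wrapped transformer, then rebuild a pandas.DataFrame from the returned array using the input's column names and index. This is a minimal sketch, not the package's actual code. ColumnPreservingAdapter and CenterTransformer are hypothetical names, the adapter assumes the wrapped transformer keeps the number and order of columns, and it follows sklearn's fit/transform protocol by duck typing instead of subclassing sklearn.base classes, so it runs with only numpy and pandas installed.

```python
import numpy as np
import pandas as pd


class ColumnPreservingAdapter:
    """Hypothetical adapter: wraps an sklearn-style transformer that returns a
    numpy array with the same columns, and re-attaches names and index."""

    def __init__(self, transformer):
        self.transformer = transformer

    def fit(self, X, y=None):
        self.transformer.fit(X, y)
        return self

    def transform(self, X):
        values = self.transformer.transform(X)  # bare numpy array: names are lost
        # Rebuild a DataFrame with the input's column names and index.
        return pd.DataFrame(values, columns=X.columns, index=X.index)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


class CenterTransformer:
    """Hypothetical stand-in for an sklearn transformer: subtracts the column
    means and returns a plain numpy array, as sklearn transformers do."""

    def fit(self, X, y=None):
        self.means_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) - self.means_


df = pd.DataFrame({"temperature": [10.0, 20.0, 30.0], "humidity": [0.5, 0.7, 0.9]})
raw = CenterTransformer().fit(df).transform(df)
print(type(raw).__name__)  # ndarray
out = ColumnPreservingAdapter(CenterTransformer()).fit_transform(df)
print(list(out.columns))  # ['temperature', 'humidity']
```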
Submodules
bdranalytics.pdlearn.pipeline module
class bdranalytics.pdlearn.pipeline.PdFeatureChain(steps)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Passes a data set through a pipeline (chain) of transformers: the output of each transformer is fed into the next one.
Similar to the sklearn Pipeline, but it does not support a predictor as the final step.
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters:
- X : numpy array of shape [n_samples, n_features]
  Training set.
- y : numpy array of shape [n_samples]
  Target values.
Returns:
- X_new : numpy array of shape [n_samples, n_features_new]
  Transformed array.
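The chaining behaviour of PdFeatureChain amounts to a loop in which each step's output becomes the next step's input. A minimal sketch, assuming steps is a plain list of transformer objects (the actual class may instead expect (name, transformer) pairs, as sklearn's Pipeline does). FeatureChainSketch, Scale and AddSum are hypothetical names, and fit_params are ignored here.

```python
import pandas as pd


class FeatureChainSketch:
    """Hypothetical sketch of a transformer chain. Assumes `steps` is a list
    of transformer objects; fit_params are accepted but ignored."""

    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, X, y=None, **fit_params):
        # Fit each step on the previous step's output and pass its result on.
        for step in self.steps:
            X = step.fit_transform(X, y)
        return X

    def fit(self, X, y=None, **fit_params):
        self.fit_transform(X, y)
        return self

    def transform(self, X):
        for step in self.steps:
            X = step.transform(X)
        return X


class Scale:
    """Toy step: multiplies every column by a constant, keeping the DataFrame."""

    def __init__(self, factor):
        self.factor = factor

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X * self.factor

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


class AddSum:
    """Toy step: appends a column holding the row-wise sum."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X.assign(total=X.sum(axis=1))

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
result = FeatureChainSketch([Scale(10), AddSum()]).fit_transform(df)
print(result["total"].tolist())  # [40, 60]
```

Because every step receives the previous step's DataFrame, columns added along the way (here total) stay available, by name, to later steps.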
bdranalytics.pdlearn.preprocessing module
class bdranalytics.pdlearn.preprocessing.DateCyclicalEncoding(date_columns, parts=['DAY', 'DAY_OF_WEEK', 'HOUR', 'MINUTE', 'MONTH', 'SECOND'], new_column_names=None, drop=True)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Feature-engineering transformer that converts date columns into cyclical numerical columns. By default (drop=True), the original date column is removed. Intended for use in sklearn pipelines.
Parameters:
- date_columns – the names of the date columns to encode cyclically
- parts – the parts to extract from the date columns and transform into cyclical encodings
- new_column_names – the names to use as prefixes for the generated column names
- drop – whether to drop the original column
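A common way to build cyclical features is to map each periodic date part onto the unit circle with a sine and a cosine, so that the end of a cycle lands next to its start. Whether DateCyclicalEncoding uses exactly this mapping, and with which periods, is an assumption; the formulas describe the general technique, with v the extracted part (for example the hour) and P its period (for example P = 24 for hours):

```latex
x_{\sin} = \sin\left(\frac{2\pi v}{P}\right), \qquad
x_{\cos} = \cos\left(\frac{2\pi v}{P}\right)
```

For example, hour 23 maps to (sin, cos) ≈ (-0.259, 0.966) and hour 0 to (0, 1), so the two are close, whereas a plain integer encoding puts them 23 units apart.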
class bdranalytics.pdlearn.preprocessing.DateOneHotEncoding(date_columns, parts=['DAY', 'DAY_OF_WEEK', 'HOUR', 'MINUTE', 'MONTH', 'SECOND'], new_column_names=None, drop=True)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Feature-engineering transformer that converts date columns into one-hot encodings of their parts (day, hour, ...). By default (drop=True), the original date column is removed. Intended for use in sklearn pipelines.
Parameters:
- date_columns – the names of the date columns to expand into one-hot encodings
- parts – the parts to extract from the date columns and transform into one-hot encodings
- new_column_names – the names to use as prefixes for the generated column names
- drop – whether to drop the original column
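One way to build such one-hot columns with pandas is to extract each part through the .dt accessor and pass it to pandas.get_dummies. The function date_one_hot_sketch and the PART_ACCESSORS mapping are hypothetical; they illustrate the technique under the assumption that the part names correspond to pandas' month, day, dayofweek, hour, minute and second attributes, and they do not reproduce DateOneHotEncoding itself.

```python
import pandas as pd

# Hypothetical mapping from the part names used in this documentation
# to pandas .dt accessor attributes (assumed; the real mapping may differ).
PART_ACCESSORS = {
    "MONTH": "month",
    "DAY": "day",
    "DAY_OF_WEEK": "dayofweek",
    "HOUR": "hour",
    "MINUTE": "minute",
    "SECOND": "second",
}


def date_one_hot_sketch(df, date_column, parts=("MONTH", "DAY_OF_WEEK", "HOUR"),
                        prefix=None, drop=True):
    """Hypothetical sketch: one-hot encode selected parts of a datetime column."""
    prefix = prefix or date_column
    dates = pd.to_datetime(df[date_column])
    # One block of 0/1 columns per part, e.g. ts_HOUR_9, ts_HOUR_17.
    encoded = [
        pd.get_dummies(getattr(dates.dt, PART_ACCESSORS[part]),
                       prefix=f"{prefix}_{part}", dtype=int)
        for part in parts
    ]
    out = pd.concat([df] + encoded, axis=1)
    return out.drop(columns=date_column) if drop else out


df = pd.DataFrame({"ts": ["2021-03-01 09:00", "2021-03-02 17:00"], "y": [1, 2]})
out = date_one_hot_sketch(df, "ts", parts=["HOUR"])
print(out.columns.tolist())  # ['y', 'ts_HOUR_9', 'ts_HOUR_17']
```

Unlike a fitted transformer, this stateless sketch derives the categories from whatever data it receives, so two batches containing different hours would produce different columns; fixing the set of categories during fit avoids that.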
class bdranalytics.pdlearn.preprocessing.PdLagTransformer(lag)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters:
- X : numpy array of shape [n_samples, n_features]
  Training set.
- y : numpy array of shape [n_samples]
  Target values.
Returns:
- X_new : numpy array of shape [n_samples, n_features_new]
  Transformed array.
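The documentation does not describe what PdLagTransformer(lag) does beyond its signature. Assuming it shifts the rows of the DataFrame by lag positions, which is the usual meaning of a lag feature, a minimal sketch based on pandas.DataFrame.shift could look like this. LagTransformerSketch is a hypothetical name, and keeping the original column names unchanged is an assumption.

```python
import pandas as pd


class LagTransformerSketch:
    """Hypothetical stand-in for PdLagTransformer: shifts every column `lag`
    rows down with DataFrame.shift, keeping column names and index.
    The first `lag` rows become NaN because no earlier values exist."""

    def __init__(self, lag):
        self.lag = lag

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        return X.shift(self.lag)

    def fit_transform(self, X, y=None, **fit_params):
        return self.fit(X, y).transform(X)


sales = pd.DataFrame({"sales": [5, 7, 9, 11]})
lagged = LagTransformerSketch(lag=1).fit_transform(sales)
print(lagged["sales"].tolist())  # [nan, 5.0, 7.0, 9.0]
```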
class bdranalytics.pdlearn.preprocessing.PdWindowTransformer(func, **rolling_params)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters:
- X : numpy array of shape [n_samples, n_features]
  Training set.
- y : numpy array of shape [n_samples]
  Target values.
Returns:
- X_new : numpy array of shape [n_samples, n_features_new]
  Transformed array.
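The signature PdWindowTransformer(func, **rolling_params) suggests a rolling-window aggregation. The sketch assumes rolling_params are forwarded to pandas.DataFrame.rolling and func is applied to each window with Rolling.apply; WindowTransformerSketch is a hypothetical name and the real class may differ.

```python
import numpy as np
import pandas as pd


class WindowTransformerSketch:
    """Hypothetical stand-in for PdWindowTransformer: forwards rolling_params
    to DataFrame.rolling and applies func to every window."""

    def __init__(self, func, **rolling_params):
        self.func = func
        self.rolling_params = rolling_params

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        # raw=True passes each window to func as a numpy array.
        return X.rolling(**self.rolling_params).apply(self.func, raw=True)

    def fit_transform(self, X, y=None, **fit_params):
        return self.fit(X, y).transform(X)


prices = pd.DataFrame({"price": [1.0, 2.0, 3.0, 4.0]})
smoothed = WindowTransformerSketch(np.mean, window=2).fit_transform(prices)
print(smoothed["price"].tolist())  # [nan, 1.5, 2.5, 3.5]
```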
bdranalytics.pdlearn.preprocessing.date_to_cyclical(df, col_name, parts=['MONTH', 'DAY', 'DAY_OF_WEEK', 'HOUR', 'MINUTE', 'SECOND'], new_col_name_prefix=None)
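date_to_cyclical has no description in this documentation. Assuming it performs the common sine/cosine cyclical encoding, a hypothetical date_to_cyclical_sketch could add one sine and one cosine column per date part. The periods (for example 24 for HOUR, 7 for DAY_OF_WEEK, 31 for DAY), the generated column names and the choice to keep the original column are all assumptions.

```python
import numpy as np
import pandas as pd

# Assumed periods per part; the library's actual values are not documented here.
PERIODS = {"MONTH": 12, "DAY": 31, "DAY_OF_WEEK": 7, "HOUR": 24, "MINUTE": 60, "SECOND": 60}
# Assumed mapping from part names to pandas .dt accessor attributes.
ACCESSORS = {"MONTH": "month", "DAY": "day", "DAY_OF_WEEK": "dayofweek",
             "HOUR": "hour", "MINUTE": "minute", "SECOND": "second"}


def date_to_cyclical_sketch(df, col_name,
                            parts=("MONTH", "DAY", "DAY_OF_WEEK", "HOUR", "MINUTE", "SECOND"),
                            new_col_name_prefix=None):
    """Hypothetical sketch: add a sine and a cosine column for each date part."""
    prefix = new_col_name_prefix or col_name
    dates = pd.to_datetime(df[col_name])
    out = df.copy()
    for part in parts:
        values = getattr(dates.dt, ACCESSORS[part]).astype(float)
        angle = 2 * np.pi * values / PERIODS[part]  # position on the unit circle
        out[f"{prefix}_{part}_sin"] = np.sin(angle)
        out[f"{prefix}_{part}_cos"] = np.cos(angle)
    return out


df = pd.DataFrame({"ts": ["2021-01-01 06:00", "2021-01-01 18:00"]})
out = date_to_cyclical_sketch(df, "ts", parts=["HOUR"])
print(out.columns.tolist())  # ['ts', 'ts_HOUR_sin', 'ts_HOUR_cos']
```

Because input and output are both DataFrames, the generated features keep readable names such as ts_HOUR_sin, in line with this package's goal of preserving column names.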