bdranalytics.pdlearn package

The bdranalytics.pdlearn module contains adapters that allow you to pass pandas.DataFrame instances through sklearn without losing the column names. sklearn already accepts pandas.DataFrame instances, but because it works internally with numpy.array, the column names are lost during transformation. The adapters provided here re-add the column names after the sklearn modifications.
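
The loss of column names can be reproduced with plain sklearn. The following is a minimal sketch, not code from this package: it uses StandardScaler with its default output settings and re-attaches the labels by hand, which is roughly the bookkeeping the adapters are described as automating. The column names are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Illustrative data; the column names are hypothetical.
df = pd.DataFrame({"height": [1.0, 2.0, 3.0], "weight": [10.0, 20.0, 30.0]})

# With default settings the transformer returns a plain numpy array,
# so the column labels are no longer available.
scaled = StandardScaler().fit_transform(df)

# Re-add the original column names and index manually.
restored = pd.DataFrame(scaled, columns=df.columns, index=df.index)
```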

Submodules

bdranalytics.pdlearn.pipeline module

class bdranalytics.pdlearn.pipeline.PdFeatureChain(steps)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Passes a data set through a pipeline / chain of transformers. The output of each transformer is fed into the next transformer in the chain.

Similar to the sklearn Pipeline, but does not support a predictor as the final step.

fit(X, y=None, **fit_params)[source]
fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X – numpy array of shape [n_samples, n_features]; the training set
  • y – numpy array of shape [n_samples]; the target values

Returns:
  • X_new – numpy array of shape [n_samples, n_features_new]; the transformed array
transform(X)[source]
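
The chaining behaviour described for PdFeatureChain can be sketched as follows. This is an illustration of the idea only, not the package's implementation: AddOne, Double and chain_fit_transform are hypothetical names, and passing steps as (name, transformer) pairs is an assumption modelled on sklearn's Pipeline.

```python
import pandas as pd


class AddOne:
    """Hypothetical transformer: adds 1 to every value."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X + 1


class Double:
    """Hypothetical transformer: multiplies every value by 2."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X * 2


def chain_fit_transform(steps, X, y=None):
    """Fit each step on the output of the previous step and return the final output."""
    for _name, transformer in steps:
        X = transformer.fit(X, y).transform(X)
    return X


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
result = chain_fit_transform([("add", AddOne()), ("double", Double())], df)
```

Because every step in this sketch returns a DataFrame, the column names a and b survive the whole chain.
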
class bdranalytics.pdlearn.pipeline.PdFeatureUnion(transformer_list, n_jobs=1, transformer_weights=None, debug=False)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Concatenates the results of multiple transformers.

fit(X, y=None, **fit_params)[source]
transform(X)[source]
transformgen(X)[source]
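
The concatenation performed by PdFeatureUnion can be pictured as a column-wise pandas.concat over the outputs of each transformer. The sketch below is an assumption about the general idea, not the package's code: Prefixed and union_transform are hypothetical names, and the (name, transformer) pairs mirror sklearn's FeatureUnion convention.

```python
import pandas as pd


class Prefixed:
    """Hypothetical transformer: scales values and prefixes the column names."""

    def __init__(self, prefix, factor):
        self.prefix = prefix
        self.factor = factor

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return (X * self.factor).add_prefix(self.prefix)


def union_transform(transformer_list, X):
    """Apply every transformer to the same input and concatenate the outputs column-wise."""
    frames = [t.fit(X).transform(X) for _name, t in transformer_list]
    return pd.concat(frames, axis=1)


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
combined = union_transform(
    [("x2", Prefixed("double_", 2)), ("x10", Prefixed("ten_", 10))], df
)
```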

bdranalytics.pdlearn.preprocessing module

class bdranalytics.pdlearn.preprocessing.DateCyclicalEncoding(date_columns, parts=['DAY', 'DAY_OF_WEEK', 'HOUR', 'MINUTE', 'MONTH', 'SECOND'], new_column_names=None, drop=True)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Feature-engineering class that transforms date columns into cyclical numerical columns. By default (drop=True) the original date column is removed. Intended for use in sklearn pipelines.

Parameters:
  • date_columns – the column names of the date columns to be expanded into cyclical encodings
  • new_column_names – the names to use as prefix for the generated column names
  • drop – whether or not to drop the original column
  • parts – the parts to extract from the date columns and then transform into cyclical encodings
all_to_cyclical_parts(X)[source]
fit(X, y)[source]
transform(X)[source]
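
A common way to make a date part cyclical is to map it onto the unit circle with a sine and a cosine, so that hour 23 ends up close to hour 0. The sketch below illustrates that technique with pandas and numpy; it is an assumption about the general approach, not the package's implementation, and to_cyclical as well as the generated column names are hypothetical.

```python
import numpy as np
import pandas as pd


def to_cyclical(values, cardinality):
    """Map positions 0..cardinality-1 onto the unit circle as (sin, cos)."""
    angle = 2 * np.pi * values / cardinality
    return np.sin(angle), np.cos(angle)


df = pd.DataFrame(
    {"ts": pd.to_datetime(["2020-01-01 00:00", "2020-01-01 06:00", "2020-01-01 23:00"])}
)

# Extract the HOUR part and encode it with cardinality 24.
hour = df["ts"].dt.hour
df["ts_HOUR_sin"], df["ts_HOUR_cos"] = to_cyclical(hour, 24)

# Drop the original date column, mirroring drop=True.
df = df.drop(columns=["ts"])
```

In this encoding the rows for 23:00 and 00:00 have nearly identical values, which a plain integer hour column would not express.
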
class bdranalytics.pdlearn.preprocessing.DateOneHotEncoding(date_columns, parts=['DAY', 'DAY_OF_WEEK', 'HOUR', 'MINUTE', 'MONTH', 'SECOND'], new_column_names=None, drop=True)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Feature-engineering class that transforms date columns into one-hot encodings of their parts (day, hour, ...). By default (drop=True) the original date column is removed. Intended for use in sklearn pipelines.

Parameters:
  • date_columns – the column names of the date columns to be expanded into one-hot encodings
  • new_column_names – the names to use as prefix for the generated column names
  • drop – whether or not to drop the original column
  • parts – the parts to extract from the date columns and then transform into one-hot encodings
all_to_parts(X)[source]
fit(X, y)[source]
transform(X)[source]
transform_one_hots(X)[source]
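
The one-hot expansion of date parts can be approximated with pandas alone. This is a sketch of the idea under stated assumptions, not the package's code: the column naming scheme (ts_DAY_OF_WEEK, ts_HOUR) is hypothetical, and pandas.get_dummies stands in for whatever encoder the class uses internally.

```python
import pandas as pd

df = pd.DataFrame(
    {"ts": pd.to_datetime(["2020-01-06 08:00", "2020-01-07 08:00", "2020-01-06 17:00"])}
)

# Extract the requested parts from the date column.
parts = pd.DataFrame(
    {
        "ts_DAY_OF_WEEK": df["ts"].dt.dayofweek,
        "ts_HOUR": df["ts"].dt.hour,
    },
    index=df.index,
)

# Expand each part into one indicator column per observed value.
one_hot = pd.get_dummies(parts, columns=list(parts.columns))

# Drop the original date column, mirroring drop=True, and attach the indicators.
result = pd.concat([df.drop(columns=["ts"]), one_hot], axis=1)
```

Note that get_dummies only creates columns for values present in the data, so a fitted encoder is needed when training and scoring data must share the same columns.
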
class bdranalytics.pdlearn.preprocessing.PdLagTransformer(lag)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

do_transform(dataframe)[source]
fit(X, y=None, **fit_params)[source]
fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X – numpy array of shape [n_samples, n_features]; the training set
  • y – numpy array of shape [n_samples]; the target values

Returns:
  • X_new – numpy array of shape [n_samples, n_features_new]; the transformed array
transform(X)[source]
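
PdLagTransformer is listed without a description. Its name and its lag parameter suggest that it shifts the rows of a DataFrame by a fixed number of steps. The sketch below shows that operation with DataFrame.shift; this is an assumption about the intended behaviour, and the _lag1 column suffix is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"sales": [10, 20, 30, 40]})

lag = 1
# Shift every column down by `lag` rows; the first `lag` rows become NaN.
lagged = df.shift(lag).add_suffix(f"_lag{lag}")
```
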
class bdranalytics.pdlearn.preprocessing.PdWindowTransformer(func, **rolling_params)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

do_transform(dataframe)[source]
fit(X, y=None, **fit_params)[source]
fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X – numpy array of shape [n_samples, n_features]; the training set
  • y – numpy array of shape [n_samples]; the target values

Returns:
  • X_new – numpy array of shape [n_samples, n_features_new]; the transformed array
transform(X)[source]
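
PdWindowTransformer is also listed without a description. Its signature, func plus **rolling_params, suggests that it applies an aggregation over a pandas rolling window. The sketch below shows that pattern with DataFrame.rolling; whether the class accepts a string, a callable, or both for func is not stated here, so the use of "mean" is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"sales": [10.0, 20.0, 30.0, 40.0]})

# Keyword arguments forwarded to DataFrame.rolling.
rolling_params = {"window": 2}
func = "mean"

# Aggregate each window of two rows; the first row has no full window and becomes NaN.
windowed = df.rolling(**rolling_params).agg(func)
```
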
bdranalytics.pdlearn.preprocessing.date_to_cyclical(df, col_name, parts=['MONTH', 'DAY', 'DAY_OF_WEEK', 'HOUR', 'MINUTE', 'SECOND'], new_col_name_prefix=None)[source]
bdranalytics.pdlearn.preprocessing.date_to_dateparts(df, col_name, parts=['MONTH', 'DAY', 'DAY_OF_WEEK', 'HOUR', 'MINUTE', 'SECOND'], new_col_name_prefix=None)[source]
bdranalytics.pdlearn.preprocessing.format_colname(prefix, suffix)[source]
bdranalytics.pdlearn.preprocessing.to_circular_variable(df, col_name, cardinality)[source]