bdranalytics.sklearn package

Subpackages

bdranalytics.sklearn.preprocessing package

Submodules

bdranalytics.sklearn.encoders module

class bdranalytics.sklearn.encoders.ColumnSelector(columns)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(X, y=None)

transform(X)

class bdranalytics.sklearn.encoders.LeaveOneOutEncoder(with_stdevs=True)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Leave-one-out transformation for high-cardinality categorical variables.

fit(X, y=None)

fit_transform(X, y): Used on the training data when the encoder is fitted as a step in a pipeline.

transform(X)
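The leave-one-out idea can be sketched in plain Python. This is a minimal illustration assuming the common definition (each training row is encoded as the mean target of the other rows in the same category); it is not taken from the bdranalytics source. The function name `leave_one_out_means` and the global-mean fallback for single-row categories are assumptions, and the `with_stdevs` option is not modelled.

```python
from collections import defaultdict


def leave_one_out_means(categories, targets):
    """Encode each row as the mean target of the *other* rows in its category.

    Hypothetical helper for illustration; not part of bdranalytics.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1

    global_mean = sum(targets) / len(targets)
    encoded = []
    for cat, y in zip(categories, targets):
        if counts[cat] > 1:
            # Remove the row's own target before averaging.
            encoded.append((sums[cat] - y) / (counts[cat] - 1))
        else:
            # Assumption: a category seen only once falls back to the global mean.
            encoded.append(global_mean)
    return encoded


cats = ["a", "a", "a", "b", "b", "c"]
ys = [1, 0, 1, 0, 1, 1]
loo = leave_one_out_means(cats, ys)
```

Because each row's own target is excluded, the encoded value cannot leak that row's label, which is presumably why the training data goes through a separate fit_transform rather than fit followed by transform.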

class bdranalytics.sklearn.encoders.StringIndexer

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(X, y=None)

transform(X)

class bdranalytics.sklearn.encoders.WeightOfEvidenceEncoder(verbose=0, cols=None, return_df=True, smooth=0.5, fillna=0, dependent_variable_values=None)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Feature-engineering class that transforms a high-cardinality categorical variable into Weight of Evidence scores. Can be used in sklearn pipelines.

Parameters:	smooth – value for additive smoothing, to prevent division by zero

fit(X, y)

transform(X, y=None)
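To make the role of additive smoothing concrete, here is a minimal sketch of a Weight of Evidence calculation for a binary target. The exact formula used by WeightOfEvidenceEncoder is not shown in this documentation, so the smoothing placement below is an assumption, and `weight_of_evidence` is a hypothetical helper rather than part of the package.

```python
import math
from collections import Counter


def weight_of_evidence(categories, targets, smooth=0.5):
    """Return {category: WoE score} for a binary (0/1) target.

    Hypothetical helper; the smoothing scheme is an assumption.
    """
    pos = Counter(c for c, y in zip(categories, targets) if y == 1)
    neg = Counter(c for c, y in zip(categories, targets) if y == 0)
    pos_total = sum(pos.values())
    neg_total = sum(neg.values())

    scores = {}
    for cat in set(categories):
        # Adding `smooth` keeps both shares strictly positive, so the
        # ratio and its logarithm are always defined.
        pos_share = (pos[cat] + smooth) / (pos_total + smooth)
        neg_share = (neg[cat] + smooth) / (neg_total + smooth)
        scores[cat] = math.log(pos_share / neg_share)
    return scores


cats = ["a", "a", "a", "b", "b", "b"]
ys = [1, 1, 1, 0, 0, 1]
woe = weight_of_evidence(cats, ys)

# Unseen categories get a default score, assumed here to mirror `fillna=0`.
encoded = [woe.get(c, 0) for c in ["a", "b", "z"]]
```

Category "a" has no negative examples; without smoothing its negative share would be zero and the ratio undefined.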

bdranalytics.sklearn.model_selection module

class bdranalytics.sklearn.model_selection.GrowingWindow(n_folds=3)

Bases: abc.NewBase

Growing Window cross-validator

Provides train/test indices to split data into train/test sets. Divides the data into n_folds+1 slices. For split i in [1..n_folds], slices [0..i) are the training set and slice i is the test set.

Parameters:

n_folds : int, default=3: Number of folds. Must be at least 1.
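The slicing rule described above can be sketched without the library. This is an illustration only: `growing_window_splits` is a hypothetical function, and the way slice boundaries are rounded when n_samples is not divisible by n_folds+1 is an assumption.

```python
def growing_window_splits(n_samples, n_folds=3):
    """Yield (train_indices, test_indices) pairs for a growing window.

    Hypothetical helper for illustration; not part of bdranalytics.
    """
    if n_folds < 1:
        raise ValueError("n_folds must be at least 1")
    n_slices = n_folds + 1
    # Assumption: slice boundaries come from integer division.
    bounds = [i * n_samples // n_slices for i in range(n_slices + 1)]
    for i in range(1, n_folds + 1):
        train = list(range(0, bounds[i]))              # slices 0 .. i-1
        test = list(range(bounds[i], bounds[i + 1]))   # slice i
        yield train, test


splits = list(growing_window_splits(8, n_folds=3))
```

With 8 samples and 3 folds, the training set grows by one slice per split while the test set is always the next slice, so no test index ever precedes a training index.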

get_n_splits(X, y=None, labels=None)

Returns the number of splitting iterations in the cross-validator.

Parameters:

X : array-like, shape (n_samples, n_features): Training data, where n_samples is the number of samples and n_features is the number of features.
y : object: Always ignored, exists for compatibility.
labels : object: Always ignored, exists for compatibility.

Returns:

n_splits : int: The number of splitting iterations in the cross-validator.

split(X, y=None, labels=None)

Generate indices to split data into training and test sets.

Parameters:

X : array-like, shape (n_samples, n_features): Training data, where n_samples is the number of samples and n_features is the number of features.
y : array-like, of length n_samples: The target variable for supervised learning problems. Ignored.
labels : array-like, with shape (n_samples,), optional: Group labels for the samples used while splitting the dataset into train/test sets. Ignored.

Returns:

train : ndarray: The training set indices for that split.
test : ndarray: The testing set indices for that split.

class bdranalytics.sklearn.model_selection.IntervalGrowingWindow(test_start_date, timestamps='index', test_end_date=None, test_size=None, train_size=None)

Bases: abc.NewBase

Growing Window cross-validator based on time intervals

generate_intervals(timestamps)

get_n_splits(X, y=None, labels=None)

get_timeseries(X): Returns the numpy array of timestamps for the given dataset.

split(X, y=None, labels=None): Generate indices to split data into training and test sets based on timestamps.
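A time-based growing window might look like the sketch below. The semantics of the constructor arguments are not documented here, so everything in it is an assumption: `interval_growing_window` is a hypothetical function, test windows are taken to be consecutive half-open intervals of length `test_size` between `test_start` and `test_end`, the training set is taken to be every sample before the current test window, and `train_size` is not modelled.

```python
from datetime import datetime, timedelta


def interval_growing_window(timestamps, test_start, test_end, test_size):
    """Yield (train_indices, test_indices) per time-based test window.

    Hypothetical helper; the interval semantics are assumptions.
    """
    window_start = test_start
    while window_start < test_end:
        window_end = min(window_start + test_size, test_end)
        # Train on everything strictly before the test window.
        train = [i for i, t in enumerate(timestamps) if t < window_start]
        # Test on samples inside the half-open window [start, end).
        test = [i for i, t in enumerate(timestamps)
                if window_start <= t < window_end]
        yield train, test
        window_start = window_end


days = [datetime(2020, 1, 1) + timedelta(days=d) for d in range(10)]
folds = list(interval_growing_window(
    days,
    test_start=datetime(2020, 1, 5),
    test_end=datetime(2020, 1, 9),
    test_size=timedelta(days=2),
))
```

Unlike a count-based split, the number of samples per fold here depends on how densely the timestamps fall inside each interval.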