bdranalytics.sklearn package

Submodules

bdranalytics.sklearn.encoders module

class bdranalytics.sklearn.encoders.ColumnSelector(columns)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(X, y=None)
transform(X)
class bdranalytics.sklearn.encoders.LeaveOneOutEncoder(with_stdevs=True)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Leave one out transformation for high-capacity categorical variables.

fit(X, y=None)
fit_transform(X, y)

Used during pipeline fit.

transform(X)
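
The idea behind a leave-one-out transformation can be sketched in plain pandas. This is a minimal illustration of the general technique, not the LeaveOneOutEncoder implementation: the helper name leave_one_out_mean and the example columns are hypothetical, and the with_stdevs option is not covered.

```python
import pandas as pd


def leave_one_out_mean(categories: pd.Series, target: pd.Series) -> pd.Series:
    """Encode each row with the target mean of its category, excluding the row itself.

    Hypothetical helper for illustration; not part of bdranalytics.
    """
    grouped = target.groupby(categories)
    group_sum = grouped.transform("sum")
    group_count = grouped.transform("count")
    # Subtract the row's own target so it cannot leak into its own feature.
    # A category seen only once has no other rows, so the result is 0/0 = NaN.
    return (group_sum - target) / (group_count - 1)


df = pd.DataFrame({
    "city": ["a", "a", "a", "b", "b", "c"],
    "y": [1.0, 0.0, 1.0, 0.0, 1.0, 1.0],
})
df["city_loo"] = leave_one_out_mean(df["city"], df["y"])
print(df)
```

Excluding the current row only makes sense on data whose target is known, which is presumably why a separate fit_transform exists for the pipeline fit step.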
class bdranalytics.sklearn.encoders.StringIndexer

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(X, y=None)
transform(X)
class bdranalytics.sklearn.encoders.WeightOfEvidenceEncoder(verbose=0, cols=None, return_df=True, smooth=0.5, fillna=0, dependent_variable_values=None)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Feature-engineering class that transforms a high-capacity categorical value into Weight of Evidence scores. Can be used in sklearn pipelines.

Parameters: smooth – value for additive smoothing, to prevent division by zero
fit(X, y)
transform(X, y=None)
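
To show why additive smoothing matters, here is a minimal sketch of one common Weight of Evidence formulation for a binary 0/1 target. The function weight_of_evidence and the exact smoothing formula are assumptions made for illustration; the formula used by WeightOfEvidenceEncoder may differ.

```python
import numpy as np
import pandas as pd


def weight_of_evidence(categories: pd.Series, target: pd.Series,
                       smooth: float = 0.5) -> pd.Series:
    """Return one WoE score per category for a binary 0/1 target.

    Hypothetical helper: WoE = ln(share of positives / share of negatives),
    with `smooth` added to every per-category count (an assumed formulation).
    """
    pos = target.groupby(categories).sum()            # positives per category
    neg = target.groupby(categories).count() - pos    # negatives per category
    n_categories = len(pos)
    pos_share = (pos + smooth) / (pos.sum() + smooth * n_categories)
    neg_share = (neg + smooth) / (neg.sum() + smooth * n_categories)
    return np.log(pos_share / neg_share)


cats = pd.Series(["a", "a", "a", "b", "b", "b", "c", "c"])
y = pd.Series([1, 1, 0, 0, 0, 1, 1, 1])

woe = weight_of_evidence(cats, y, smooth=0.5)
encoded = cats.map(woe)  # replace each category by its score
print(woe)
```

Category "c" has no negative examples, so without smoothing its negative share would be zero and the score would be infinite. With smooth=0.5 every score stays finite.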

bdranalytics.sklearn.model_selection module

class bdranalytics.sklearn.model_selection.GrowingWindow(n_folds=3)

Bases: abc.NewBase

Growing Window cross validator

Provides train/test indices to split data into train/test sets. Divides the data into n_folds+1 slices. For split i in [1..n_folds], slices [0..i) are the training set and slice i is the test set.

Parameters:
n_folds : int, default=3
Number of folds. Must be at least 1
get_n_splits(X, y=None, labels=None)

Returns the number of splitting iterations in the cross-validator.

Parameters:
X : array-like, shape (n_samples, n_features)
Training data, where n_samples is the number of samples and n_features is the number of features.
y : object
Always ignored, exists for compatibility.
labels : object
Always ignored, exists for compatibility.
Returns:
n_splits : int
The number of splitting iterations in the cross-validator.
split(X, y=None, labels=None)

Generate indices to split data into training and test sets.

Parameters:
X : array-like, shape (n_samples, n_features)
Training data, where n_samples is the number of samples and n_features is the number of features.
y : array-like, of length n_samples
The target variable for supervised learning problems. Ignored.
labels : array-like, with shape (n_samples,), optional
Group labels for the samples used while splitting the dataset into train/test sets. Ignored.
Returns:
train : ndarray
The training set indices for that split.
test : ndarray
The testing set indices for that split.
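
The slicing scheme described for GrowingWindow can be sketched with NumPy. This is a standalone illustration, not the class itself: the function growing_window_splits is hypothetical, and using np.array_split to handle sample counts that do not divide evenly is an assumption.

```python
import numpy as np


def growing_window_splits(n_samples: int, n_folds: int = 3):
    """Yield (train, test) index arrays for a growing-window scheme.

    Hypothetical helper; how the real class sizes uneven slices is not
    documented here, so np.array_split is an assumed choice.
    """
    if n_folds < 1:
        raise ValueError("n_folds must be at least 1")
    # Cut the sample indices into n_folds + 1 consecutive slices.
    slices = np.array_split(np.arange(n_samples), n_folds + 1)
    for i in range(1, n_folds + 1):
        train = np.concatenate(slices[:i])  # slices 0 .. i-1
        test = slices[i]                    # slice i
        yield train, test


for train, test in growing_window_splits(8, n_folds=3):
    print(train, test)
# [0 1] [2 3]
# [0 1 2 3] [4 5]
# [0 1 2 3 4 5] [6 7]
```

Each test slice comes strictly after its training data, so the scheme suits ordered data such as time series, where shuffled folds would let the model see the future.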
class bdranalytics.sklearn.model_selection.IntervalGrowingWindow(test_start_date, timestamps='index', test_end_date=None, test_size=None, train_size=None)

Bases: abc.NewBase

Growing Window cross-validator based on time intervals

generate_intervals(timestamps)
get_n_splits(X, y=None, labels=None)
get_timeseries(X)

Returns the NumPy array of timestamps for the given dataset.

split(X, y=None, labels=None)

Generate indices to split data into training and test sets based on timestamps.
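
A time-interval variant of the growing window might look like the sketch below. The parameter semantics are not documented above, so everything here is assumed for illustration: the function interval_growing_window is hypothetical, test_size is treated as the length of each test interval, test intervals are half-open, and train_size is not modelled.

```python
import numpy as np
import pandas as pd


def interval_growing_window(timestamps, test_start, test_end, test_size):
    """Yield (train, test) positional indices, one test fold per time interval.

    Hypothetical helper with assumed semantics; not the library's API.
    """
    timestamps = pd.DatetimeIndex(timestamps)
    start = pd.Timestamp(test_start)
    end = pd.Timestamp(test_end)
    step = pd.Timedelta(test_size)
    while start < end:
        stop = min(start + step, end)
        # Train on everything strictly before the test interval [start, stop).
        train = np.flatnonzero(timestamps < start)
        test = np.flatnonzero((timestamps >= start) & (timestamps < stop))
        yield train, test
        start = stop


ts = pd.date_range("2020-01-01", periods=6, freq="D")
for train, test in interval_growing_window(ts, "2020-01-03", "2020-01-07", "2D"):
    print(train, test)
# [0 1] [2 3]
# [0 1 2 3] [4 5]
```

Splitting on timestamps rather than row counts keeps fold boundaries aligned with calendar intervals even when observations are unevenly spaced.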