bdranalytics.sklearn package
Subpackages
Submodules
bdranalytics.sklearn.encoders module
- class bdranalytics.sklearn.encoders.ColumnSelector(columns)
  Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
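The reference gives no docstring for ColumnSelector. Assuming, from its columns argument and its bases, that it keeps only the named columns of a pandas DataFrame, a transformer of that shape might look like the sketch below. ColumnSelectorSketch is a hypothetical name, not the package's class. Inheriting from BaseEstimator derives get_params/set_params from the constructor arguments, and TransformerMixin supplies fit_transform on top of fit and transform.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnSelectorSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in: keeps only the given DataFrame columns (assumed behaviour)."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn: the column list is fixed at construction time.
        return self

    def transform(self, X):
        # Select the configured columns from a pandas DataFrame.
        return X[self.columns]


df = pd.DataFrame({"age": [20, 30], "city": ["x", "y"], "income": [1.0, 2.0]})
selected = ColumnSelectorSketch(columns=["age", "income"]).fit_transform(df)
print(list(selected.columns))  # ['age', 'income']
```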
- class bdranalytics.sklearn.encoders.LeaveOneOutEncoder(with_stdevs=True)
  Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
  Leave-one-out transformation for high-cardinality categorical variables.
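Leave-one-out encoding replaces each training row's category with the mean target of the other rows in the same category, so a row's own label does not leak into its feature. The pandas sketch below illustrates that idea only; it is not the package's implementation, leave_one_out_encode is a hypothetical name, the fallback to the global mean for singleton categories is an assumption, and the with_stdevs option is not modelled.

```python
import pandas as pd


def leave_one_out_encode(categories, target):
    """Hypothetical helper: leave-one-out mean target encoding of training rows."""
    df = pd.DataFrame({"cat": categories, "y": pd.Series(target, dtype=float)})
    by_cat = df.groupby("cat")["y"]
    sums = by_cat.transform("sum")      # per-row sum of targets in the row's category
    counts = by_cat.transform("count")  # per-row size of the row's category
    # Remove the row's own target from both numerator and denominator.
    loo = (sums - df["y"]) / (counts - 1)
    # Singleton categories give 0/0 = NaN; fall back to the global mean (assumption).
    return loo.fillna(df["y"].mean())


cats = ["a", "a", "a", "b", "b", "c"]
y = [1, 0, 1, 1, 0, 1]
print(leave_one_out_encode(cats, y).round(3).tolist())
# [0.5, 1.0, 0.5, 0.0, 1.0, 0.667]
```

At prediction time, schemes of this kind typically encode new rows with the plain per-category mean over the training data, since those rows have no target of their own to leave out.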
- class bdranalytics.sklearn.encoders.StringIndexer
  Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
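StringIndexer is listed without a description. Assuming, by analogy with the name and with Spark ML's StringIndexer, that it learns a mapping from each distinct string to an integer index during fit, a minimal sketch is given below. StringIndexerSketch is hypothetical; the most-frequent-first ordering and the -1 code for strings unseen during fit are assumptions, not documented behaviour of this package.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class StringIndexerSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in: maps each distinct string to an integer index."""

    def fit(self, X, y=None):
        # Learn a fixed vocabulary; most frequent string first, ties alphabetical.
        values, counts = np.unique(np.asarray(X, dtype=str), return_counts=True)
        order = np.argsort(-counts, kind="stable")
        self.mapping_ = {v: i for i, v in enumerate(values[order])}
        return self

    def transform(self, X):
        # Strings not seen during fit are coded as -1 (an assumption).
        return np.array([self.mapping_.get(v, -1) for v in np.asarray(X, dtype=str)])


indexer = StringIndexerSketch().fit(["b", "a", "b", "c", "b", "a"])
print(indexer.transform(["a", "c", "z"]).tolist())  # [1, 2, -1]
```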
- class bdranalytics.sklearn.encoders.WeightOfEvidenceEncoder(verbose=0, cols=None, return_df=True, smooth=0.5, fillna=0, dependent_variable_values=None)
  Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
  Feature-engineering class that transforms a high-cardinality categorical variable into Weight of Evidence scores. Can be used in sklearn pipelines.
  Parameters: smooth – value used for additive smoothing, to prevent division by zero.
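With a binary target, the Weight of Evidence of a category is the log ratio between the share of all positives that fall in that category and the share of all negatives that do. A category with no positives or no negatives makes one share zero and the logarithm undefined; adding smooth to the counts keeps both shares positive. The sketch assumes a 0/1 target, additive smoothing of both counts, ln(positive share / negative share) as the sign convention (sources differ), and that unseen categories receive the fillna value. woe_table and woe_transform are hypothetical names, not the package's API.

```python
import numpy as np
import pandas as pd


def woe_table(categories, target, smooth=0.5):
    """Hypothetical helper: smoothed Weight of Evidence per category (0/1 target)."""
    df = pd.DataFrame({"cat": categories, "y": target})
    pos = df.groupby("cat")["y"].sum()          # positives per category
    neg = df.groupby("cat")["y"].count() - pos  # negatives per category
    k = len(pos)                                # number of categories
    # Additive smoothing keeps every share strictly positive.
    pos_share = (pos + smooth) / (pos.sum() + smooth * k)
    neg_share = (neg + smooth) / (neg.sum() + smooth * k)
    return np.log(pos_share / neg_share)


def woe_transform(categories, table, fillna=0.0):
    """Map categories to WoE scores; unseen categories get fillna (assumption)."""
    return pd.Series(categories).map(table).fillna(fillna)


cats = ["a", "a", "a", "b", "b", "c"]
y = [1, 1, 0, 0, 0, 1]
table = woe_table(cats, y)
print(table.round(3).tolist())                              # [0.511, -1.609, 1.099]
print(woe_transform(["a", "d"], table).round(3).tolist())  # [0.511, 0.0]
```

Without smoothing, category b (no positives) would score ln(0) and category c (no negatives) would divide by zero; with smooth=0.5 both stay finite.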
bdranalytics.sklearn.model_selection module
- class bdranalytics.sklearn.model_selection.GrowingWindow(n_folds=3)
  Bases: abc.NewBase
  Growing-window cross-validator.
  Provides train/test indices to split data into train/test sets. Divides the data into n_folds + 1 slices. For split i (i = 1, ..., n_folds), slices 0 to i - 1 form the training set and slice i is the test set.
  Parameters:
  - n_folds : int, default=3
    Number of folds. Must be at least 1.
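The slicing rule for GrowingWindow can be sketched in a few lines of NumPy. This illustrates the rule under an assumption rather than reproducing the package's code: the n_folds + 1 slices are taken to be near-equal contiguous blocks produced by numpy.array_split, and growing_window_indices is a hypothetical name.

```python
import numpy as np


def growing_window_indices(n_samples, n_folds=3):
    """Hypothetical helper: yield (train, test) index arrays for a growing window."""
    if n_folds < 1:
        raise ValueError("n_folds must be at least 1")
    # Assumption: n_folds + 1 near-equal contiguous slices.
    slices = np.array_split(np.arange(n_samples), n_folds + 1)
    for i in range(1, n_folds + 1):
        train = np.concatenate(slices[:i])  # slices 0 .. i-1
        test = slices[i]                    # slice i
        yield train, test


for train, test in growing_window_indices(8, n_folds=3):
    print(train.tolist(), test.tolist())
# [0, 1] [2, 3]
# [0, 1, 2, 3] [4, 5]
# [0, 1, 2, 3, 4, 5] [6, 7]
```

Every training set ends where its test slice begins, so no split is trained on rows that come after the ones it is evaluated on; this is what makes the scheme suitable for time-ordered data.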
  - get_n_splits(X, y=None, labels=None)
    Returns the number of splitting iterations in the cross-validator.
    Parameters:
    - X : array-like, shape (n_samples, n_features)
      Training data, where n_samples is the number of samples and n_features is the number of features.
    - y : object
      Always ignored, exists for compatibility.
    - labels : object
      Always ignored, exists for compatibility.
    Returns:
    - n_splits : int
      The number of splitting iterations in the cross-validator.
  - split(X, y=None, labels=None)
    Generates indices to split data into training and test sets.
    Parameters:
    - X : array-like, shape (n_samples, n_features)
      Training data, where n_samples is the number of samples and n_features is the number of features.
    - y : array-like, of length n_samples
      The target variable for supervised learning problems. Ignored.
    - labels : array-like, with shape (n_samples,), optional
      Group labels for the samples used while splitting the dataset into train/test sets. Ignored.
    Yields:
    - train : ndarray
      The training set indices for that split.
    - test : ndarray
      The testing set indices for that split.
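Because GrowingWindow exposes split and get_n_splits, an object of this shape can be passed as the cv argument of scikit-learn helpers such as cross_val_score, which accept a cross-validation splitter object. The stand-in below is hypothetical and reuses the numpy.array_split assumption about slice sizes. It names the third argument groups, the current scikit-learn name for what the signatures above call labels, because scikit-learn may pass that argument by keyword.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


class GrowingWindowSketch:
    """Hypothetical stand-in following the split/get_n_splits protocol."""

    def __init__(self, n_folds=3):
        if n_folds < 1:
            raise ValueError("n_folds must be at least 1")
        self.n_folds = n_folds

    def get_n_splits(self, X=None, y=None, groups=None):
        # One split per fold; the data is not inspected.
        return self.n_folds

    def split(self, X, y=None, groups=None):
        # y and groups are ignored, as in the documented signature.
        slices = np.array_split(np.arange(len(X)), self.n_folds + 1)
        for i in range(1, self.n_folds + 1):
            yield np.concatenate(slices[:i]), slices[i]


# Toy time-ordered regression data: y grows linearly with X plus small noise.
rng = np.random.default_rng(0)
X = np.arange(40, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + rng.normal(scale=0.1, size=40)

scores = cross_val_score(LinearRegression(), X, y, cv=GrowingWindowSketch(n_folds=3))
print(scores)  # one R^2 score per growing-window split
```

Each score measures how well a model fitted only on earlier rows predicts the next block of rows.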