bdranalytics.sklearn.preprocessing package

class bdranalytics.sklearn.preprocessing.ScaledRegressor(scaler, estimator)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.RegressorMixin

Allows a regressor to work with a scaled target if it does not allow scaling itself.

When fitting, y is transformed using the scaler before being passed to the model's fit method. When predicting, the predicted y is inverse-transformed to obtain a y_hat in the original range of values.

For example, this allows your regressor to predict manipulated targets (e.g. log(y)) without additional pre- and postprocessing outside your sklearn pipeline.

scaler : TransformerMixin
The transformer which will be applied on the target before it is passed to the model
estimator : RegressorMixin
The regressor which will work in transformed target space

Examples

>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.pipeline import Pipeline
>>> from bdranalytics.sklearn.preprocessing import ScaledRegressor
>>> n_rows = 10
>>> X = np.random.rand(n_rows, 2)
>>> y = np.random.rand(n_rows)
>>> regressor = LinearRegression()
>>> scaler = StandardScaler()
>>> pipeline = Pipeline([("predict", ScaledRegressor(scaler, regressor))])
>>> y_hat = pipeline.fit(X, y).predict(X)

fit(X, y)[source]
predict(X)[source]
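
The fit/predict behaviour described for ScaledRegressor can be illustrated with a minimal sketch. This is not the library's source: ScaledRegressorSketch, LogScaler and MeanRegressor are hypothetical stand-ins written in plain Python so the example runs without sklearn. A real sklearn scaler would typically expect a 2-D array, so an actual implementation may need to reshape y.

```python
import math


class LogScaler:
    """Hypothetical target transformer: log on the way in, exp on the way out."""

    def fit(self, y):
        return self

    def transform(self, y):
        return [math.log(v) for v in y]

    def inverse_transform(self, y):
        return [math.exp(v) for v in y]


class MeanRegressor:
    """Hypothetical regressor that always predicts the mean training target."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class ScaledRegressorSketch:
    """Illustrative wrapper (an assumption, not the bdranalytics code)."""

    def __init__(self, scaler, estimator):
        self.scaler = scaler
        self.estimator = estimator

    def fit(self, X, y):
        # The estimator only ever sees the transformed target.
        y_scaled = self.scaler.fit(y).transform(y)
        self.estimator.fit(X, y_scaled)
        return self

    def predict(self, X):
        # Predictions are mapped back to the original target range.
        return self.scaler.inverse_transform(self.estimator.predict(X))


X = [[0.0], [1.0], [2.0]]
y = [1.0, 10.0, 100.0]
model = ScaledRegressorSketch(LogScaler(), MeanRegressor()).fit(X, y)
y_hat = model.predict(X)
```

Because the mean is taken in log space and then exponentiated, each prediction here is the geometric mean of y (about 10) rather than the arithmetic mean (37), which shows that the estimator worked in the transformed target space.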
class bdranalytics.sklearn.preprocessing.WeightOfEvidenceEncoder(verbose=0, cols=None, return_df=True, smooth=0.5, fillna=0, dependent_variable_values=None)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Feature-engineering class that transforms a high-cardinality categorical value into Weight of Evidence scores. Can be used in sklearn pipelines.

Parameters: smooth – value for additive smoothing, to prevent division by zero
fit(X, y)[source]
transform(X, y=None)[source]
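
To make the role of smooth concrete, here is a minimal sketch of one common Weight of Evidence formulation with additive smoothing. It is an assumption for illustration, not the encoder's actual implementation: the function name weight_of_evidence, the sign convention (log of event share over non-event share), and the choice to add smooth once per level to the totals are all hypothetical.

```python
import math
from collections import Counter


def weight_of_evidence(categories, targets, smooth=0.5):
    """Return {level: WoE score} for a binary target (1 = event, 0 = non-event)."""
    events = Counter(c for c, t in zip(categories, targets) if t == 1)
    non_events = Counter(c for c, t in zip(categories, targets) if t == 0)
    levels = sorted(set(categories))

    # Additive smoothing: `smooth` is added to every per-level count, and
    # once per level to the totals, so no share can be exactly zero.
    total_events = sum(events.values()) + smooth * len(levels)
    total_non_events = sum(non_events.values()) + smooth * len(levels)

    scores = {}
    for level in levels:
        event_share = (events[level] + smooth) / total_events
        non_event_share = (non_events[level] + smooth) / total_non_events
        scores[level] = math.log(event_share / non_event_share)
    return scores


categories = ["a", "a", "b", "b"]
targets = [1, 1, 0, 0]
scores = weight_of_evidence(categories, targets, smooth=0.5)
```

In this toy data level "a" has no non-events and level "b" has no events, so without smoothing the ratio would involve a division by zero or a log of zero. With smooth=0.5 both scores stay finite and are symmetric around zero.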
class bdranalytics.sklearn.preprocessing.StringIndexer[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(X, y=None)[source]
transform(X)[source]
class bdranalytics.sklearn.preprocessing.LeaveOneOutEncoder(with_stdevs=True)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(X, y=None)[source]
fit_transform(X, y)[source]

Used during pipeline fit: a Pipeline calls fit_transform on its transformer steps when it is fitted.

transform(X)[source]
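
The reason a leave-one-out encoder needs its own fit_transform can be sketched as follows. This is a generic illustration of leave-one-out target encoding, assumed here from the class name; it is not taken from the bdranalytics source, the function leave_one_out_means is hypothetical, and the with_stdevs option is not covered.

```python
from collections import defaultdict


def leave_one_out_means(categories, targets):
    """Encode each training row by the mean target of the *other* rows in its category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1

    encoded = []
    for c, t in zip(categories, targets):
        others = counts[c] - 1
        # Exclude the row's own target so it cannot leak into its feature.
        # A category seen only once has no other rows, so it gets None here.
        encoded.append((sums[c] - t) / others if others > 0 else None)
    return encoded


categories = ["a", "a", "a", "b"]
targets = [1.0, 2.0, 3.0, 5.0]
encoded = leave_one_out_means(categories, targets)
```

Training rows need the target to compute this encoding, which is why it belongs in fit_transform(X, y). At prediction time no target is available, so transform(X) would have to fall back on statistics stored during fitting.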

Submodules

bdranalytics.sklearn.preprocessing.encoding module

class bdranalytics.sklearn.preprocessing.encoding.LeaveOneOutEncoder(with_stdevs=True)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(X, y=None)[source]
fit_transform(X, y)[source]

Used during pipeline fit: a Pipeline calls fit_transform on its transformer steps when it is fitted.

transform(X)[source]
class bdranalytics.sklearn.preprocessing.encoding.StringIndexer[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(X, y=None)[source]
transform(X)[source]
class bdranalytics.sklearn.preprocessing.encoding.WeightOfEvidenceEncoder(verbose=0, cols=None, return_df=True, smooth=0.5, fillna=0, dependent_variable_values=None)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Feature-engineering class that transforms a high-cardinality categorical value into Weight of Evidence scores. Can be used in sklearn pipelines.

Parameters: smooth – value for additive smoothing, to prevent division by zero
fit(X, y)[source]
transform(X, y=None)[source]

bdranalytics.sklearn.preprocessing.preprocessing module

class bdranalytics.sklearn.preprocessing.preprocessing.ColumnSelector(columns)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(X, y=None)[source]
transform(X)[source]

bdranalytics.sklearn.preprocessing.scaling module

class bdranalytics.sklearn.preprocessing.scaling.ScaledRegressor(scaler, estimator)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.RegressorMixin

Allows a regressor to work with a scaled target if it does not allow scaling itself.

When fitting, y is transformed using the scaler before being passed to the model's fit method. When predicting, the predicted y is inverse-transformed to obtain a y_hat in the original range of values.

For example, this allows your regressor to predict manipulated targets (e.g. log(y)) without additional pre- and postprocessing outside your sklearn pipeline.

scaler : TransformerMixin
The transformer which will be applied on the target before it is passed to the model
estimator : RegressorMixin
The regressor which will work in transformed target space

Examples

>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.pipeline import Pipeline
>>> from bdranalytics.sklearn.preprocessing.scaling import ScaledRegressor
>>> n_rows = 10
>>> X = np.random.rand(n_rows, 2)
>>> y = np.random.rand(n_rows)
>>> regressor = LinearRegression()
>>> scaler = StandardScaler()
>>> pipeline = Pipeline([("predict", ScaledRegressor(scaler, regressor))])
>>> y_hat = pipeline.fit(X, y).predict(X)

fit(X, y)[source]
predict(X)[source]