pandas_ml.core package¶

Submodules¶

class pandas_ml.core.frame.ModelFrame(data, target=None, *args, **kwargs)¶

Bases: pandas_ml.core.generic.ModelPredictor, pandas.core.frame.DataFrame

Data structure subclassing pandas.DataFrame that adds metadata specifying the target (response variable) and the data (explanatory variables / features).

Parameters:

data : same as `pandas.DataFrame`
target : str or array-like: Column name or values to be used as target
args : arguments passed to `pandas.DataFrame`
kwargs : keyword arguments passed to `pandas.DataFrame`
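The split between target and data can be illustrated with plain pandas. This is only a sketch of the idea and does not use pandas_ml itself; the helper `split_features_target` and the column names `label`, `x1`, and `x2` are hypothetical.

```python
import pandas as pd


def split_features_target(frame, target_name):
    """Return (data, target) for a frame whose target is stored in one column.

    Hypothetical helper: it mimics the data/target distinction described
    above, but it is not part of pandas_ml.
    """
    target = frame[target_name]                 # response variable
    data = frame.drop(columns=[target_name])    # explanatory variables
    return data, target


df = pd.DataFrame({"label": [0, 1, 0],
                   "x1": [1.0, 2.0, 3.0],
                   "x2": [4.0, 5.0, 6.0]})

data, target = split_features_target(df, "label")
```

Here `data` keeps only the feature columns and `target` is the single response column, which is the same separation that the `data` and `target` properties expose.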

calibration¶: Property to access sklearn.calibration

cls¶: alias of pandas_ml.skaccessors.gaussian_process.GaussianProcessMethods

cluster¶: Property to access sklearn.cluster. See pandas_ml.skaccessors.cluster

covariance¶: Property to access sklearn.covariance. See pandas_ml.skaccessors.covariance

cross_decomposition¶: Property to access sklearn.cross_decomposition

da¶: Property to access sklearn.discriminant_analysis

data¶

Return data (explanatory variable / features)

Returns:	data : `ModelFrame`

decision_function(estimator, *args, **kwargs)¶

Call estimator’s decision_function method.

Parameters:	args : arguments passed to decision_function method kwargs : keyword arguments passed to decision_function method
Returns:	returned : decisions

decomposition¶: Property to access sklearn.decomposition

discriminant_analysis¶: Property to access sklearn.discriminant_analysis

dummy¶: Property to access sklearn.dummy

ensemble¶: Property to access sklearn.ensemble. See pandas_ml.skaccessors.ensemble

feature_extraction¶: Property to access sklearn.feature_extraction. See pandas_ml.skaccessors.feature_extraction

feature_selection¶: Property to access sklearn.feature_selection. See pandas_ml.skaccessors.feature_selection

fit_predict(estimator, *args, **kwargs)¶

Call estimator’s fit_predict method.

Parameters:	args : arguments passed to fit_predict method kwargs : keyword arguments passed to fit_predict method
Returns:	returned : predicted result

fit_resample(estimator, *args, **kwargs)¶

Call estimator’s fit_resample method.

Parameters:	args : arguments passed to fit_resample method kwargs : keyword arguments passed to fit_resample method
Returns:	returned : resampling result

fit_sample(estimator, *args, **kwargs)¶

Call estimator’s fit_sample method.

Parameters:	args : arguments passed to fit_sample method kwargs : keyword arguments passed to fit_sample method
Returns:	returned : sampling result

fit_transform(estimator, *args, **kwargs)¶

Call estimator’s fit_transform method.

Parameters:	args : arguments passed to fit_transform method kwargs : keyword arguments passed to fit_transform method
Returns:	returned : transformed result

gaussian_process¶: Property to access sklearn.gaussian_process. See pandas_ml.skaccessors.gaussian_process

gp¶: Property to access sklearn.gaussian_process. See pandas_ml.skaccessors.gaussian_process

groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False)¶

Group DataFrame or Series using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

Parameters:

by : mapping, function, label, or list of labels: Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If an ndarray is passed, the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
axis : {0 or ‘index’, 1 or ‘columns’}, default 0: Split along rows (0) or columns (1).
level : int, level name, or sequence of such, default None: If the axis is a MultiIndex (hierarchical), group by a particular level or levels.
as_index : bool, default True: For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.
sort : bool, default True: Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
group_keys : bool, default True: When calling apply, add group keys to index to identify pieces.
squeeze : bool, default False: Reduce the dimensionality of the return type if possible, otherwise return a consistent type.
observed : bool, default False: This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

New in version 0.23.0.
**kwargs: Optional, only accepts keyword argument ‘mutated’ and is passed to groupby.

Returns:

DataFrameGroupBy or SeriesGroupBy: Depends on the calling object and returns groupby object that contains information about the groups.

See also

resample: Convenience method for frequency conversion and resampling of time series.

Notes

See the user guide for more.

Examples

>>> df = pd.DataFrame({'Animal' : ['Falcon', 'Falcon',
...                                'Parrot', 'Parrot'],
...                    'Max Speed' : [380., 370., 24., 26.]})
>>> df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0
>>> df.groupby(['Animal']).mean()
        Max Speed
Animal
Falcon      375.0
Parrot       25.0

Hierarchical Indexes

We can groupby different levels of a hierarchical index using the level parameter:

>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
...           ['Captive', 'Wild', 'Captive', 'Wild']]
>>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
>>> df = pd.DataFrame({'Max Speed' : [390., 350., 30., 20.]},
...                    index=index)
>>> df
                Max Speed
Animal Type
Falcon Captive      390.0
       Wild         350.0
Parrot Captive       30.0
       Wild          20.0
>>> df.groupby(level=0).mean()
        Max Speed
Animal
Falcon      370.0
Parrot       25.0
>>> df.groupby(level=1).mean()
         Max Speed
Type
Captive      210.0
Wild         185.0
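The `as_index` and `sort` parameters described above can be sketched as follows. The example uses a plain `pandas.DataFrame` with made-up values rather than a ModelFrame, and the variable names are illustrative only.

```python
import pandas as pd

# Rows are deliberately ordered so that 'Parrot' appears before 'Falcon'.
df = pd.DataFrame({"Animal": ["Parrot", "Falcon", "Parrot", "Falcon"],
                   "Max Speed": [24.0, 380.0, 26.0, 370.0]})

# as_index=False keeps the group labels as a regular column
# ("SQL-style" output) instead of moving them into the index.
flat = df.groupby("Animal", as_index=False).mean()

# sort=False keeps the groups in order of first appearance
# instead of sorting the group keys.
unsorted_means = df.groupby("Animal", sort=False).mean()
```

With the default `sort=True` the keys come back sorted (`Falcon`, then `Parrot`); with `sort=False` they follow the order in which they first occur in the data.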

has_data()¶

Return whether ModelFrame has data

Returns:	has_data : bool

has_multi_targets()¶

Return whether ModelFrame has multiple target columns

Returns:	has_multi_targets : bool

has_target()¶

Return whether ModelFrame has target

Returns:	has_target : bool

imbalance¶: Property to access imblearn

inverse_transform(estimator, *args, **kwargs)¶

Call estimator’s inverse_transform method.

Parameters:	args : arguments passed to inverse_transform method kwargs : keyword arguments passed to inverse_transform method
Returns:	returned : transformed result

isotonic¶: Property to access sklearn.isotonic. See pandas_ml.skaccessors.isotonic

kernel_approximation¶: Property to access sklearn.kernel_approximation

kernel_ridge¶: Property to access sklearn.kernel_ridge

lda¶: Property to access sklearn.lda

linear_model¶: Property to access sklearn.linear_model. See pandas_ml.skaccessors.linear_model

lm¶: Property to access sklearn.linear_model. See pandas_ml.skaccessors.linear_model

manifold¶: Property to access sklearn.manifold. See pandas_ml.skaccessors.manifold

metrics¶: Property to access sklearn.metrics. See pandas_ml.skaccessors.metrics

mixture¶: Property to access sklearn.mixture

model_selection¶: Property to access sklearn.model_selection. See pandas_ml.skaccessors.model_selection

ms¶: Property to access sklearn.model_selection. See pandas_ml.skaccessors.model_selection

multiclass¶: Property to access sklearn.multiclass. See pandas_ml.skaccessors.multiclass

multioutput¶: Property to access sklearn.multioutput. See pandas_ml.skaccessors.multioutput

naive_bayes¶: Property to access sklearn.naive_bayes

neighbors¶: Property to access sklearn.neighbors. See pandas_ml.skaccessors.neighbors

neural_network¶: Property to access sklearn.neural_network

pipeline¶: Property to access sklearn.pipeline. See pandas_ml.skaccessors.pipeline

pp¶: Property to access sklearn.preprocessing. See pandas_ml.skaccessors.preprocessing

predict_log_proba(estimator, *args, **kwargs)¶

Call estimator’s predict_log_proba method.

Parameters:	args : arguments passed to predict_log_proba method kwargs : keyword arguments passed to predict_log_proba method
Returns:	returned : probabilities

predict_proba(estimator, *args, **kwargs)¶

Call estimator’s predict_proba method.

Parameters:	args : arguments passed to predict_proba method kwargs : keyword arguments passed to predict_proba method
Returns:	returned : probabilities

preprocessing¶: Property to access sklearn.preprocessing. See pandas_ml.skaccessors.preprocessing

qda¶: Property to access sklearn.qda

random_projection¶: Property to access sklearn.random_projection. See pandas_ml.skaccessors.random_projection

sample(estimator, *args, **kwargs)¶

Call estimator’s sample method.

Parameters:	args : arguments passed to sample method kwargs : keyword arguments passed to sample method
Returns:	returned : sampling result

score(estimator, *args, **kwargs)¶

Call estimator’s score method.

Parameters:	args : arguments passed to score method kwargs : keyword arguments passed to score method
Returns:	returned : score

seaborn¶: Property to access seaborn API

semi_supervised¶: Property to access sklearn.semi_supervised. See pandas_ml.skaccessors.semi_supervised

sns¶: Property to access seaborn API

svm¶: Property to access sklearn.svm. See pandas_ml.skaccessors.svm

target¶

Return target (response variable)

Returns:	target : `ModelSeries`

target_name¶

Return target column name

Returns:	target : object

transform(estimator, *args, **kwargs)¶

Call estimator’s transform method.

Parameters:	args : arguments passed to transform method kwargs : keyword arguments passed to transform method
Returns:	returned : transformed result

tree¶: Property to access sklearn.tree

xgb¶: Property to access xgboost.sklearn API

xgboost¶: Property to access xgboost.sklearn API

class pandas_ml.core.generic.ModelPredictor¶

Bases: pandas_ml.core.generic.ModelTransformer

Base class for ModelFrame and ModelFrameGroupBy

decision¶

Return current estimator’s decision function

Returns:	decisions : `ModelFrame`

estimator¶

Return most recently used estimator

Returns:	estimator : estimator

log_proba¶

Return current estimator’s log probabilities

Returns:	probabilities : `ModelFrame`

predict(estimator, *args, **kwargs)¶

Call estimator’s predict method.

Parameters:	args : arguments passed to predict method kwargs : keyword arguments passed to predict method
Returns:	returned : predicted result

predicted¶

Return current estimator’s predicted results

Returns:	predicted : `ModelSeries`

proba¶

Return current estimator’s probabilities

Returns:	probabilities : `ModelFrame`
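The relationship between `predict`, `estimator`, and `predicted` can be sketched with a small stand-in class. This is not the pandas_ml implementation: `MiniPredictor` and `MeanEstimator` are hypothetical, and the sketch assumes the frame hands its own data and target to the estimator, which the reference above does not spell out.

```python
class MeanEstimator:
    """Toy estimator (hypothetical): predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class MiniPredictor:
    """Hypothetical stand-in for a ModelPredictor-like object."""

    def __init__(self, data, target):
        self.data = data
        self.target = target
        self.estimator = None   # most recently used estimator
        self.predicted = None   # most recent prediction result

    def fit(self, estimator, *args, **kwargs):
        # Assumption: the stored data and target are supplied to the estimator.
        self.estimator = estimator
        return estimator.fit(self.data, self.target, *args, **kwargs)

    def predict(self, estimator, *args, **kwargs):
        # Forward to the estimator and remember both it and its output.
        self.estimator = estimator
        self.predicted = estimator.predict(self.data, *args, **kwargs)
        return self.predicted


frame = MiniPredictor(data=[[1.0], [2.0], [3.0]], target=[10.0, 20.0, 30.0])
est = MeanEstimator()
frame.fit(est)
result = frame.predict(est)
```

After the call, `frame.estimator` refers to the estimator that was just used and `frame.predicted` holds the same values that `predict` returned.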

class pandas_ml.core.generic.ModelTransformer¶

Bases: object

Base class for ModelFrame and ModelSeries

fit(estimator, *args, **kwargs)¶

Call estimator’s fit method.

Parameters:	args : arguments passed to fit method kwargs : keyword arguments passed to fit method
Returns:	returned : None or fitted estimator

fit_transform(estimator, *args, **kwargs)¶

Call estimator’s fit_transform method.

Parameters:	args : arguments passed to fit_transform method kwargs : keyword arguments passed to fit_transform method
Returns:	returned : transformed result

inverse_transform(estimator, *args, **kwargs)¶

Call estimator’s inverse_transform method.

Parameters:	args : arguments passed to inverse_transform method kwargs : keyword arguments passed to inverse_transform method
Returns:	returned : transformed result

transform(estimator, *args, **kwargs)¶

Call estimator’s transform method.

Parameters:	args : arguments passed to transform method kwargs : keyword arguments passed to transform method
Returns:	returned : transformed result
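Each of the methods above forwards to the estimator method of the same name. The estimator-side contract they rely on can be sketched with a toy transformer; `MinMaxToy` is hypothetical and stands in for any object that offers `fit`, `transform`, `fit_transform`, and `inverse_transform`.

```python
class MinMaxToy:
    """Toy transformer (hypothetical): rescales values to the range [0, 1]."""

    def fit(self, values):
        # Learn the parameters of the transformation from the data.
        self.low_ = min(values)
        self.span_ = max(values) - self.low_
        return self

    def transform(self, values):
        # Apply the learned transformation to (possibly new) data.
        return [(v - self.low_) / self.span_ for v in values]

    def fit_transform(self, values):
        # Learn and apply in one step.
        return self.fit(values).transform(values)

    def inverse_transform(self, scaled):
        # Map transformed values back to the original scale.
        return [s * self.span_ + self.low_ for s in scaled]


scaler = MinMaxToy()
scaled = scaler.fit_transform([10.0, 15.0, 20.0])
new_scaled = scaler.transform([12.5])
restored = scaler.inverse_transform(scaled)
```

The round trip through `fit_transform` and `inverse_transform` recovers the original values, while `transform` reuses the parameters learned earlier on new input.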

class pandas_ml.core.groupby.GroupedEstimator(estimator, grouped)¶

Bases: pandas_ml.core.base._BaseEstimator

Create grouped estimators based on passed estimator
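One way to read "grouped estimators" is one independent copy of the passed estimator per group. The sketch below illustrates that reading only; it is an assumption about the behaviour, not the pandas_ml implementation, and `fit_per_group` and `MeanEstimator` are hypothetical names.

```python
import copy


class MeanEstimator:
    """Toy estimator (hypothetical): remembers the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self


def fit_per_group(estimator, groups):
    """Fit an independent copy of `estimator` on each group.

    `groups` maps a group key to an (X, y) pair.
    """
    fitted = {}
    for key, (X, y) in groups.items():
        clone = copy.deepcopy(estimator)   # leave the template untouched
        fitted[key] = clone.fit(X, y)
    return fitted


groups = {"a": ([[1.0], [2.0]], [1.0, 3.0]),
          "b": ([[3.0]], [10.0])}
template = MeanEstimator()
fitted = fit_per_group(template, groups)
```

Copying the estimator keeps the state learned for one group from leaking into another, and the template passed in stays unfitted.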

class pandas_ml.core.groupby.ModelFrameGroupBy(obj, keys=None, axis=0, level=None, grouper=None, exclusions=None, selection=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)¶

Bases: pandas.core.groupby.generic.DataFrameGroupBy, pandas_ml.core.generic.ModelPredictor

transform(func, *args, **kwargs)¶

Call estimator’s transform method.

Parameters:	args : arguments passed to transform method kwargs : keyword arguments passed to transform method
Returns:	returned : transformed result

class pandas_ml.core.groupby.ModelSeriesGroupBy(obj, keys=None, axis=0, level=None, grouper=None, exclusions=None, selection=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)¶: Bases: pandas.core.groupby.generic.SeriesGroupBy

pandas_ml.core.groupby.groupby(obj, by, **kwds)¶

Class for grouping and aggregating relational data.

See aggregate, transform, and apply functions on this object.

It’s easiest to use obj.groupby(…) to use GroupBy, but you can also do:

grouped = groupby(obj, ...)

Parameters:

obj : pandas object
axis : int, default 0
level : int, default None: Level of MultiIndex
groupings : list of Grouping objects: Most users should ignore this
exclusions : array-like, optional: List of columns to exclude
name : string: Most users should ignore this

Attributes:

groups : dict: {group name -> group labels}
len(grouped) : int: Number of groups

Notes

After grouping, see aggregate, apply, and transform functions. Here are some other brief notes about usage. When grouping by multiple groups, the result index will be a MultiIndex (hierarchical) by default.

Iteration produces (key, group) tuples, i.e. chunking the data by group. So you can write code like:

grouped = obj.groupby(keys, axis=axis)
for key, group in grouped:
    # do something with the data
    pass

Function calls on GroupBy, if not specially implemented, “dispatch” to the grouped data. So if you group a DataFrame and wish to invoke the std() method on each group, you can simply do:

df.groupby(mapper).std()

rather than

df.groupby(mapper).aggregate(np.std)

You can pass arguments to these “wrapped” functions, too.

See the online documentation for a full exposition of these topics and much more.
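The iteration and dispatch behaviour described in these notes can be tried on a small plain `pandas.DataFrame`. The column names `team` and `score` are made up, and the string `"std"` is used in place of `np.std` so that both calls compute the same sample standard deviation.

```python
import pandas as pd

df = pd.DataFrame({"team": ["a", "a", "b", "b"],
                   "score": [1.0, 3.0, 10.0, 14.0]})
grouped = df.groupby("team")

# len(grouped) is the number of groups.
n_groups = len(grouped)

# Iteration yields (key, group) pairs, one chunk of rows per group.
sizes = {}
for key, group in grouped:
    sizes[key] = len(group)

# Calling std() on the grouped object dispatches to each group ...
std_direct = grouped["score"].std()
# ... and gives the same result as an explicit aggregate call.
std_via_aggregate = grouped["score"].aggregate("std")
```

Grouping by a single column label gives scalar keys (`"a"` and `"b"` here) during iteration.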

class pandas_ml.core.series.ModelSeries(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)¶

Bases: pandas_ml.core.generic.ModelTransformer, pandas.core.series.Series

Wrapper for pandas.Series to support sklearn.preprocessing

groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False)¶

Group DataFrame or Series using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

Parameters:

by : mapping, function, label, or list of labels: Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If an ndarray is passed, the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
axis : {0 or ‘index’, 1 or ‘columns’}, default 0: Split along rows (0) or columns (1).
level : int, level name, or sequence of such, default None: If the axis is a MultiIndex (hierarchical), group by a particular level or levels.
as_index : bool, default True: For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.
sort : bool, default True: Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
group_keys : bool, default True: When calling apply, add group keys to index to identify pieces.
squeeze : bool, default False: Reduce the dimensionality of the return type if possible, otherwise return a consistent type.
observed : bool, default False: This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

New in version 0.23.0.
**kwargs: Optional, only accepts keyword argument ‘mutated’ and is passed to groupby.

Returns:

DataFrameGroupBy or SeriesGroupBy: Depends on the calling object and returns groupby object that contains information about the groups.
