pandas_ml.core package

Submodules

class pandas_ml.core.frame.ModelFrame(data, target=None, *args, **kwargs)

Bases: pandas_ml.core.generic.ModelPredictor, pandas.core.frame.DataFrame

Data structure subclassing pandas.DataFrame that adds metadata specifying the target (response variable) and the data (explanatory variables / features).

Parameters:
data : same as pandas.DataFrame
target : str or array-like

Column name or values to be used as target

args : arguments passed to pandas.DataFrame
kwargs : keyword arguments passed to pandas.DataFrame
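
The split between target and data can be pictured with a small plain-Python sketch. It is not pandas_ml code: `split_target` and the row-dict layout are hypothetical names chosen for illustration, and the sketch only covers the case where target is given as a column name.

```python
def split_target(rows, target=None):
    """Separate a list of row dicts into (features, target_values).

    Hypothetical helper: mirrors the idea of passing a column name as
    ``target``. With target=None every column is treated as data.
    """
    if target is None:
        return [dict(row) for row in rows], None
    # Everything except the target column is an explanatory variable.
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    target_values = [row[target] for row in rows]
    return features, target_values


rows = [
    {"label": 0, "x1": 1.0, "x2": 2.0},
    {"label": 1, "x1": 3.0, "x2": 4.0},
]
features, labels = split_target(rows, target="label")
print(features)  # explanatory variables only
print(labels)    # response variable
```

In the sketch, a missing target simply yields `None` for the second element, which loosely corresponds to a frame for which has_target() would be false.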
calibration

Property to access sklearn.calibration

cls

alias of pandas_ml.skaccessors.gaussian_process.GaussianProcessMethods

cluster

Property to access sklearn.cluster. See pandas_ml.skaccessors.cluster

covariance

Property to access sklearn.covariance. See pandas_ml.skaccessors.covariance

cross_decomposition

Property to access sklearn.cross_decomposition

da

Property to access sklearn.discriminant_analysis

data

Return data (explanatory variable / features)

Returns:
data : ModelFrame
decision_function(estimator, *args, **kwargs)

Call estimator’s decision_function method.

Parameters:
args : arguments passed to decision_function method
kwargs : keyword arguments passed to decision_function method
Returns:
returned : decisions
decomposition

Property to access sklearn.decomposition

discriminant_analysis

Property to access sklearn.discriminant_analysis

dummy

Property to access sklearn.dummy

ensemble

Property to access sklearn.ensemble. See pandas_ml.skaccessors.ensemble

feature_extraction

Property to access sklearn.feature_extraction. See pandas_ml.skaccessors.feature_extraction

feature_selection

Property to access sklearn.feature_selection. See pandas_ml.skaccessors.feature_selection

fit_predict(estimator, *args, **kwargs)

Call estimator’s fit_predict method.

Parameters:
args : arguments passed to fit_predict method
kwargs : keyword arguments passed to fit_predict method
Returns:
returned : predicted result
fit_resample(estimator, *args, **kwargs)

Call estimator’s fit_resample method.

Parameters:
args : arguments passed to fit_resample method
kwargs : keyword arguments passed to fit_resample method
Returns:
returned : resampling result
fit_sample(estimator, *args, **kwargs)

Call estimator’s fit_sample method.

Parameters:
args : arguments passed to fit_sample method
kwargs : keyword arguments passed to fit_sample method
Returns:
returned : sampling result
fit_transform(estimator, *args, **kwargs)

Call estimator’s fit_transform method.

Parameters:
args : arguments passed to fit_transform method
kwargs : keyword arguments passed to fit_transform method
Returns:
returned : transformed result
gaussian_process

Property to access sklearn.gaussian_process. See pandas_ml.skaccessors.gaussian_process

gp

Property to access sklearn.gaussian_process. See pandas_ml.skaccessors.gaussian_process

groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False)

Group DataFrame or Series using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

Parameters:
by : mapping, function, label, or list of labels

Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If an ndarray is passed, the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0

Split along rows (0) or columns (1).

level : int, level name, or sequence of such, default None

If the axis is a MultiIndex (hierarchical), group by a particular level or levels.

as_index : bool, default True

For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.

sort : bool, default True

Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.

group_keys : bool, default True

When calling apply, add group keys to index to identify pieces.

squeeze : bool, default False

Reduce the dimensionality of the return type if possible, otherwise return a consistent type.

observed : bool, default False

This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

New in version 0.23.0.

**kwargs

Optional, only accepts keyword argument ‘mutated’ and is passed to groupby.

Returns:
DataFrameGroupBy or SeriesGroupBy

The type depends on the calling object. The returned groupby object contains information about the groups.

See also

resample
Convenience method for frequency conversion and resampling of time series.

Notes

See the user guide for more.

Examples

>>> df = pd.DataFrame({'Animal' : ['Falcon', 'Falcon',
...                                'Parrot', 'Parrot'],
...                    'Max Speed' : [380., 370., 24., 26.]})
>>> df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0
>>> df.groupby(['Animal']).mean()
        Max Speed
Animal
Falcon      375.0
Parrot       25.0
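
The split-apply-combine steps behind the call above can be sketched without pandas. This only illustrates the idea and is not how pandas or pandas_ml implement groupby. `group_mean` is a hypothetical helper, and its `sort` flag imitates the documented sort parameter.

```python
from collections import defaultdict


def group_mean(keys, values, sort=True):
    """Split values by key, apply the mean, and combine into a dict."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        # Split: rows keep their original order inside each group.
        groups[key].append(value)
    # sort=True orders the group keys; sort=False keeps first-seen order.
    names = sorted(groups) if sort else list(groups)
    # Apply the mean per group and combine the results.
    return {name: sum(groups[name]) / len(groups[name]) for name in names}


animals = ["Falcon", "Falcon", "Parrot", "Parrot"]
speeds = [380.0, 370.0, 24.0, 26.0]
result = group_mean(animals, speeds)
print(result)
```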

Hierarchical Indexes

We can group by different levels of a hierarchical index using the level parameter:

>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
...           ['Captive', 'Wild', 'Captive', 'Wild']]
>>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
>>> df = pd.DataFrame({'Max Speed' : [390., 350., 30., 20.]},
...                    index=index)
>>> df
                Max Speed
Animal Type
Falcon Captive      390.0
       Wild         350.0
Parrot Captive       30.0
       Wild          20.0
>>> df.groupby(level=0).mean()
        Max Speed
Animal
Falcon      370.0
Parrot       25.0
>>> df.groupby(level=1).mean()
         Max Speed
Type
Captive      210.0
Wild         185.0
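
Grouping by a level can be read as grouping by one position of each index tuple. The following plain-Python sketch makes that explicit for the same numbers. `mean_by_level` is a hypothetical helper, not part of pandas or pandas_ml.

```python
from collections import defaultdict


def mean_by_level(index, values, level):
    """Group by one position of each index tuple and average the values."""
    groups = defaultdict(list)
    for key, value in zip(index, values):
        # key is a tuple such as ("Falcon", "Wild"); level picks one entry.
        groups[key[level]].append(value)
    return {name: sum(vals) / len(vals) for name, vals in sorted(groups.items())}


index = [("Falcon", "Captive"), ("Falcon", "Wild"),
         ("Parrot", "Captive"), ("Parrot", "Wild")]
speeds = [390.0, 350.0, 30.0, 20.0]
by_animal = mean_by_level(index, speeds, level=0)
by_type = mean_by_level(index, speeds, level=1)
print(by_animal)
print(by_type)
```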
has_data()

Return whether ModelFrame has data

Returns:
has_data : bool
has_multi_targets()

Return whether ModelFrame has multiple target columns

Returns:
has_multi_targets : bool
has_target()

Return whether ModelFrame has target

Returns:
has_target : bool
imbalance

Property to access imblearn

inverse_transform(estimator, *args, **kwargs)

Call estimator’s inverse_transform method.

Parameters:
args : arguments passed to inverse_transform method
kwargs : keyword arguments passed to inverse_transform method
Returns:
returned : transformed result
isotonic

Property to access sklearn.isotonic. See pandas_ml.skaccessors.isotonic

kernel_approximation

Property to access sklearn.kernel_approximation

kernel_ridge

Property to access sklearn.kernel_ridge

lda

Property to access sklearn.lda

linear_model

Property to access sklearn.linear_model. See pandas_ml.skaccessors.linear_model

lm

Property to access sklearn.linear_model. See pandas_ml.skaccessors.linear_model

manifold

Property to access sklearn.manifold. See pandas_ml.skaccessors.manifold

metrics

Property to access sklearn.metrics. See pandas_ml.skaccessors.metrics

mixture

Property to access sklearn.mixture

model_selection

Property to access sklearn.model_selection. See pandas_ml.skaccessors.model_selection

ms

Property to access sklearn.model_selection. See pandas_ml.skaccessors.model_selection

multiclass

Property to access sklearn.multiclass. See pandas_ml.skaccessors.multiclass

multioutput

Property to access sklearn.multioutput. See pandas_ml.skaccessors.multioutput

naive_bayes

Property to access sklearn.naive_bayes

neighbors

Property to access sklearn.neighbors. See pandas_ml.skaccessors.neighbors

neural_network

Property to access sklearn.neural_network

pipeline

Property to access sklearn.pipeline. See pandas_ml.skaccessors.pipeline

pp

Property to access sklearn.preprocessing. See pandas_ml.skaccessors.preprocessing

predict_log_proba(estimator, *args, **kwargs)

Call estimator’s predict_log_proba method.

Parameters:
args : arguments passed to predict_log_proba method
kwargs : keyword arguments passed to predict_log_proba method
Returns:
returned : probabilities
predict_proba(estimator, *args, **kwargs)

Call estimator’s predict_proba method.

Parameters:
args : arguments passed to predict_proba method
kwargs : keyword arguments passed to predict_proba method
Returns:
returned : probabilities
preprocessing

Property to access sklearn.preprocessing. See pandas_ml.skaccessors.preprocessing

qda

Property to access sklearn.qda

random_projection

Property to access sklearn.random_projection. See pandas_ml.skaccessors.random_projection

sample(estimator, *args, **kwargs)

Call estimator’s sample method.

Parameters:
args : arguments passed to sample method
kwargs : keyword arguments passed to sample method
Returns:
returned : sampling result
score(estimator, *args, **kwargs)

Call estimator’s score method.

Parameters:
args : arguments passed to score method
kwargs : keyword arguments passed to score method
Returns:
returned : score
seaborn

Property to access seaborn API

semi_supervised

Property to access sklearn.semi_supervised. See pandas_ml.skaccessors.semi_supervised

sns

Property to access seaborn API

svm

Property to access sklearn.svm. See pandas_ml.skaccessors.svm

target

Return target (response variable)

Returns:
target : ModelSeries
target_name

Return target column name

Returns:
target : object
transform(estimator, *args, **kwargs)

Call estimator’s transform method.

Parameters:
args : arguments passed to transform method
kwargs : keyword arguments passed to transform method
Returns:
returned : transformed result
tree

Property to access sklearn.tree

xgb

Property to access xgboost.sklearn API

xgboost

Property to access xgboost.sklearn API

class pandas_ml.core.generic.ModelPredictor

Bases: pandas_ml.core.generic.ModelTransformer

Base class for ModelFrame and ModelFrameGroupBy

decision

Return current estimator’s decision function

Returns:
decisions : ModelFrame
estimator

Return most recently used estimator

Returns:
estimator : estimator
log_proba

Return current estimator’s log probabilities

Returns:
probabilities : ModelFrame
predict(estimator, *args, **kwargs)

Call estimator’s predict method.

Parameters:
args : arguments passed to predict method
kwargs : keyword arguments passed to predict method
Returns:
returned : predicted result
predicted

Return current estimator’s predicted results

Returns:
predicted : ModelSeries
proba

Return current estimator’s probabilities

Returns:
probabilities : ModelFrame
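
The relationship between predict and the estimator and predicted properties can be sketched as a container that remembers its last call. This is an assumption about the pattern, written in plain Python. `TinyPredictor` and `ThresholdClassifier` are hypothetical classes and do not reproduce pandas_ml internals.

```python
class TinyPredictor:
    """Hypothetical container that remembers its last estimator and output."""

    def __init__(self, data):
        self.data = data
        self._estimator = None
        self._predicted = None

    def predict(self, estimator, *args, **kwargs):
        # Forward the stored data plus any extra arguments to the estimator.
        result = estimator.predict(self.data, *args, **kwargs)
        # Cache what was used and what came back (assumed behaviour).
        self._estimator = estimator
        self._predicted = result
        return result

    @property
    def estimator(self):
        return self._estimator

    @property
    def predicted(self):
        return self._predicted


class ThresholdClassifier:
    """Toy estimator: predicts 1 when a value exceeds the threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, X):
        return [int(x > self.threshold) for x in X]


frame = TinyPredictor([0.2, 0.9, 0.5])
clf = ThresholdClassifier(threshold=0.4)
out = frame.predict(clf)
print(out, frame.estimator is clf, frame.predicted)
```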
class pandas_ml.core.generic.ModelTransformer

Bases: object

Base class for ModelFrame and ModelSeries

fit(estimator, *args, **kwargs)

Call estimator’s fit method.

Parameters:
args : arguments passed to fit method
kwargs : keyword arguments passed to fit method
Returns:
returned : None or fitted estimator
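
As a rough picture of how such a call is forwarded, the sketch below passes the stored data and target to the estimator together with any extra arguments. It assumes the target is passed only when one is present. `TinyFrame` and `MeanRegressor` are hypothetical and not taken from pandas_ml.

```python
class TinyFrame:
    """Hypothetical holder of data and an optional target."""

    def __init__(self, data, target=None):
        self.data = data
        self.target = target

    def fit(self, estimator, *args, **kwargs):
        # Supervised case: pass data and target; otherwise pass data only.
        if self.target is not None:
            return estimator.fit(self.data, self.target, *args, **kwargs)
        return estimator.fit(self.data, *args, **kwargs)


class MeanRegressor:
    """Toy estimator that learns the (optionally weighted) mean of y."""

    def fit(self, X, y, sample_weight=None):
        weights = sample_weight or [1.0] * len(y)
        self.mean_ = sum(w * v for w, v in zip(weights, y)) / sum(weights)
        return self


frame = TinyFrame(data=[[1.0], [2.0], [3.0]], target=[10.0, 20.0, 30.0])
model = frame.fit(MeanRegressor())
# Extra keyword arguments travel through to the estimator unchanged.
weighted = frame.fit(MeanRegressor(), sample_weight=[0.0, 0.0, 1.0])
print(model.mean_, weighted.mean_)
```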
fit_transform(estimator, *args, **kwargs)

Call estimator’s fit_transform method.

Parameters:
args : arguments passed to fit_transform method
kwargs : keyword arguments passed to fit_transform method
Returns:
returned : transformed result
inverse_transform(estimator, *args, **kwargs)

Call estimator’s inverse_transform method.

Parameters:
args : arguments passed to inverse_transform method
kwargs : keyword arguments passed to inverse_transform method
Returns:
returned : transformed result
transform(estimator, *args, **kwargs)

Call estimator’s transform method.

Parameters:
args : arguments passed to transform method
kwargs : keyword arguments passed to transform method
Returns:
returned : transformed result
class pandas_ml.core.groupby.GroupedEstimator(estimator, grouped)

Bases: pandas_ml.core.base._BaseEstimator

Create grouped estimators based on passed estimator
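
One way to picture grouped estimators is one independent copy of the passed estimator per group. The sketch below uses `copy.deepcopy` for this. Whether pandas_ml copies estimators this way is an assumption, and `MeanModel` and `fit_per_group` are hypothetical names.

```python
import copy
from collections import defaultdict


class MeanModel:
    """Toy estimator that stores the mean of the values it is fitted on."""

    def fit(self, values):
        self.mean_ = sum(values) / len(values)
        return self


def fit_per_group(estimator, keys, values):
    """Fit a separate copy of ``estimator`` on each group of values."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    # Each group gets its own deep copy so fitted state is never shared.
    return {key: copy.deepcopy(estimator).fit(vals) for key, vals in groups.items()}


template = MeanModel()
models = fit_per_group(template, ["a", "a", "b"], [1.0, 3.0, 10.0])
print({key: model.mean_ for key, model in models.items()})
```

The template estimator stays unfitted in this sketch, so it can be reused for another grouping.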

class pandas_ml.core.groupby.ModelFrameGroupBy(obj, keys=None, axis=0, level=None, grouper=None, exclusions=None, selection=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)

Bases: pandas.core.groupby.generic.DataFrameGroupBy, pandas_ml.core.generic.ModelPredictor

transform(func, *args, **kwargs)

Call estimator’s transform method.

Parameters:
args : arguments passed to transform method
kwargs : keyword arguments passed to transform method
Returns:
returned : transformed result
class pandas_ml.core.groupby.ModelSeriesGroupBy(obj, keys=None, axis=0, level=None, grouper=None, exclusions=None, selection=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)

Bases: pandas.core.groupby.generic.SeriesGroupBy

pandas_ml.core.groupby.groupby(obj, by, **kwds)

Class for grouping and aggregating relational data.

See aggregate, transform, and apply functions on this object.

The easiest way to create a GroupBy is obj.groupby(…), but you can also do:

grouped = groupby(obj, ...)
Parameters:
obj : pandas object
axis : int, default 0
level : int, default None

Level of MultiIndex

groupings : list of Grouping objects

Most users should ignore this

exclusions : array-like, optional

List of columns to exclude

name : string

Most users should ignore this

Attributes:
groups : dict

{group name -> group labels}

len(grouped) : int

Number of groups

Notes

After grouping, see aggregate, apply, and transform functions. Here are some other brief notes about usage. When grouping by multiple groups, the result index will be a MultiIndex (hierarchical) by default.

Iteration produces (key, group) tuples, i.e. chunking the data by group. So you can write code like:

grouped = obj.groupby(keys, axis=axis)
for key, group in grouped:
    # do something with the data

Function calls on GroupBy, if not specially implemented, “dispatch” to the grouped data. So if you group a DataFrame and wish to invoke the std() method on each group, you can simply do:

df.groupby(mapper).std()

rather than

df.groupby(mapper).aggregate(np.std)

You can pass arguments to these “wrapped” functions, too.
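
Iteration over (key, group) pairs and the dispatch of method calls to each group can both be sketched in a few lines of plain Python. This is a simplified illustration, not the pandas implementation: `TinyGroupBy` and the `REDUCERS` table are hypothetical, and only three reducer names are supported.

```python
import statistics
from collections import defaultdict

# Hypothetical table of method names that are dispatched to every group.
REDUCERS = {"mean": statistics.mean, "std": statistics.stdev, "sum": sum}


class TinyGroupBy:
    def __init__(self, keys, values):
        self._groups = defaultdict(list)
        for key, value in zip(keys, values):
            self._groups[key].append(value)

    def __iter__(self):
        # Yield (key, group) pairs in sorted key order.
        return iter(sorted(self._groups.items()))

    def __len__(self):
        return len(self._groups)

    def aggregate(self, func, *args, **kwargs):
        return {key: func(group, *args, **kwargs) for key, group in self}

    def __getattr__(self, name):
        # Only reached for names not found normally, e.g. grouped.mean.
        try:
            func = REDUCERS[name]
        except KeyError:
            raise AttributeError(name) from None
        return lambda *args, **kwargs: self.aggregate(func, *args, **kwargs)


grouped = TinyGroupBy(["Falcon", "Falcon", "Parrot", "Parrot"],
                      [380.0, 370.0, 24.0, 26.0])
for key, group in grouped:
    print(key, group)
print(grouped.mean())
```

Here `grouped.std()` and `grouped.aggregate(statistics.stdev)` give the same result, mirroring the two spellings shown above.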

See the online documentation for a full exposition of these topics and much more.

class pandas_ml.core.series.ModelSeries(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

Bases: pandas_ml.core.generic.ModelTransformer, pandas.core.series.Series

Wrapper for pandas.Series to support sklearn.preprocessing

groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False)

Group DataFrame or Series using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

Parameters:
by : mapping, function, label, or list of labels

Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If an ndarray is passed, the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0

Split along rows (0) or columns (1).

level : int, level name, or sequence of such, default None

If the axis is a MultiIndex (hierarchical), group by a particular level or levels.

as_index : bool, default True

For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.

sort : bool, default True

Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.

group_keys : bool, default True

When calling apply, add group keys to index to identify pieces.

squeeze : bool, default False

Reduce the dimensionality of the return type if possible, otherwise return a consistent type.

observed : bool, default False

This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

New in version 0.23.0.

**kwargs

Optional, only accepts keyword argument ‘mutated’ and is passed to groupby.

Returns:
DataFrameGroupBy or SeriesGroupBy

The type depends on the calling object. The returned groupby object contains information about the groups.

See also

resample
Convenience method for frequency conversion and resampling of time series.

Notes

See the user guide for more.

Examples

>>> df = pd.DataFrame({'Animal' : ['Falcon', 'Falcon',
...                                'Parrot', 'Parrot'],
...                    'Max Speed' : [380., 370., 24., 26.]})
>>> df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0
>>> df.groupby(['Animal']).mean()
        Max Speed
Animal
Falcon      375.0
Parrot       25.0

Hierarchical Indexes

We can group by different levels of a hierarchical index using the level parameter:

>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
...           ['Captive', 'Wild', 'Captive', 'Wild']]
>>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
>>> df = pd.DataFrame({'Max Speed' : [390., 350., 30., 20.]},
...                    index=index)
>>> df
                Max Speed
Animal Type
Falcon Captive      390.0
       Wild         350.0
Parrot Captive       30.0
       Wild          20.0
>>> df.groupby(level=0).mean()
        Max Speed
Animal
Falcon      370.0
Parrot       25.0
>>> df.groupby(level=1).mean()
         Max Speed
Type
Captive      210.0
Wild         185.0
pp

Property to access sklearn.preprocessing. See pandas_ml.skaccessors.preprocessing

preprocessing

Property to access sklearn.preprocessing. See pandas_ml.skaccessors.preprocessing

to_frame(name=None)

Convert Series to DataFrame.

Parameters:
name : object, default None

The passed name should substitute for the series name (if it has one).

Returns:
data_frame : DataFrame
transform(estimator, *args, **kwargs)

Call estimator’s transform method.

Parameters:
args : arguments passed to transform method
kwargs : keyword arguments passed to transform method
Returns:
returned : transformed result

Module contents