The PSY-CRIS classification module.

class posydon.active_learning.psy_cris.classify.Classifier(TableData_object)

Bases: object

Classifier class.

Perform one-against-all classification with a variety of classification algorithms (interpolators). The trained classifiers are stored for recall as instance variables inside nested dictionaries. This class also supports model validation through cross validation using the holdout method.

Initialize the classifier.

Parameters

TableData_object (TableData) – An instance of the TableData class holding the training data.
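As an illustration of the one-against-all setup described above, the minimal sketch below turns a 1-D array of class labels into one binary target per class, which is the kind of target each per-class interpolator would be trained on. The helper `one_against_all_targets` and the class names are hypothetical and not part of PSY-CRIS; the sketch only assumes the labels are available as an array.

```python
import numpy as np


def one_against_all_targets(labels):
    """Map each class key to a binary target: 1.0 for that class, 0.0 for the rest."""
    labels = np.asarray(labels)
    return {key: (labels == key).astype(float) for key in np.unique(labels)}


# Illustrative labels for four training points (hypothetical class names).
labels = np.array(["class_A", "class_B", "class_A", "class_C"])
targets = one_against_all_targets(labels)
for key, target in targets.items():
    print(key, target)
```

Every training point is a positive example for exactly one binary problem, so the per-class targets sum to one at each point.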

cross_validate(classifier_names, alpha, verbose=False)

Cross validate classifiers on data from TableData object.

For each iteration, the classifiers specified are all trained and tested on the same random subset of data.

Parameters
  • classifier_names (array) – Names of classifiers to train.

  • alpha (float) – Fraction of data set to use for training. (0.05 = 5% of data set)

  • verbose (bool, optional) – Print statements with more information while training.

Returns

  • percent_correct (ndarray) – Percentage of the held-out (1 - alpha) fraction of the data set that is classified correctly. Element order matches the order of classifier_names.

  • time_to_train (ndarray) – Time to train classifiers on a data set. Element order matches the order of classifier_names.
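The holdout procedure described for cross_validate() can be sketched as follows, assuming each classifier is given as a pair of hypothetical `fit` and `predict` callables. Two simple stand-in classifiers (nearest centroid and one-nearest-neighbour, written in NumPy) take the place of the PSY-CRIS interpolators; all of them are trained and scored on the same random split, and the returned arrays follow the order of the classifier list.

```python
import time

import numpy as np


# Stand-in classifiers (not PSY-CRIS interpolators), used only to exercise the loop.
def fit_centroid(X, y):
    keys = np.unique(y)
    return keys, np.array([X[y == k].mean(axis=0) for k in keys])


def predict_centroid(model, X):
    keys, centroids = model
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return keys[np.argmin(d, axis=1)]


def fit_1nn(X, y):
    return X, y  # a nearest-neighbour "fit" just stores the training set


def predict_1nn(model, X):
    X_train, y_train = model
    d = np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(d, axis=1)]


def cross_validate_once(X, y, classifiers, alpha, rng):
    """Holdout validation: train every classifier on the same random alpha fraction."""
    n = len(X)
    train_idx = rng.choice(n, size=int(alpha * n), replace=False)
    test_mask = np.ones(n, dtype=bool)
    test_mask[train_idx] = False  # every point not used for training is a test point
    percent_correct, time_to_train = [], []
    for fit, predict in classifiers:  # same split for each classifier
        start = time.perf_counter()
        model = fit(X[train_idx], y[train_idx])
        time_to_train.append(time.perf_counter() - start)
        pred = predict(model, X[test_mask])
        percent_correct.append(100.0 * np.mean(pred == y[test_mask]))
    return np.array(percent_correct), np.array(time_to_train)


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(400, 2))
y = (X[:, 0] > 0).astype(int)  # two classes split at x0 = 0
scores, times = cross_validate_once(
    X, y, [(fit_centroid, predict_centroid), (fit_1nn, predict_1nn)], alpha=0.25, rng=rng
)
print(scores, times)
```

Sharing one split across classifiers means differences in score come from the algorithms rather than from which points happened to be drawn.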

fit_gaussian_process_classifier(data_interval=None, my_kernel=None, n_restarts=5, verbose=False)

Fit a Gaussian Process classifier.

Implementation from: sklearn.gaussian_process (https://scikit-learn.org/stable/modules/gaussian_process.html)

Parameters
  • data_interval (array_int, optional) – Array indices of the data used for training (to train on a subset). If None (default), train on the whole data set.

  • my_kernel (kernel, optional) – Kernel for the GPC.

  • n_restarts (int) – Number of restarts for the GPC.

  • verbose (bool, optional) – Print statements with more information while training.

Returns

binary_classifier_holder – Keyed by class; each key maps to a trained GaussianProcessClassifier object.

Return type

dict
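A minimal sketch of one-against-all Gaussian process classification with scikit-learn. It assumes that my_kernel and n_restarts correspond to the `kernel` and `n_restarts_optimizer` arguments of `sklearn.gaussian_process.GaussianProcessClassifier`; that mapping, the kernel choice, and the toy data are assumptions, not taken from the PSY-CRIS source.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Toy training data: three classes in a 2-D input space (illustrative only).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(60, 2))
labels = np.where(X[:, 0] > 0.3, "A", np.where(X[:, 1] > 0, "B", "C"))

kernel = ConstantKernel(1.0) * RBF(length_scale=1.0)  # assumed kernel choice
holder = {}
for key in np.unique(labels):
    target = (labels == key).astype(int)  # 1 for this class, 0 for all others
    gpc = GaussianProcessClassifier(kernel=kernel, n_restarts_optimizer=5, random_state=0)
    holder[key] = gpc.fit(X, target)  # fit() returns the fitted estimator

# Probability of membership in each class, one column per binary classifier.
test = np.array([[0.8, 0.0], [-0.5, 0.5], [-0.5, -0.5]])
probs = np.column_stack([holder[k].predict_proba(test)[:, 1] for k in sorted(holder)])
print(probs.round(2))
```

Each binary fit needs both classes in its target; scikit-learn raises a ValueError when the training targets contain a single class, which is plausibly the situation described for low alpha in make_cv_plot_data().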

fit_linear_ND_interpolator(data_interval=None, verbose=False)

Fit linear ND interpolator - binary one-against-all classification.

Implementation from: scipy.interpolate.LinearNDInterpolator (https://docs.scipy.org/doc/scipy/reference/interpolate.html)

Parameters
  • data_interval (array_int, optional) – Array indices of the data used for training (to train on a subset). If None (default), train on the whole data set.

  • verbose (bool, optional) – Print statements with more information while training.

Returns

binary_classifier_holder – Keyed by class; each key maps to a trained LinearNDInterpolator object.

Return type

dict
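The sketch below shows one-against-all training with `scipy.interpolate.LinearNDInterpolator`, assuming the training inputs form an N_points x N_dimensions array: one interpolator is fitted per class on a 0/1 target, and the results are stored in a dict keyed by class. Points outside the convex hull of the training inputs evaluate to NaN, which is why the prediction methods report where_not_nan. The toy data are illustrative only.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Training inputs (N_points x N_dimensions) and their class labels (toy data).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
labels = np.array(["A", "A", "B", "B"])

holder = {}
for key in np.unique(labels):
    target = (labels == key).astype(float)  # 1.0 for this class, 0.0 elsewhere
    holder[key] = LinearNDInterpolator(X, target)

test = np.array([[0.5, 0.25], [2.0, 2.0]])  # second point lies outside the convex hull
out_A = holder["A"](test)
print(out_A)  # NaN for the point outside the hull
```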

fit_rbf_interpolator(data_interval=None, verbose=False)

Fit RBF interpolator - binary classification (one against all).

Implementation from: scipy.interpolate.Rbf (https://docs.scipy.org/doc/scipy/reference/interpolate.html)

Parameters
  • data_interval (array_int, optional) – Array indices of the data used for training (to train on a subset). If None (default), train on the whole data set.

  • verbose (bool, optional) – Print statements with more information while training.

Returns

binary_classifier_holder – Keyed by class; each key maps to a trained RBF object.

Return type

dict
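A comparable sketch with `scipy.interpolate.Rbf`, again assuming N_points x N_dimensions training inputs and using toy data. `Rbf` takes one array per input axis followed by the values, so the input array is unpacked column by column; the `function="linear"` choice is an assumption, since the radial function PSY-CRIS uses is not stated here.

```python
import numpy as np
from scipy.interpolate import Rbf

# Toy training data: five 2-D points with class labels.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
labels = np.array(["A", "A", "B", "B", "A"])

holder = {}
for key in np.unique(labels):
    target = (labels == key).astype(float)  # one-against-all target for this class
    # Rbf takes one positional array per input axis, followed by the values.
    holder[key] = Rbf(*X.T, target, function="linear")

test = np.array([[0.5, 0.1], [3.0, 3.0]])  # second point is far outside the data
out_A = holder["A"](*test.T)
print(out_A)
```

Unlike LinearNDInterpolator, the RBF interpolant is defined everywhere, so points far from the training data do not produce NaN. SciPy's documentation describes `Rbf` as legacy code and recommends `RBFInterpolator` for new work.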

get_class_predictions(classifier_name, test_input, return_ids=True)

Return the class predictions.

The predictions are in the form of class IDs or the original classification key. This method also returns the probability of the class that was predicted.

Parameters
  • classifier_name (str) – Name of classification algorithm to use.

  • test_input (ndarray) – Input values to predict, with shape N_points x N_dimensions (the same number of dimensions as the training input data).

  • return_ids (bool, optional) – If True (default), return class IDs. Else, return the original classification keys.

Returns

  • pred_class_ids (array) – Predicted class IDs given test input.

  • max_probs (array) – Probability the classifier gives for the chosen class.

  • where_not_nan (array) – Indices where there are no NaNs (from LinearNDInterpolator). Use this to pick out which input data give a valid classification.
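Assuming the classifier has already produced an N_points x N_classes array of normalized probabilities, the prediction step can be sketched as an argmax over classes. The helper `predict_from_probs` is hypothetical; it drops rows containing NaN and returns the indices of the remaining rows as where_not_nan, mirroring the return values listed above.

```python
import numpy as np


def predict_from_probs(normalized_probs, class_keys, return_ids=True):
    """Pick the most probable class for each row of an N_points x N_classes array."""
    # Rows with any NaN (e.g. outside a LinearNDInterpolator's hull) are not classified.
    where_not_nan = np.where(~np.isnan(normalized_probs).any(axis=1))[0]
    valid = normalized_probs[where_not_nan]
    pred_ids = np.argmax(valid, axis=1)  # index of the most probable class
    max_probs = valid.max(axis=1)  # probability of that class
    preds = pred_ids if return_ids else np.asarray(class_keys)[pred_ids]
    return preds, max_probs, where_not_nan


probs = np.array([[0.7, 0.2, 0.1],
                  [np.nan, np.nan, np.nan],
                  [0.1, 0.3, 0.6]])
preds, max_probs, where_not_nan = predict_from_probs(probs, ["A", "B", "C"], return_ids=False)
print(preds, max_probs, where_not_nan)
```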

get_classifier_name_to_key(classifier_name)

Return the standard key (str) of a classifier.

Parameters

classifier_name (str) – Name of classification algorithm to use.

Returns

key – Key to access trained classifier objects.

Return type

str

get_cross_val_data(alpha)

Randomly sample the data set and separate the training and test data.

Parameters

alpha (float) – Fraction of data set to use for training. (0.05 = 5% of data set)

Returns

  • sorted_rnd_int_vals (array) – Array indices of the data used to train the interpolators.

  • cv_test_input_data (array) – Input test data to perform cross validation.

  • cv_test_output_data (array) – Output test data to perform cross validation.
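A minimal sketch of the random split, assuming `input_data` and `output_data` are NumPy arrays with one row per data point and that the training size is `int(alpha * N_points)` (the rounding rule is an assumption). The helper name `split_holdout` is hypothetical.

```python
import numpy as np


def split_holdout(input_data, output_data, alpha, rng=None):
    """Return sorted training indices plus the held-out test inputs and outputs."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(input_data)
    n_train = int(alpha * n)  # assumed rounding: truncate toward zero
    sorted_rnd_int_vals = np.sort(rng.choice(n, size=n_train, replace=False))
    test_mask = np.ones(n, dtype=bool)
    test_mask[sorted_rnd_int_vals] = False  # test set is everything not drawn
    return sorted_rnd_int_vals, input_data[test_mask], output_data[test_mask]


X = np.arange(40, dtype=float).reshape(20, 2)  # row i is [2i, 2i + 1]
y = np.arange(20) % 3
train_idx, X_test, y_test = split_holdout(X, y, alpha=0.25, rng=np.random.default_rng(0))
print(train_idx, X_test.shape)
```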

get_rnd_test_inputs(N, other_rng={}, verbose=False)

Produce randomly sampled ‘test’ inputs inside the domain of input_data.

Parameters
  • N (int) – Number of test inputs to return.

  • other_rng (dict, optional) – Change the range of random sampling along the desired axes. By default, sampling is done within the range of the training data. Each axis is specified by an integer key in [0, N_dimensions - 1] mapping to a list giving the range (e.g. {1: [my_min, my_max]}).

  • verbose (bool, optional) – Print diagnostic information.

Returns

rnd_test_points – Test points randomly sampled in the range of the training data in each axis unless otherwise specified in ‘other_rng’. Has the same shape as input data from TableData.

Return type

ndarray
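The sampling described above can be sketched with NumPy's uniform generator, assuming the training inputs are an N_points x N_dimensions array. The helper `rnd_test_inputs` is hypothetical; it uses `None` instead of `{}` as the default for `other_rng` to avoid a shared mutable default argument.

```python
import numpy as np


def rnd_test_inputs(input_data, N, other_rng=None, rng=None):
    """Uniformly sample N points inside the per-axis range of input_data.

    other_rng maps an axis index to a [min, max] pair that overrides
    the training-data range along that axis.
    """
    rng = np.random.default_rng() if rng is None else rng
    other_rng = {} if other_rng is None else other_rng  # avoid a shared mutable default
    low = input_data.min(axis=0).astype(float)
    high = input_data.max(axis=0).astype(float)
    for axis, (ax_min, ax_max) in other_rng.items():
        low[axis], high[axis] = ax_min, ax_max
    # Broadcasting draws each column from its own [low, high) interval.
    return rng.uniform(low, high, size=(N, input_data.shape[1]))


training_inputs = np.array([[0.0, 1.0], [2.0, 3.0], [1.0, 2.0]])
pts = rnd_test_inputs(training_inputs, 1000, other_rng={1: [10.0, 20.0]},
                      rng=np.random.default_rng(0))
print(pts.min(axis=0), pts.max(axis=0))
```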

make_cv_plot_data(interp_type, alphas, N_iterations, folder_path='cv_data/')

Run the cross_validate() method many times over a range of alpha values.

Cross validation score and timing data produced are saved locally.

Warning: the time to train GaussianProcessClassifier becomes large for more than 1000 training points.

Files are saved every 5 iterations to prevent loss of data for large N_iterations. A known exception occurs in the GP classifier at low alpha, when the training subset contains only one class.

Parameters
  • interp_type (array_str) – Names of classifiers to train.

  • alphas (array_floats) – Fractions of data set to use for training. (0.05 = 5% of data set) (ex. [0.01, 0.02, …])

  • N_iterations (int) – Number of iterations to run cross validation at a given alpha.

  • folder_path (str) – Path of the folder in which to save the cross-validation and timing data (“your_folder_path/”).

Return type

None

make_max_cls_plot(classifier_name, axes_keys, other_rng={}, N=4000, **kwargs)

Make the maximum classification probability plot.

Not yet generalized to slice along redundant axes.

return_probs(classifier_name, test_input, verbose=False)

Return probability that a given input corresponds to a class.

The probability is calculated using trained classifiers.

Parameters
  • classifier_name (str) – Name of the trained classifier to use.

  • test_input (ndarray) – N dimensional inputs to be classified. The shape should be N_points x N_dimensions.

  • verbose (bool, optional) – Print useful information.

Returns

  • normalized_probs (ndarray) – Array holding the normalized probability for a point to be in any of the possible classes. Shape is N_points x N_classes.

  • where_not_nan (ndarray) – Indices of the test inputs that did not result in NaNs.
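One plausible way to turn the per-class outputs of one-against-all interpolators into the normalized probabilities described above is to clip negative values and divide each row by its sum. This is an assumption about the normalization, not the PSY-CRIS code; the helper `normalize_probs` is hypothetical. NaN rows are left as NaN so the output keeps the N_points x N_classes shape.

```python
import numpy as np


def normalize_probs(raw_outputs):
    """Row-normalize per-class outputs (N_points x N_classes); column j is classifier j.

    Interpolators can overshoot [0, 1], so negative values are clipped to zero
    first (an assumption made for this sketch).
    """
    raw = np.asarray(raw_outputs, dtype=float)
    clipped = np.clip(raw, 0.0, None)  # NaN entries stay NaN
    row_sums = clipped.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # all-zero rows stay all zero instead of dividing by 0
    normalized_probs = clipped / row_sums  # rows containing NaN become all NaN
    where_not_nan = np.where(~np.isnan(raw).any(axis=1))[0]
    return normalized_probs, where_not_nan


raw = np.array([[0.8, 0.4, -0.2],
                [np.nan, 0.1, 0.2],
                [0.0, 0.0, 0.0]])
normalized, where_not_nan = normalize_probs(raw)
print(normalized, where_not_nan)
```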

train(classifier_name, di=None, verbose=False, **kwargs)

Train a classifier.

Implemented classifiers:

  • LinearNDInterpolator (‘linear’, …)

  • Radial Basis Function (‘rbf’, …)

  • GaussianProcessClassifier (‘gp’, …)

>>> import numpy as np
>>> cl = Classifier(TableData_object)
>>> cl.train('linear', di=np.arange(0, Ndatapoints, 5), verbose=True)

Parameters
  • classifier_name (str) – Name of classifier to train.

  • di (array_int, optional) – Array indices of the data used for training (to train on a subset). If None, train on the whole data set.

  • train_cross_val (bool, optional) – Whether to store regular trained interpolators or cross-validation interpolators (used by the cross_validate() method). If False, store the regular interpolators; if True, store the cross-validation interpolators.

  • verbose (bool, optional) – Print statements with more information while training.

Return type

None
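The name-based dispatch behind train() can be illustrated with a small hypothetical class, modelled on the doctest above. The alias table, the standard keys, and `_fit_stub` are assumptions made for this sketch; a real implementation would call the fit_* methods documented above instead of the stub.

```python
import numpy as np

# Hypothetical mapping from accepted names to standard keys (not PSY-CRIS's own table).
NAME_TO_KEY = {
    "linear": "LinearNDInterpolator",
    "rbf": "RBF",
    "gp": "GaussianProcessClassifier",
}


class TinyClassifier:
    """Illustrates name-based dispatch to per-algorithm fit methods."""

    def __init__(self, input_data):
        self.input_data = np.asarray(input_data)
        self.trained = {}  # standard key -> whatever the fit method returned

    def _fit_stub(self, key, data_interval):
        # Stand-in for a real fit method: records how many rows it would train on.
        if data_interval is None:
            rows = np.arange(len(self.input_data))
        else:
            rows = np.asarray(data_interval)
        return {"algorithm": key, "n_train": len(rows)}

    def train(self, classifier_name, di=None, verbose=False):
        key = NAME_TO_KEY[classifier_name.lower()]  # normalize the name to a standard key
        self.trained[key] = self._fit_stub(key, di)
        if verbose:
            print(f"trained {key} on {self.trained[key]['n_train']} points")


cl = TinyClassifier(np.zeros((20, 2)))
cl.train("linear", di=np.arange(0, 20, 5), verbose=True)
```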

train_everything(classifier_names, verbose=False)

Train multiple classifiers at once.

Parameters

classifier_names (list) – List of strings specifying classification algorithms to use.

Return type

None

posydon.active_learning.psy_cris.classify.makehash()

Manage nested dictionaries.
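Nested dictionaries of this kind are commonly built with the recursive `collections.defaultdict` recipe below, in which accessing a missing key creates a new empty level. Whether PSY-CRIS's makehash() is implemented exactly this way is an assumption.

```python
from collections import defaultdict


def makehash():
    """Return a dict that creates missing nested levels on first access."""
    return defaultdict(makehash)


store = makehash()
store["linear"]["class_A"] = "fitted object"  # intermediate levels are created automatically
store["rbf"]["class_B"]["extra"] = 1
print(store["linear"]["class_A"])
```

Because reading a missing key also creates it, checks for existence should use `in` rather than indexing.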