The definition of the Sampler class in PSY-CRIS.

class posydon.active_learning.psy_cris.sample.Sampler(classifier=None, regressor=None)[source]

Bases: object

Class implementing PTMCMC and MCMC for the PSY-CRIS algorithm.

Modular implementation of PTMCMC and MCMC designed to implement the PSY-CRIS algorithm of sampling points in a target distribution constructed with a Classifier and Regressor. After a posterior is generated, methods in this class are also used to downsample.

Initialize the sampler.

Parameters
  • classifier (instance of <class, Classifier>) – A trained classifier object.

  • regressor (instance of <class, Regressor>, optional) – A trained regressor object.

TD_2d_analytic(name, args, **kwargs)[source]

2-dimensional analytic target distribution for testing MCMC/PTMCMC.

The function: $\frac{16}{3\pi} \left( \exp\left[-\mu^2 - (9 + 4\mu^2 + 8\nu)^2\right] + \frac{1}{2} \exp\left[-8\mu^2 - 8(\nu-2)^2\right] \right)$

Parameters
  • name (str) – Name of algorithm to use. For this method, pass None.

  • args (array) – 2D location to get the value of the function.

  • **kwargs – Kwargs for more complex target distributions.

Returns

Return type

array or float
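
The analytic function can be evaluated directly with NumPy. The following is a minimal standalone sketch, not the library's implementation: the name `td_2d_analytic` and the convention that (mu, nu) sit along the last axis of `args` are assumptions made for illustration.

```python
import numpy as np

def td_2d_analytic(args):
    """Evaluate the 2D analytic test distribution at (mu, nu).

    Assumption: `args` is array-like with (mu, nu) along its last axis.
    """
    args = np.asarray(args, dtype=float)
    mu, nu = args[..., 0], args[..., 1]
    # Curved ridge term and compact Gaussian term, as in the formula.
    term_1 = np.exp(-mu**2 - (9.0 + 4.0 * mu**2 + 8.0 * nu)**2)
    term_2 = 0.5 * np.exp(-8.0 * mu**2 - 8.0 * (nu - 2.0)**2)
    return 16.0 / (3.0 * np.pi) * (term_1 + term_2)

# Single location and a batch of locations.
single_value = td_2d_analytic([0.0, 2.0])
batch_values = td_2d_analytic([[0.0, 2.0], [0.0, -9.0 / 8.0]])
```

A single location yields a scalar and a batch of locations yields an array, which matches the documented "array or float" return type.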

TD_classification(classifier_name, position, **kwargs)[source]

Target distribution using classification.

$f(x) = 1 - \max[P_{\rm class}(x)]$

Parameters
  • classifier_name (str) – String to specify the trained classification algorithm to use.

  • position (array) – Single location in parameter space for the target distribution to be evaluated at.

  • **kwargs

    TD_BETA : float

    Exponent of the target distribution, $f(x)^{\rm TD\_BETA}$. Used for smoothing or sharpening.

    TD_verbose : bool

    Print extra output on every method call.

Returns

If the classification probability is NaN: f(x) = 1E-16

Return type

array
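
As a rough illustration of this target distribution, the sketch below starts from an array of class probabilities rather than a trained classifier. The function name `td_from_class_probs`, the (n_points, n_classes) input shape, and the point at which the NaN fallback is applied are assumptions, not the library's API.

```python
import numpy as np

def td_from_class_probs(class_probs, td_beta=1.0):
    """Sketch of f(x) = (1 - max[P_class(x)]) ** TD_BETA.

    Assumption: `class_probs` has shape (n_points, n_classes).
    """
    class_probs = np.atleast_2d(np.asarray(class_probs, dtype=float))
    max_probs = np.max(class_probs, axis=1)  # NaN propagates through max
    values = (1.0 - max_probs) ** td_beta
    # Mirror the documented fallback: NaN probability -> 1e-16.
    return np.where(np.isnan(values), 1e-16, values)

probs = [[0.5, 0.5], [0.9, 0.1], [np.nan, 0.2]]
td_values = td_from_class_probs(probs, td_beta=2.0)
```

The value is largest where the classifier is least certain, so a sampler driven by it concentrates on class boundaries.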

TD_classification_regression(names, args, **kwargs)[source]

Target distribution using both classification & regression.

Classification: $1 - \max[P_{\rm class}(x)]$ Regression: $A_0 \log( A_1 \cdot {\rm abs}( \max[{\rm APC}_n [{\rm loc}]]) + 1 )$

Parameters
  • names (list like) – Iterable containing the two strings specifying the classification and regression algorithm to use.

  • args (array) – Position in parameter space to evaluate the target distribution at.

  • **kwargs

    TD_A1 : float, optional

    Scaling factor inside the log regression error term. (Default = 0.5)

    TD_TAU : float, optional

    Relative weight of classification to regression term. (Default = 0.5)

    TD_BETA : float, optional

    Exponent of the entire target distribution. Used for smoothing or sharpening the distribution. (Default = 1)

    TD_verbose : bool, optional

    Print more diagnostic information.

Returns

array

do_density_logic(step_history, N_points, Kappa, shuffle=False, norm_steps=False, var_mult=None, add_mvns_together=False, pre_acc_points=None, verbose=False)[source]

Do the density logic based on a normal (Gaussian) kernel on each point.

This method automatically discards the first 5% of the MCMC steps as burn-in, so that the initial starting points are not chosen automatically (for example, if the walker starts in a non-ideal region).

Parameters
  • step_history (ndarray) –

  • N_points (int) –

  • Kappa (float) –

  • shuffle (bool, optional) –

  • norm_steps (bool, optional) –

  • var_mult (float, optional) –

  • add_mvns_together (bool, optional) –

  • verbose (bool, optional) –

Returns

  • accepted_points (ndarray)

  • rejected_points (ndarray)

  • accepted_sigmas (ndarray)
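
The burn-in removal described for this method can be sketched as a simple slice. The helper name `drop_burn_in` and the truncating rounding of the 5% cut are assumptions for illustration only.

```python
import numpy as np

def drop_burn_in(step_history, fraction=0.05):
    """Return the step history without its first `fraction` of steps."""
    step_history = np.asarray(step_history)
    # Assumption: the cut index is truncated to an integer.
    n_burn = int(len(step_history) * fraction)
    return step_history[n_burn:]

steps = np.arange(200).reshape(100, 2)  # 100 toy steps in 2D
kept = drop_burn_in(steps)
```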

do_simple_density_logic(step_history, N_points, Kappa, var_mult=None, add_mvns_together=False, include_training_data=True, verbose=False)[source]

Perform multivariate normal density logic on a given step history.

This is a simplified version of the method ‘do_density_logic’. It assumes that every accepted point will have the same exact MVN.

Each proposal distribution starts with the training set from TableData, which keeps training data from being proposed again.

Parameters
  • step_history (ndarray) – List of points from a PTMCMC or MCMC. (posterior)

  • N_points (int) – Number of points desired to be drawn from the posterior but may not actually be the number of points accepted. Contributes to the length scale of the MVN distribution of accepted points (along with kappa).

  • Kappa (float) – Scaling factor that sets the initial size of the MVN for accepted points. This should be proportional to the filling factor of the area of interest described by the target distribution used to create the posterior.

  • var_mult (float, ndarray, optional) – Variance multiplier for the MVN of accepted points.

  • add_mvns_together (bool, optional) – Add MVNs together when creating the accepted point distribution.

  • include_training_data (bool, optional) – Include the training data in the target distribution before sampling.

  • verbose (bool, optional) – Print useful diagnostic information.

Returns

  • accepted_points (ndarray) – Accepted points from the posterior to be labeled by the user. (query points)

  • rejected_points (ndarray) – Rejected points from the posterior.

Notes

The "accepted" language here refers to query points for the oracle to label in an active learning scheme. It is not the accepted vs. rejected distinction normally used for MCMC.
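
One plausible reading of this density logic is sketched below. It is not the library's implementation: the isotropic kernel with sigma = kappa * N_points^(-1/n_dim), the rule of taking the maximum over kernels, and the acceptance probability of 1 minus the kernel value are all assumptions, and seeding with training data is left out.

```python
import numpy as np

def simple_density_sketch(step_history, n_points, kappa, seed=None):
    """Split posterior samples into accepted (query) and rejected points.

    Assumptions (not taken from the library): every accepted point carries
    the same isotropic Gaussian kernel with peak value 1, and a candidate
    is accepted with probability 1 - (largest kernel value at its location).
    """
    rng = np.random.default_rng(seed)
    steps = np.asarray(step_history, dtype=float)
    sigma = kappa * n_points ** (-1.0 / steps.shape[1])
    accepted, rejected = [], []
    for point in steps:
        if accepted:
            dist_sq = np.sum((np.asarray(accepted) - point) ** 2, axis=1)
            density = np.max(np.exp(-0.5 * dist_sq / sigma**2))
        else:
            density = 0.0  # nothing accepted yet
        # The first point is always accepted; later ones compete with kernels.
        if not accepted or rng.uniform() > density:
            accepted.append(point)
        else:
            rejected.append(point)
    return np.array(accepted), np.array(rejected)

posterior = np.random.default_rng(0).uniform(0.0, 1.0, size=(500, 2))
acc, rej = simple_density_sketch(posterior, n_points=10, kappa=1.0, seed=1)
```

Under these assumptions, a larger `kappa` widens every kernel, so fewer points are accepted and they end up farther apart.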

get_TD_classification_data(*args, **kwargs)[source]

Get target-distribution classification data.

Calculate terms relevant for creating target distributions with classification terms.

Parameters
  • classifier_name (str) – Trained classifier name to use for predictions.

  • position (array) – Position in parameter space to evaluate.

  • **kwargs

    TD_verbose : bool

    Print useful output.

Returns

  • max_probs (array) – Maximum probabilities at each query point

  • position (array) – Position in parameter space being queried

  • cls_key (array) – Classification key predicted for each query position

get_proposed_points(step_history, N_points, Kappa, shuffle=False, norm_steps=False, add_mvns_together=False, include_training_data=True, var_mult=None, seed=None, n_repeats=1, max_iters=1000.0, verbose=False, **kwargs)[source]

Get proposed points in parameter space given a MCMC step history.

The density logic is not deterministic, so multiple iterations may be needed to converge on the desired number of proposed points. This method calls do_density_logic repeatedly while changing Kappa in order to return the desired number of points. After n_repeats instances of the correct number of N_points, the distribution with the largest average distance is chosen.

Warning: This algorithm has not been tested for large N data sets and may struggle to converge.

Parameters
  • step_history (ndarray) – Posterior from which to sample new query points.

  • N_points (int) – N query points to converge to.

  • Kappa (float) – Multiplies the length scale of MVNs and changes such that the desired number of query points is found.

  • shuffle (bool, optional) – Shuffle points in posterior in place before sampling.

  • norm_steps (bool, optional) – Normalize steps before sampling.

  • add_mvns_together (bool, optional) – Add MVNs of accepted point distribution together.

  • include_training_data (bool, optional) – Include training data in the accepted point distribution before sampling.

  • var_mult (ndarray, optional) – Variance multiplier.

  • seed (float, optional) – Random seed to use for random sampling.

  • n_repeats (int, optional) – Number of times to converge to the correct number of points. Each iteration may be a different realization of the posterior.

  • verbose (bool, optional) – Print useful information.

  • **kwargs

    show_plots : bool, optional

    Show 2D plot of proposed points with step history & training data.

Returns

  • acc_pts (ndarray) – Array of proposed points to be used as initial conditions in new simulations.

  • Kappa (float) – Scaling factor which reproduced the desired number of accepted points.

Notes

Will automatically exit if it goes through max_iters iterations without converging on the desired number of points.
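
The idea of rescaling Kappa until the requested number of points comes back can be sketched with a toy stand-in for the density logic. Everything here is hypothetical: `thin_points` is a deterministic minimum-distance thinning rather than the stochastic method in this class, and the 5% multiplicative update is an arbitrary choice.

```python
import numpy as np

def thin_points(points, length_scale):
    """Toy stand-in for the density logic: greedy minimum-distance thinning."""
    kept = []
    for p in points:
        if not kept or np.min(np.linalg.norm(np.asarray(kept) - p, axis=1)) >= length_scale:
            kept.append(p)
    return np.array(kept)

def tune_kappa(points, n_points, kappa, max_iters=200):
    """Rescale kappa until the stand-in returns exactly n_points points."""
    n_dim = points.shape[1]
    for _ in range(int(max_iters)):
        accepted = thin_points(points, kappa * n_points ** (-1.0 / n_dim))
        if len(accepted) == n_points:
            return accepted, kappa
        # Too many points -> larger length scale; too few -> smaller.
        kappa *= 1.05 if len(accepted) > n_points else 0.95
    return None, kappa  # exit without converging, as the Notes describe

posterior = np.random.default_rng(0).uniform(0.0, 1.0, size=(200, 2))
query_points, final_kappa = tune_kappa(posterior, n_points=10, kappa=1.0)
```

The sketch stops at the first match; repeating the search and keeping the realization with the largest average distance is not shown.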

get_saved_chain_step_history(key, return_all=False)[source]

Return the saved chain step history.

make_prop_points_plots(step_hist, prop_points, axes=(0, 1), show_fig=True, save_fig=False)[source]

Plot the proposed / accepted points over the step history.

make_trace_plot(chain_holder, T_list, Temp, save_fig=False, show_fig=True)[source]

Make a plot of step number vs. position of a sampler along an axis.

This function makes titles assuming you are using the data from the classifier.

normalize_step_history(step_history)[source]

Take steps and normalize them to [0, 1] according to the min/max in each axis.

The max and min are taken from the original data set from TableData.

run_MCMC(N_trials, alpha, step_history, target_dist, classifier_name, T=1, upper_limit_reject=10000.0, **TD_kwargs)[source]

Run a Markov chain Monte Carlo given a target distribution.

Parameters
  • N_trials (int) – Number of proposals or trial steps to take before stopping.

  • alpha (float) – Related to the step size of the MCMC walker. Defines the standard deviation of a zero mean normal from which the step is randomly drawn.

  • step_history (list) – Initial starting location in parameter space. Could contain an arbitrary number of previous steps but a walker will start at the last step in the list.

  • target_dist (callable) – The target distribution to sample. Must take arguments (method_name, element_of_step_history). (A 2D analytic function is provided: TD_2d_analytic.)

  • classifier_name (str) – Name of interpolation technique used in the target_dist.

  • T (float, optional) – Temperature of the MCMC.

  • upper_limit_reject (int, optional) – Sets the maximum number of rejected steps before the MCMC stops walking. This avoids a slowly converging walk with few accepted points.

Returns

  • step_history (array) – An array containing all accepted steps of the MCMC.

  • accept (int) – Total number of accepted steps.

  • reject (int) – Total number of rejected steps.

Notes

Assumes uniform priors and a symmetric jump proposal (Gaussian).
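
A minimal Metropolis sketch consistent with the parameters and notes above (uniform priors, symmetric Gaussian jumps, only accepted steps stored) might look as follows. It is not the library's code: the function names are hypothetical, and raising the acceptance ratio to the power 1/T is an assumption about how the temperature enters.

```python
import numpy as np

def metropolis_sketch(n_trials, alpha, step_history, target_dist, name=None,
                      T=1.0, upper_limit_reject=1e4, seed=None):
    """Minimal Metropolis sampler with a symmetric Gaussian proposal.

    Assumption: the temperature tempers the acceptance ratio as ratio**(1/T).
    """
    rng = np.random.default_rng(seed)
    steps = [np.asarray(s, dtype=float) for s in step_history]
    current_val = target_dist(name, steps[-1])  # walker starts at the last step
    accept, reject = 0, 0
    for _ in range(n_trials):
        if reject >= upper_limit_reject:
            break  # stop a walk that keeps rejecting
        trial = steps[-1] + rng.normal(0.0, alpha, size=steps[-1].shape)
        trial_val = target_dist(name, trial)
        ratio = (trial_val / current_val) ** (1.0 / T)
        if rng.uniform() < ratio:
            steps.append(trial)
            current_val = trial_val
            accept += 1
        else:
            reject += 1
    return np.array(steps), accept, reject

def gaussian_target(name, position):
    """Toy target (hypothetical): unnormalized standard normal."""
    return float(np.exp(-0.5 * np.sum(np.asarray(position) ** 2)))

chain, n_acc, n_rej = metropolis_sketch(2000, 0.5, [[0.0, 0.0]],
                                        gaussian_target, seed=0)
```

The toy target takes (method_name, position), mirroring the call signature required of `target_dist`.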

run_PTMCMC(T_max, N_tot, target_dist, classifier_name, init_pos=None, N_draws_per_swap=3, c_spacing=1.2, alpha=None, upper_limit_reject=100000.0, verbose=False, trace_plots=False, **TD_kwargs)[source]

Run a Parallel Tempered MCMC with user-specified target distribution.

Calls the method run_MCMC.

Parameters
  • T_max (float) – Sets the maximum temperature MCMC in the chain.

  • N_tot (int) – The total number of iterations for the PTMCMC.

  • target_dist (callable) – The target distribution to sample. Must take arguments (method_name, location_to_eval). (A 2D analytic function is provided: TD_2d_analytic.)

  • classifier_name (str, list) – A single string or list of strings specifying the interpolator to use for classification or classification & regression respectively.

  • init_pos (array) – Initial position of walkers in each axis. Default is the median of the input data in TableData.

  • N_draws_per_swap (int, optional) – Number of draws to perform for each MCMC before swap proposals.

  • c_spacing (float, optional) – Sets the spacing of temperatures in each chain. T_{i+1} = T_{i}^{1/c}, range: [T_max , T=1]

  • alpha (float, optional) – Sets the standard deviation of steps taken by the walkers. Default is 1/5 the range of training data from TableData.

  • upper_limit_reject (float, optional) – Sets the upper limit of rejected points.

  • verbose (bool, optional) – Useful print statements during execution.

Returns

  • chain_step_history (dict) – Holds the step history for every chain. Keys are integers that range from 0 (max T) to the total number of chains minus 1 (min T).

  • T_list (array) – Array filled with the temperatures of each chain from max to min.

Notes

There is a zero prior on the PTMCMC outside the range of training data.
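
The temperature spacing rule T_{i+1} = T_{i}^{1/c} given for c_spacing can be sketched as below. The recursion only approaches 1 asymptotically, so this sketch closes the ladder with an explicit T = 1 once the temperature drops below a cutoff. Both the helper name and the `stop_below` cutoff are assumptions, not values taken from the library.

```python
def temperature_ladder(T_max, c_spacing=1.2, stop_below=1.3):
    """Build temperatures from T_max down to 1 via T_{i+1} = T_i ** (1 / c).

    Assumption: `stop_below` is a hypothetical cutoff after which the
    ladder is closed with T = 1.
    """
    if T_max <= 1 or c_spacing <= 1:
        raise ValueError("need T_max > 1 and c_spacing > 1")
    temps = [float(T_max)]
    while temps[-1] > stop_below:
        temps.append(temps[-1] ** (1.0 / c_spacing))
    temps.append(1.0)  # coldest chain samples the untempered target
    return temps

T_list = temperature_ladder(T_max=50.0)
```

The list runs from the hottest chain to the coldest, the same ordering described for `T_list` and for the keys of `chain_step_history`.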

save_chain_step_history(key, chain_step_history, overwrite=False)[source]

Save PTMCMC output chain_step_history inside the sampler object.

undo_normalize_step_history(normed_steps)[source]

Rescale step history.

Take normed steps from [0,1] and return their value in the original range of the axes based off the range in TableData.
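
The rescaling here and in normalize_step_history amounts to a per-axis min/max transform and its inverse. The following is a small sketch with standalone functions and a toy array standing in for the TableData input. The function names and the explicit `data_min` / `data_max` arguments are assumptions, since the class methods read these ranges from TableData.

```python
import numpy as np

def normalize_steps(steps, data_min, data_max):
    """Map each axis to [0, 1] using per-axis min/max of the reference data."""
    return (np.asarray(steps, dtype=float) - data_min) / (data_max - data_min)

def undo_normalize_steps(normed_steps, data_min, data_max):
    """Inverse of normalize_steps: map [0, 1] back to the original range."""
    return np.asarray(normed_steps, dtype=float) * (data_max - data_min) + data_min

# Toy stand-in for the original input data.
training = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]])
lo, hi = training.min(axis=0), training.max(axis=0)

steps = np.array([[2.0, 15.0], [5.0, 30.0]])
normed = normalize_steps(steps, lo, hi)
restored = undo_normalize_steps(normed, lo, hi)
```

Because the range comes from the original data rather than from the steps themselves, steps outside that range would map outside [0, 1].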