The definition of the Sampler class in PSY-CRIS.
- class posydon.active_learning.psy_cris.sample.Sampler(classifier=None, regressor=None)
Bases: object
Class implementing PTMCMC and MCMC for the PSY-CRIS algorithm.
Modular implementation of PTMCMC and MCMC designed for the PSY-CRIS algorithm, which samples points from a target distribution constructed with a Classifier and a Regressor. After a posterior is generated, methods in this class are also used to downsample it.
Initialize the sampler.
- Parameters
classifier (instance of <class, Classifier>) – A trained classifier object.
regressor (instance of <class, Regressor>, optional) – A trained regressor object.
- TD_2d_analytic(name, args, **kwargs)
2-dimensional analytic target distribution for testing MCMC/PTMCMC.
The function: $\frac{16}{3\pi} \left( \exp\left[-\mu^2 - (9 + 4\mu^2 + 8\nu)^2\right] + \frac{1}{2} \exp\left[- 8 \mu^2 - 8 (\nu-2)^2\right] \right)$
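A minimal NumPy sketch of the analytic test distribution above, useful for checking a sampler by eye. It assumes the first coordinate of the position is $\mu$ and the second is $\nu$ (that mapping is an assumption), and `td_2d_analytic_sketch` is a hypothetical name, not the PSY-CRIS method itself.

```python
import numpy as np

def td_2d_analytic_sketch(position):
    """Evaluate the 2D analytic test density at position = (mu, nu).

    Assumption: position[0] is mu and position[1] is nu.
    """
    mu, nu = position[0], position[1]
    term1 = np.exp(-mu**2 - (9 + 4 * mu**2 + 8 * nu) ** 2)
    term2 = 0.5 * np.exp(-8 * mu**2 - 8 * (nu - 2) ** 2)
    return 16.0 / (3.0 * np.pi) * (term1 + term2)

# At (0, 2) the first term is about exp(-625), so f is close to 8 / (3 pi).
print(td_2d_analytic_sketch((0.0, 2.0)))
```

The density has two separated modes (one near $\nu = 2$, one along the curve $9 + 4\mu^2 + 8\nu = 0$), which is what makes it a reasonable stress test for MCMC versus PTMCMC.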
- TD_classification(classifier_name, position, **kwargs)
Target distribution using classification.
$f(x) = 1 - \max[P_{\rm class}(x)]$
- Parameters
classifier_name (str) – String to specify the trained classification algorithm to use.
position (array) – Single location in parameter space for the target distribution to be evaluated at.
**kwargs –
- TD_BETA : float
Exponent of the target distribution, $f(x)^{\rm TD\_BETA}$, used for smoothing or sharpening.
- TD_verbose : bool
Extra print output on every method call.
- Returns
The target distribution $f(x)$ evaluated at position. If the classification probability is NaN, $f(x) = 10^{-16}$.
- Return type
array
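A sketch of the classification target distribution for a single position, written against a plain list of class probabilities rather than a trained classifier. The function name, the probability-list input, and applying the NaN fallback before the exponent are assumptions for illustration only.

```python
import math

def td_classification_sketch(class_probs, td_beta=1.0):
    """Return (1 - max class probability) ** td_beta for one position.

    Assumptions: class_probs holds the predicted probability of each class
    at one position; any NaN probability falls back to 1e-16 as documented.
    """
    if any(math.isnan(p) for p in class_probs):
        return 1e-16
    return (1.0 - max(class_probs)) ** td_beta

# Uncertain regions (probabilities spread across classes) score highest.
print(td_classification_sketch([0.5, 0.5]))
print(td_classification_sketch([0.95, 0.05]))
```

Raising to TD_BETA > 1 sharpens the peaks along class boundaries, while TD_BETA < 1 flattens them.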
- TD_classification_regression(names, args, **kwargs)
Target distribution using both classification & regression.
Classification: $1 - \max[P_{\rm class}(x)]$. Regression: $A_0 \log\left( A_1 \left| \max[{\rm APC}_n[{\rm loc}]] \right| + 1 \right)$
- Parameters
names (list-like) – Iterable containing the two strings specifying the classification and regression algorithms to use.
args (array) – Position in parameter space to evaluate the target distribution at.
**kwargs –
- TD_A1 : float, optional
Scaling factor inside the log regression error term. (Default = 0.5)
- TD_TAU : float, optional
Relative weight of the classification term to the regression term. (Default = 0.5)
- TD_BETA : float, optional
Exponent of the entire target distribution, used for smoothing or sharpening it. Default is 1.
- TD_verbose : bool, optional
Print more diagnostic information.
- Returns
array
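A hedged sketch of how the two terms above could be combined. The source gives the form of each term and says TD_TAU is the relative weight of classification to regression, but it does not spell out the combination. The TD_TAU-weighted sum used here, the default $A_0 = 1$, and the scalar `max_apc` input (standing in for $\max[{\rm APC}_n[{\rm loc}]]$) are all assumptions.

```python
import math

def td_class_reg_sketch(max_prob, max_apc, td_tau=0.5, td_a1=0.5,
                        td_beta=1.0, a0=1.0):
    """Combine classification and regression terms (hypothetical sketch).

    Assumptions: max_apc is the largest regression error measure at the
    position, a0 defaults to 1, and the terms are combined as a
    TD_TAU-weighted sum before raising the result to TD_BETA.
    """
    class_term = 1.0 - max_prob                                 # 1 - max[P_class(x)]
    reg_term = a0 * math.log(td_a1 * abs(max_apc) + 1.0)        # A0 log(A1 |APC| + 1)
    combined = td_tau * class_term + (1.0 - td_tau) * reg_term  # assumed weighting
    return combined ** td_beta

print(td_class_reg_sketch(max_prob=0.6, max_apc=2.0))
```

Setting `td_tau=1.0` in this sketch reduces it to the classification-only target distribution.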
- do_density_logic(step_history, N_points, Kappa, shuffle=False, norm_steps=False, var_mult=None, add_mvns_together=False, pre_acc_points=None, verbose=False)
Perform the density logic using a normal (Gaussian) kernel on each point.
This method automatically discards the first 5% of the MCMC steps as burn-in, so that the initial steps (which may start in a non-ideal region) are not selected.
- Parameters
- Returns
accepted_points (ndarray)
rejected_points (ndarray)
accepted_sigmas (ndarray)
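The burn-in removal described above can be sketched as follows. The helper name `drop_burn_in` and the choice to shuffle only after trimming are assumptions; the 5% fraction comes from the description.

```python
import numpy as np

def drop_burn_in(step_history, burn_frac=0.05, shuffle=False, seed=None):
    """Discard the first burn_frac of MCMC steps, optionally shuffling the rest.

    burn_frac=0.05 mirrors the 5% burn-in described above; the helper name
    and shuffling after trimming are assumptions.
    """
    steps = np.asarray(step_history)
    n_burn = int(burn_frac * len(steps))
    kept = steps[n_burn:].copy()
    if shuffle:
        # Generator.shuffle permutes rows (axis 0) in place.
        np.random.default_rng(seed).shuffle(kept)
    return kept

steps = np.arange(200).reshape(100, 2)
print(drop_burn_in(steps).shape)
```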
- do_simple_density_logic(step_history, N_points, Kappa, var_mult=None, add_mvns_together=False, include_training_data=True, verbose=False)
Perform multivariate normal density logic on a given step history.
This is a simplified version of the method ‘do_density_logic’. It assumes that every accepted point will have the same exact MVN.
Each proposal distribution starts with the training set from TableData, which keeps training data from being proposed again.
- Parameters
step_history (ndarray) – List of points from a PTMCMC or MCMC (the posterior).
N_points (int) – Number of points to draw from the posterior; the number of points actually accepted may differ. Contributes to the length scale of the MVN distribution of accepted points (along with Kappa).
Kappa (float) – Scaling factor that sets the initial size of the MVN for accepted points. This should be proportional to the filling factor of the area of interest described by the target distribution used to create the posterior.
var_mult (float, ndarray, optional) – Variance multiplier for the MVN of accepted points.
add_mvns_together (bool, optional) – Add MVNs together when creating the accepted point distribution.
include_training_data (bool, optional) – Include the training data in the target distribution before sampling.
verbose (bool, optional) – Print useful diagnostic information.
- Returns
accepted_points (ndarray) – Accepted points from the posterior to be labeled by the user (query points).
rejected_points (ndarray) – Rejected points from the posterior.
Notes
The "accepted" language here refers to query points for the oracle to label in an active learning scheme. It does not refer to the acceptance or rejection of steps normally used in MCMC.
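A hypothetical sketch of density-based thinning in which every accepted point carries the same isotropic Gaussian, in the spirit of the simplified method above. The acceptance rule (accept with probability max(0, 1 - density), where each kernel equals 1 at its centre) is an assumption, not necessarily the PSY-CRIS rule. `sigma` stands in for the length scale that the method derives from N_points, Kappa, and var_mult, whose exact formula is not given here.

```python
import numpy as np

def simple_density_thinning(step_history, sigma, add_mvns_together=False,
                            training_data=None, seed=None):
    """Thin a posterior into 'accepted' query points using identical MVNs.

    Hypothetical sketch: each accepted point (and each training point, if
    given) carries an isotropic Gaussian of width sigma, normalized to 1 at
    its centre. A candidate is accepted with probability max(0, 1 - density),
    where density is the sum (add_mvns_together=True) or the maximum of the
    kernels evaluated at the candidate.
    """
    rng = np.random.default_rng(seed)
    steps = np.asarray(step_history, dtype=float)
    # Training points seed the density so they are not proposed again.
    centres = []
    if training_data is not None:
        centres = [np.asarray(p, dtype=float) for p in training_data]
    accepted, rejected = [], []
    for point in steps:
        if centres:
            d2 = np.sum((np.asarray(centres) - point) ** 2, axis=1)
            kernels = np.exp(-d2 / (2.0 * sigma**2))
            density = kernels.sum() if add_mvns_together else kernels.max()
        else:
            density = 0.0
        if rng.random() >= density:
            accepted.append(point)
            centres.append(point)  # the new query point repels later candidates
        else:
            rejected.append(point)
    return np.array(accepted), np.array(rejected)

acc, rej = simple_density_thinning(np.zeros((10, 2)), sigma=0.1, seed=0)
print(len(acc), len(rej))
```

With ten identical points, only the first can be accepted, because every later candidate sits at the peak of an existing kernel.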
- get_TD_classification_data(*args, **kwargs)
Get target-distribution classification data.
Calculate terms relevant for creating target distributions with classification terms.
- Parameters
classifier_name (str) – Trained classifier name to use for predictions.
position (array) – Position in parameter space to evaluate.
**kwargs –
- TD_verbose : bool
Print useful output
- Returns
max_probs (array) – Maximum probabilities at each query point
position (array) – Position in parameter space being queried
cls_key (array) – Classification key predicted for each query position
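A sketch of how `max_probs` and `cls_key` relate to a matrix of classifier probabilities. The input layout (an `(n_positions, n_classes)` array whose columns follow `class_names`) and the function name are assumptions; the real method obtains probabilities from the trained classifier named by `classifier_name`.

```python
import numpy as np

def td_classification_data_sketch(class_probs, class_names):
    """Return the max probability and predicted class key per query position.

    Assumption: class_probs is an (n_positions, n_classes) array of
    classifier probabilities whose columns follow class_names.
    """
    probs = np.atleast_2d(np.asarray(class_probs, dtype=float))
    max_probs = probs.max(axis=1)                           # highest class probability
    cls_key = np.asarray(class_names)[probs.argmax(axis=1)] # most likely class label
    return max_probs, cls_key

mp, key = td_classification_data_sketch([[0.1, 0.9], [0.6, 0.4]],
                                         ["stable", "unstable"])
print(mp, key)
```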
- get_proposed_points(step_history, N_points, Kappa, shuffle=False, norm_steps=False, add_mvns_together=False, include_training_data=True, var_mult=None, seed=None, n_repeats=1, max_iters=1000.0, verbose=False, **kwargs)
Get proposed points in parameter space given a MCMC step history.
The density logic is not deterministic, so multiple iterations may be needed to converge on the desired number of proposed points. This method calls do_density_logic repeatedly while changing Kappa in order to return the desired number of points. After n_repeats realizations that return the correct number of points (N_points), the distribution with the largest average distance is chosen.
- Warning: This algorithm has not been tested for large N data sets and may struggle to converge.
- Parameters
step_history (ndarray) – Posterior from which to sample new query points.
N_points (int) – N query points to converge to.
Kappa (float) – Multiplies the length scale of MVNs and changes such that the desired number of query points is found.
shuffle (bool, optional) – Shuffle points in posterior in place before sampling.
norm_steps (bool, optional) – Normalize steps before sampling.
add_mvns_together (bool, optional) – Add MVNs of accepted point distribution together.
include_training_data (bool, optional) – Include training data in the accepted point distribution before sampling.
var_mult (ndarray, optional) – Variance multiplier.
seed (float, optional) – Random seed to use for random sampling.
n_repeats (int, optional) – Number of times to converge to the correct number of points. Each iteration may be a different realization of the posterior.
verbose (bool, optional) – Print useful information.
**kwargs –
- show_plots : bool, optional
Show 2D plot of proposed points with step history & training data.
- Returns
acc_pts (ndarray) – Array of proposed points to be used as initial conditions in new simulations.
Kappa (float) – Scaling factor which reproduced the desired number of accepted points.
Notes
The method automatically exits if it completes max_iters iterations without converging on the desired number of points.
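The Kappa search described above can be sketched as a simple multiplicative controller. Here `propose(kappa)` is a hypothetical stand-in for one call of the density logic, and the fixed step factor of 1.1 is an assumption. The sketch also assumes a larger Kappa means larger MVNs and therefore fewer accepted points, and it reads "largest average distance" as the mean pairwise distance between proposed points.

```python
import numpy as np

def mean_pairwise_distance(points):
    """Average Euclidean distance over all distinct pairs of points."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt(np.sum(diffs**2, axis=-1))
    n = len(points)
    return dists.sum() / (n * (n - 1)) if n > 1 else 0.0

def search_kappa(propose, n_points, kappa, n_repeats=1, step=1.1, max_iters=1000):
    """Adjust kappa until propose(kappa) yields exactly n_points points.

    Hypothetical sketch: propose stands in for the density logic. After
    n_repeats successful realizations, the one with the largest mean
    pairwise distance is returned together with its kappa.
    """
    hits = []
    for _ in range(int(max_iters)):
        pts = propose(kappa)
        if len(pts) > n_points:
            kappa *= step   # too many points: widen the MVNs
        elif len(pts) < n_points:
            kappa /= step   # too few points: shrink the MVNs
        else:
            hits.append((mean_pairwise_distance(pts), pts, kappa))
            if len(hits) >= n_repeats:
                break
    if not hits:
        return None, kappa  # gave up after max_iters without converging
    _, best_pts, best_kappa = max(hits, key=lambda h: h[0])
    return best_pts, best_kappa

rng = np.random.default_rng(0)

def toy_propose(kappa):
    # Toy stand-in: the number of returned points shrinks as kappa grows.
    return rng.random((max(1, int(10 / kappa)), 2))

pts, kappa = search_kappa(toy_propose, n_points=5, kappa=1.0, n_repeats=3)
print(pts.shape, kappa)
```

A fixed step factor can oscillate around the target count when the response to Kappa is noisy, which is one reason a `max_iters` cap is needed.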
- make_prop_points_plots(step_hist, prop_points, axes=(0, 1), show_fig=True, save_fig=False)
Plot the proposed / accepted points over the step history.
- make_trace_plot(chain_holder, T_list, Temp, save_fig=False, show_fig=True)
Make a trace plot of step number vs. position of a sampler along an axis.
Plot titles assume the data come from the classifier.
- normalize_step_history(step_history)
Normalize steps to [0, 1] using the min/max of each axis.
The max and min are taken from the original data set from TableData.
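A minimal sketch of min/max normalization using bounds taken from the training data. The explicit `data_min`/`data_max` arguments are assumptions; the method itself reads them from TableData.

```python
import numpy as np

def normalize_steps(step_history, data_min, data_max):
    """Map each axis to [0, 1] using per-axis training-data bounds.

    data_min / data_max are assumed to be the per-axis min and max of the
    original TableData set; steps outside those bounds map outside [0, 1].
    """
    steps = np.asarray(step_history, dtype=float)
    lo = np.asarray(data_min, dtype=float)
    hi = np.asarray(data_max, dtype=float)
    return (steps - lo) / (hi - lo)

out = normalize_steps([[5.0, 0.0], [10.0, 50.0]], [0.0, 0.0], [10.0, 100.0])
print(out)
```

Using the training-data bounds rather than the step history's own min/max keeps normalized steps comparable across different MCMC runs.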
- run_MCMC(N_trials, alpha, step_history, target_dist, classifier_name, T=1, upper_limit_reject=10000.0, **TD_kwargs)
Run a Markov chain Monte Carlo given a target distribution.
- Parameters
N_trials (int) – Number of proposals or trial steps to take before stopping.
alpha (float) – Related to the step size of the MCMC walker. Defines the standard deviation of a zero mean normal from which the step is randomly drawn.
step_history (list) – Initial starting location in parameter space. Could contain an arbitrary number of previous steps but a walker will start at the last step in the list.
target_dist (callable) – The target distribution to sample. Must take arguments (method_name, element_of_step_history). (A 2D analytic function is provided: TD_2d_analytic.)
classifier_name (str) – Name of interpolation technique used in the target_dist.
T (float, optional) – Temperature of the MCMC.
upper_limit_reject (int, optional) – Sets the maximum number of rejected steps before the MCMC stops walking. This avoids a slowly converging walk with few accepted points.
- Returns
step_history (array) – An array containing all accepted steps of the MCMC.
accept (int) – Total number of accepted steps.
reject (int) – Total number of rejected steps.
Notes
Assumes uniform priors and a symmetric (Gaussian) jump proposal.
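A Metropolis sketch consistent with the notes above: uniform prior, symmetric Gaussian jumps with standard deviation `alpha`, and temperature `T`, with acceptance probability $\min\left(1, (f_{\rm new}/f_{\rm old})^{1/T}\right)$. The single-argument `target_dist(position)` signature (the documented method also passes a method name), the `seed` argument, and recording only accepted steps are assumptions made for a self-contained example.

```python
import numpy as np

def run_mcmc_sketch(n_trials, alpha, step_history, target_dist, T=1.0,
                    upper_limit_reject=10000, seed=None):
    """Metropolis MCMC with a symmetric Gaussian jump and uniform prior.

    Hypothetical sketch: target_dist(position) returns a non-negative value,
    the walker starts at step_history[-1], and only accepted steps are
    appended to the returned history.
    """
    rng = np.random.default_rng(seed)
    history = [np.asarray(s, dtype=float) for s in step_history]
    current = history[-1]
    f_current = target_dist(current)
    accept = reject = 0
    for _ in range(int(n_trials)):
        proposal = current + rng.normal(0.0, alpha, size=current.shape)
        f_prop = target_dist(proposal)
        # If the current value is zero, accept any proposal so the walker can leave.
        ratio = 1.0 if f_current <= 0 else (f_prop / f_current) ** (1.0 / T)
        if rng.random() < ratio:
            current, f_current = proposal, f_prop
            history.append(current)
            accept += 1
        else:
            reject += 1
            if reject >= upper_limit_reject:
                break  # stop a walk that is rejecting too often
    return np.array(history), accept, reject

def gauss(x):
    return float(np.exp(-0.5 * np.sum(x**2)))

hist, a, r = run_mcmc_sketch(500, 0.5, [[0.0, 0.0]], gauss, seed=1)
print(hist.shape, a, r)
```

Raising `T` flattens the ratio toward 1, so hot chains accept more freely; this is the property parallel tempering exploits.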
- run_PTMCMC(T_max, N_tot, target_dist, classifier_name, init_pos=None, N_draws_per_swap=3, c_spacing=1.2, alpha=None, upper_limit_reject=100000.0, verbose=False, trace_plots=False, **TD_kwargs)
Run a Parallel Tempered MCMC with a user-specified target distribution.
Calls the method run_MCMC.
- Parameters
T_max (float) – Sets the maximum temperature of the MCMC chains.
N_tot (int) – The total number of iterations for the PTMCMC.
target_dist (callable) – The target distribution to sample. Must take arguments (method_name, location_to_eval). (A 2D analytic function is provided: TD_2d_analytic.)
classifier_name (str, list) – A single string or list of strings specifying the interpolator to use for classification or classification & regression respectively.
init_pos (array, optional) – Initial position of walkers in each axis. Default is the median of the input data in TableData.
N_draws_per_swap (int, optional) – Number of draws to perform for each MCMC before swap proposals.
c_spacing (float, optional) – Sets the spacing of temperatures between chains: $T_{i+1} = T_i^{1/c}$, over the range $[T_{\rm max}, 1]$.
alpha (float, optional) – Sets the standard deviation of steps taken by the walkers. Default is 1/5 the range of training data from TableData.
upper_limit_reject (float, optional) – Sets the upper limit of rejected points.
verbose (bool, optional) – Useful print statements during execution.
- Returns
chain_step_history (dict) – Holds the step history for every chain. Keys are integers ranging from 0 (max T) to the total number of chains minus 1 (min T).
T_list (array) – Array filled with the temperatures of each chain from max to min.
Notes
There is a zero prior on the PTMCMC outside the range of training data.
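A compact parallel-tempering sketch that builds the ladder $T_{i+1} = T_i^{1/c}$ from `T_max` toward 1 and swaps adjacent chains. Several details are assumptions:

- the target is passed as a log-density;
- the ladder stops once the next temperature would fall below a hypothetical cutoff `T_stop = 1.1` and then appends $T = 1$;
- each round proposes one swap between a random adjacent pair;
- each chain's history records its position after every draw.

A zero prior outside the training range corresponds to returning `-inf` from `log_target` there. This is not the PSY-CRIS implementation.

```python
import numpy as np

def temperature_ladder(T_max, c_spacing=1.2, T_stop=1.1):
    """Temperatures from T_max toward 1 via T_{i+1} = T_i ** (1 / c).

    The T_stop cutoff, after which T = 1 is appended, is an assumption.
    """
    temps = [float(T_max)]
    while temps[-1] ** (1.0 / c_spacing) > T_stop:
        temps.append(temps[-1] ** (1.0 / c_spacing))
    temps.append(1.0)
    return np.array(temps)

def run_ptmcmc_sketch(log_target, init_pos, T_max, n_tot, n_draws_per_swap=3,
                      c_spacing=1.2, alpha=0.5, seed=None):
    """Parallel-tempered Metropolis; chain 0 is hottest, the last chain has T = 1."""
    rng = np.random.default_rng(seed)
    T_list = temperature_ladder(T_max, c_spacing)
    pos = [np.asarray(init_pos, dtype=float).copy() for _ in T_list]
    logp = [log_target(p) for p in pos]
    history = {i: [p.copy()] for i, p in enumerate(pos)}
    for _ in range(int(n_tot)):
        # Metropolis draws within each chain at its own temperature.
        for i, T in enumerate(T_list):
            for _ in range(n_draws_per_swap):
                prop = pos[i] + rng.normal(0.0, alpha, size=pos[i].shape)
                lp = log_target(prop)
                if np.log(rng.random()) < (lp - logp[i]) / T:
                    pos[i], logp[i] = prop, lp
                history[i].append(pos[i].copy())
        # Propose swapping the states of one random adjacent pair of chains.
        i = rng.integers(len(T_list) - 1)
        j = i + 1
        log_a = (logp[j] - logp[i]) * (1.0 / T_list[i] - 1.0 / T_list[j])
        if np.log(rng.random()) < log_a:
            pos[i], pos[j] = pos[j], pos[i]
            logp[i], logp[j] = logp[j], logp[i]
    return {k: np.array(v) for k, v in history.items()}, T_list

def log_gauss(x):
    return -0.5 * float(np.sum(x**2))

hist, T_list = run_ptmcmc_sketch(log_gauss, [0.0, 0.0], T_max=10, n_tot=50, seed=0)
print(len(T_list), hist[len(T_list) - 1].shape)
```

The swap acceptance $\log A = (\log f_j - \log f_i)(1/T_i - 1/T_j)$ is the standard parallel-tempering rule, and it lets states found by hot chains migrate down to the $T = 1$ chain.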