.. _PSyGrid:

###############################
The PSyGrid Object
###############################

The PSyGrid class is the "bridge" connecting MESA and POSYDON. It:

- encapsulates the data from a MESA grid in a compact form
- saves the data in an HDF5 file
- allows the user to access the data in a memory-efficient way

To use PSyGrid, import it with:

.. code-block:: python

    from posydon.grids.psygrid import PSyGrid


Creating a PSyGrid object
=========================

A PSyGrid HDF5 file can be created by calling the ``create()`` method on a
newly constructed PSyGrid object.

Basic example
-------------

The simplest approach is to provide the path of the folder containing the
output of the MESA runs, and the path of the HDF5 file to be created.

.. code-block:: python

    grid = PSyGrid()
    grid.create(MESA_runs_directory, psygrid_hdf5_path)

Now, the PSyGrid object is ready to be used for accessing and plotting data.

Output options
--------------

Use the option ``verbose`` to see more details while the PSyGrid object is
being built. Set ``overwrite`` to ``True`` if the HDF5 file already exists.
By default, any warnings produced are collected and shown at the end
(``warn="end"``) of the process. Use ``warn="normal"`` to see the warnings
when they occur, or ``warn="suppress"`` to hide them completely.

.. code-block:: python

    grid = PSyGrid()
    grid.create(MESA_runs_directory, psygrid_hdf5_path,
                verbose=True, overwrite=True, warn="suppress")

Setting the properties of the grid
----------------------------------

PSyGrid provides a compact form of the original MESA grid. To define how many
runs should be included (e.g. for testing purposes), and which data columns
are needed, use the following code block as a guide.

.. code-block:: python

    DEFAULT_BINARY_HISTORY_COLS = [
        'star_1_mass', 'star_2_mass', 'period_days', 'binary_separation',
        'age', 'rl_relative_overflow_1', 'rl_relative_overflow_2',
        'lg_mtransfer_rate']

    DEFAULT_STAR_HISTORY_COLS = [
        'star_age', 'star_mass', 'he_core_mass', 'c_core_mass', 'center_h1',
        'center_he4', 'c12_c12', 'center_c12', 'log_LH', 'log_LHe', 'log_LZ',
        'log_Lnuc', 'surface_h1', 'log_Teff', 'log_L']

    DEFAULT_PROFILE_COLS = ['radius', 'mass', 'logRho', 'omega']

    GRIDPROPERTIES = {
        # file loading parameters
        "description": "",              # description text
        "max_number_of_runs": None,
        "format": "hdf5",
        # resampling parameters (define or import the *_DOWNSAMPLE_COLS
        # lists appropriate for your POSYDON version)
        "n_downsample_history": None,   # False/None = no hist. resampling
        "n_downsample_profile": None,   # False/None = no prof. resampling
        "history_resample_columns": DEFAULT_HISTORY_DOWNSAMPLE_COLS,
        "profile_resample_columns": DEFAULT_PROFILE_DOWNSAMPLE_COLS,
        # columns to pass to the grid
        "star1_history_saved_columns": DEFAULT_STAR_HISTORY_COLS,
        "star2_history_saved_columns": DEFAULT_STAR_HISTORY_COLS,
        "binary_history_saved_columns": DEFAULT_BINARY_HISTORY_COLS,
        "star1_profile_saved_columns": DEFAULT_PROFILE_COLS,
        "star2_profile_saved_columns": DEFAULT_PROFILE_COLS,
        # Initial/Final value arrays
        "initial_value_columns": None,
        "final_value_columns": None,
    }

    grid = PSyGrid()
    grid.create(MESA_runs_directory, psygrid_hdf5_path, **GRIDPROPERTIES)


Loading an existing PSyGrid object
==================================

To load a PSyGrid object from an HDF5 file you only need its path:

.. code-block:: python

    grid = PSyGrid('/home/mydata/my_MESA_grid.h5')

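Once loaded, the data can be accessed directly from the object. The sketch
below is a minimal, illustrative example: the attribute names
(``initial_values``, ``final_values``, indexing a run with ``grid[i]``) and
the column names follow the defaults listed above, but may differ between
POSYDON versions, so check the API reference of your installation.

.. code-block:: python

    grid = PSyGrid('/home/mydata/my_MESA_grid.h5')

    # initial/final values of all runs (assumed to be structured arrays)
    print(grid.initial_values['star_1_mass'])
    print(grid.final_values['period_days'])

    # data of a single run, read from the HDF5 file only when accessed
    run = grid[0]
    print(run.binary_history['age'])
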
Closing/deleting a PSyGrid object
==================================

As with files in Python, it is preferable to "close" PSyGrid objects after
handling them. The ``.close()`` method closes the HDF5 file associated with
the PSyGrid object. However, the object continues to exist and holds all the
metadata.

.. code-block:: python

    grid.close()

If a PSyGrid object is not needed anymore, use the ``del`` keyword. Note that
this also closes the HDF5 file.

.. code-block:: python

    del grid


Grid engineering (advanced use case scenario)
=============================================

When confronted with a large parameter space to cover, we advise splitting
the parameter space into multiple directories containing separate grid
slices, composed of a few thousand runs each. These separate grids can be
combined afterwards.

Here we present an advanced use of the ``PSyGrid`` object meant to be used on
an HPC facility with Slurm. In our experience, grid manipulation is most
easily performed in a Jupyter notebook using Slurm magic commands, which can
be installed with

.. code-block:: python

    !pip install git+https://github.com/NERSC/slurm-magic.git

directly from the Jupyter notebook. Here we illustrate how to post-process an
example HMS-HMS grid composed of three slices at fixed mass ratios of 0.50,
0.70, and 0.90.

Creating grid slices
--------------------

The following script post-processes the raw MESA data into a ``PSyGrid``
object, either in its ``ORIGINAL`` form (without any downsampling) or using
the ``LITE`` downsampling presented in Fragos et al. (2022).

.. code-block:: python

    %%writefile create_individual_psygrid_files.py
    import os
    import sys
    import numpy as np
    from posydon.grids.psygrid import (PSyGrid, DEFAULT_HISTORY_DS_EXCLUDE,
                                       DEFAULT_PROFILE_DS_EXCLUDE,
                                       EXTRA_COLS_DS_EXCLUDE)

    if __name__ == "__main__":

        # directory with MESA data
        path = '/working_dir/'

        # MESA grid slices to post-process
        grid_names = ['grid_q_0.50', 'grid_q_0.70', 'grid_q_0.90']

        # choose grid slice
        i = int(sys.argv[1])
        print('Job array index:', i)
        grid_name = grid_names[i]

        # choose the compression
        grid_type = str(sys.argv[2])
        if grid_type == 'ORIGINAL':
            history_DS_error = None
            profile_DS_error = None
            history_DS_exclude = DEFAULT_HISTORY_DS_EXCLUDE
            profile_DS_exclude = DEFAULT_PROFILE_DS_EXCLUDE
        elif grid_type == 'LITE':
            history_DS_error = 0.1
            profile_DS_error = 0.1
            history_DS_exclude = EXTRA_COLS_DS_EXCLUDE
            profile_DS_exclude = EXTRA_COLS_DS_EXCLUDE
        else:
            raise ValueError('grid_type = %s not supported!' % grid_type)

        print('Creating psygrid for', grid_name, '...')
        grid = PSyGrid(verbose=True)
        grid.create(path + "%s" % grid_name,
                    "./" + grid_type + "/%s.h5" % grid_name,
                    overwrite=True,
                    history_DS_error=history_DS_error,
                    profile_DS_error=profile_DS_error,
                    history_DS_exclude=history_DS_exclude,
                    profile_DS_exclude=profile_DS_exclude,
                    compression="gzip9",
                    start_at_RLO=False,
                    )

The cell above writes a script named ``create_individual_psygrid_files.py``,
which we can run with the following Slurm magic command. IMPORTANT: we will
run the script with both compression options, ``LITE`` and ``ORIGINAL``. The
uncompressed version of the grid will be used in a later step.

Before running the script we need to create a ``logs`` directory where we
will store Slurm output messages, and two directories where we will store the
two different ``PSyGrid`` object outputs, i.e. ``ORIGINAL`` and ``LITE``.
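These directories can be created directly from the notebook, for example with
a small Python cell (the paths below simply mirror the ``/working_dir/``
layout used throughout this example; adapt them to your setup):

.. code-block:: python

    import os

    working_dir = '/working_dir/'
    # Slurm logs and the two PSyGrid output directories
    for dirname in ['logs', 'ORIGINAL', 'LITE']:
        os.makedirs(os.path.join(working_dir, dirname), exist_ok=True)
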
Once the post-processing of the data is complete, we will be notified by
email from Slurm.

.. code-block:: python

    %%sbatch
    #!/bin/bash
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=my_email
    #SBATCH --account=b1119
    #SBATCH --partition=posydon-priority
    #SBATCH --array=0-2
    #SBATCH --ntasks-per-node 1
    #SBATCH --mem-per-cpu=8G
    #SBATCH --time=24:00:00
    #SBATCH --job-name="psygrid"
    #SBATCH --output=/working_dir/logs/grid_%a.out
    srun python /working_dir/create_individual_psygrid_files.py $SLURM_ARRAY_TASK_ID LITE

The above code cell will return the Slurm job ID, which we can use to verify
that our script is running correctly. In this specific example each grid
slice is composed of 2800 MESA HMS-HMS simulations; the post-processing will
take roughly one hour per grid slice.

.. code-block:: python

    %squeue -j 3761056

.. image:: pngs/slurm_job.png

Rerun subsample of grid slices
------------------------------

A non-negligible fraction of the MESA simulations did not converge. The grid
architect would now like to rerun the subsample of the grid with convergence
problems, changing one or more MESA inlist flags. This can be easily done
with the ``.rerun()`` method.

.. code-block:: python

    grid = PSyGrid('/working_dir/LITE/grid_q_0.70.h5')
    grid.rerun('/working_dir/grid_q_0.70_rerun/',
               termination_flags=['min_timestep_limit',
                                  'reach cluster timelimit'],
               new_mesa_flag={'opacity_max': 0.5})
    grid.close()

The above script loads the LITE version of the ``PSyGrid`` object
``grid_q_0.70.h5`` and generates a new ``grid.csv`` file containing the
initial points of the runs with the given ``termination_flags`` (see TF1 in
the plot2D documentation) and additional columns corresponding to the new
MESA flags specified in the ``new_mesa_flag`` dictionary, e.g. limiting the
maximal opacity value of a star.

Alternatively, the grid architect might want to select a subsample of the
MESA runs with some user-defined logic, e.g. to generate a patch of the grid
that addresses a change affecting only a portion of the parameter space, to
be rerun with a new MESA inlist commit (see the documentation on running
MESA). This can be specified with the ``runs_to_rerun`` option of the
``.rerun()`` method, which expects a list of indices of the tracks to rerun.
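For example, a run selection based on user-defined logic might look like the
following sketch. The selection criterion, the ``period_days`` column of
``initial_values``, and the output directory name are illustrative
assumptions; only the ``runs_to_rerun`` option itself is part of the
``.rerun()`` interface described above.

.. code-block:: python

    import numpy as np
    from posydon.grids.psygrid import PSyGrid

    grid = PSyGrid('/working_dir/LITE/grid_q_0.70.h5')

    # hypothetical selection: rerun all tracks with initial periods
    # above 100 days
    selected = np.where(grid.initial_values['period_days'] > 100.)[0]

    grid.rerun('/working_dir/grid_q_0.70_rerun_wide/',  # hypothetical output dir
               runs_to_rerun=list(selected),
               new_mesa_flag={'opacity_max': 0.5})
    grid.close()
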
Combine grid slices
-------------------

Let us assume that we now have the three original grid slices
(``grid_q_0.50.h5``, ``grid_q_0.70.h5``, ``grid_q_0.90.h5``), three
additional reruns of them addressing convergence issues
(``grid_q_0.50_rerun.h5``, ``grid_q_0.70_rerun.h5``, ``grid_q_0.90_rerun.h5``),
and a grid slice extension covering larger orbital periods not included in
the main grids (``grid_p_extension.h5``), all of which we want to combine
into a single file. The function ``join_grids`` does exactly this for us,
taking care of replacing older runs with newer ones. The grid layering
follows the list order, namely the last grid will be layered last. Here is
how we can combine the grid slices.

.. code-block:: python

    %%writefile combine_grid_slices.py
    import os
    import sys
    import numpy as np
    from posydon.grids.psygrid import PSyGrid, join_grids

    if __name__ == "__main__":

        # choose the compression
        grid_type = str(sys.argv[1])
        path = '/working_dir/' + grid_type + '/'

        # grid slices to combine
        grid_names = ['grid_q_0.50.h5', 'grid_q_0.70.h5', 'grid_q_0.90.h5',
                      'grid_q_0.50_rerun.h5', 'grid_q_0.70_rerun.h5',
                      'grid_q_0.90_rerun.h5', 'grid_p_extension.h5']
        grid_paths = [path + name for name in grid_names]

        print('Combining the grids:')
        print(grid_paths)
        print('')
        join_grids(grid_paths, path + 'grid_combined.h5')
        print('DONE!')

The above script is run for both grid compressions, ``LITE`` and
``ORIGINAL``, with the following cell.

.. code-block:: python

    %%sbatch
    #!/bin/bash
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=my_email
    #SBATCH --account=b1119
    #SBATCH --partition=posydon-priority
    #SBATCH --ntasks-per-node 1
    #SBATCH --mem-per-cpu=8G
    #SBATCH --time=24:00:00
    #SBATCH --job-name="psygrid"
    #SBATCH --output=/working_dir/logs/combine_grid_slices.out
    srun python /working_dir/combine_grid_slices.py LITE

Add post-processed quantities
-----------------------------

There is a list of quantities that ``POSYDON`` requires to be precomputed on
the original grids. These quantities include, e.g., core-collapse and common
envelope properties.

.. code-block:: python

    %%writefile post_process_grid.py
    import os
    import sys
    from shutil import copyfile
    import numpy as np
    import pickle
    from posydon.grids.psygrid import PSyGrid
    from posydon.grids.post_processing import (post_process_grid,
                                               add_post_processed_quantities)

    if __name__ == "__main__":

        path = '/working_dir/'

        # grid to post-process
        grid_name = 'grid_combined.h5'
        grid_name_processed = 'grid_combined_processed.h5'
        columns_name = 'post_processed_EXTRA_COLUMNS.pkl'
        dirs_name = 'post_processed_MESA_dirs.txt'

        # copy the file; it will be overwritten when we add columns
        if os.path.exists(path + 'ORIGINAL/' + grid_name_processed):
            print('Post-processed grid file already exists, removing it...')
            os.remove(path + 'ORIGINAL/' + grid_name_processed)
        copyfile(path + 'ORIGINAL/' + grid_name,
                 path + 'ORIGINAL/' + grid_name_processed)

        grid_ORIGINAL = PSyGrid(path + 'ORIGINAL/' + grid_name_processed)
        MESA_dirs_EXTRA_COLUMNS, EXTRA_COLUMNS = post_process_grid(
            grid_ORIGINAL, index=None, star_2_CO=False, verbose=False)

        # save the post-processed quantities
        if os.path.exists(path + 'ORIGINAL/' + columns_name):
            print('EXTRA COLUMNS file already exists, removing it...')
            os.remove(path + 'ORIGINAL/' + columns_name)
        with open(path + 'ORIGINAL/' + columns_name, 'wb') as handle:
            pickle.dump(EXTRA_COLUMNS, handle,
                        protocol=pickle.HIGHEST_PROTOCOL)

        if os.path.exists(path + 'ORIGINAL/' + dirs_name):
            print('MESA dirs file already exists, removing it...')
            os.remove(path + 'ORIGINAL/' + dirs_name)
        with open(path + 'ORIGINAL/' + dirs_name, 'w') as filehandle:
            for listitem in MESA_dirs_EXTRA_COLUMNS:
                filehandle.write('%s\n' % listitem)

        print('Adding post-processed columns to the ORIGINAL grid...')
        add_post_processed_quantities(grid_ORIGINAL, MESA_dirs_EXTRA_COLUMNS,
                                      EXTRA_COLUMNS, verbose=False)
        grid_ORIGINAL.close()

        # copy the file; it will be overwritten when we add columns
        if os.path.exists(path + 'LITE/' + grid_name_processed):
            print('Post-processed grid file already exists, removing it...')
            os.remove(path + 'LITE/' + grid_name_processed)
        copyfile(path + 'LITE/' + grid_name,
                 path + 'LITE/' + grid_name_processed)

        grid_LITE = PSyGrid(path + 'LITE/' + grid_name_processed)
        print('Adding post-processed columns to the LITE grid...')
        add_post_processed_quantities(grid_LITE, MESA_dirs_EXTRA_COLUMNS,
                                      EXTRA_COLUMNS, verbose=False)
        grid_LITE.close()
        print('Done!')

The script can be run with the following Slurm magic command.

.. code-block:: python

    %%sbatch
    #!/bin/bash
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=my_email
    #SBATCH --account=b1119
    #SBATCH --partition=posydon-priority
    #SBATCH --ntasks-per-node 1
    #SBATCH --mem-per-cpu=8G
    #SBATCH --time=24:00:00
    #SBATCH --job-name="psygrid"
    #SBATCH --output=/working_dir/logs/post_process_grid.out
    export PATH_TO_POSYDON=/add_your_path/
    srun python /working_dir/post_process_grid.py
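As a quick check of what was computed, the pickled ``EXTRA_COLUMNS`` file
written by the script above can be inspected from the notebook. Listing its
keys assumes that ``EXTRA_COLUMNS`` is a dictionary mapping column names to
per-run values, as suggested by the way it is dumped above.

.. code-block:: python

    import pickle

    # path mirrors the output location used in post_process_grid.py
    with open('/working_dir/ORIGINAL/post_processed_EXTRA_COLUMNS.pkl',
              'rb') as handle:
        EXTRA_COLUMNS = pickle.load(handle)

    # names of the post-processed quantities added to the grids
    print(list(EXTRA_COLUMNS.keys()))
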
Splitting the grid into chunks
------------------------------

Some data-sharing services limit the file size; e.g. ``git-lfs`` limits the
maximum file size to about 2 GB. The following script splits the
post-processed grid into chunks of roughly 2 GB each.

.. code-block:: python

    %%writefile split_grid.py
    import os
    import numpy as np
    from posydon.grids.psygrid import PSyGrid, join_grids

    if __name__ == "__main__":

        path = '/working_dir/'
        git_lfs_dir = 'git-lfs_data_format'

        # remove any old grid chunks in the git-lfs directory
        files = os.listdir(path + git_lfs_dir)
        if files:
            print('Remove old grid in git_lfs_dir ...')
            for file in files:
                if '.h5' in file:
                    os.remove(path + git_lfs_dir + '/' + file)

        grid_names = [path + 'LITE/grid_combined_processed.h5']
        print('')
        join_grids(grid_names, path + git_lfs_dir + '/grid_%d.h5')
        print('DONE!')

The above script can be run with the following Slurm magic command.

.. code-block:: python

    %%sbatch
    #!/bin/bash
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=my_email
    #SBATCH --account=b1119
    #SBATCH --partition=posydon-priority
    #SBATCH --ntasks-per-node 1
    #SBATCH --mem-per-cpu=8G
    #SBATCH --time=24:00:00
    #SBATCH --job-name="psygrid"
    #SBATCH --output=/working_dir/logs/split_grid.out
    srun python /working_dir/split_grid.py

The split grid can be loaded into a ``PSyGrid`` object with the following
line.

.. code-block:: python

    grid = PSyGrid('/working_dir/git-lfs_data_format/grid_%d.h5')
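As a final sanity check before distributing the files, one can verify that
all runs are present. The use of ``len()`` to count the runs is an assumption
about the ``PSyGrid`` interface; adapt it to your POSYDON version if needed.

.. code-block:: python

    # assumed: len(grid) returns the total number of runs across all chunks
    print(len(grid), 'runs loaded')
    grid.close()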