Data management

Note

In this chapter, we use the term data to refer to the raw input data, not to data stored in the computer's memory by Storage instances. We use the term preparation for all data processing steps prior to the reconstruction and avoid the ambiguous term processing, although it may be more familiar to the reader.

Consider the following generic steps which every ptychographer has to complete prior to a successful image reconstruction.

(A) Conducting a scanning diffraction experiment.

During or after the experiment, the researcher is left with raw images acquired from the detector and metadata, which in general consist of the scanning positions along with geometric information about the setup, e.g. photon energy, propagation distance, detector pixel size, etc.

(B) Preparing the data.

In this step, the user performs a subset of the following actions:

  • select the appropriate region of the detector where the scattering events were counted,

  • apply possible pixel corrections to convert the detector counts of the chosen diffraction frame into photon counts, e.g. flat-field and dark-field correction,

  • switch the image orientation to match the coordinate system of the reconstruction algorithms,

  • assign a suitable mask to exclude invalid pixel data (hot or dead pixels, overexposure),

  • and/or simply rebin the data.

Finally, the user needs to zip the diffraction frames together with the scanning positions.
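The correction, masking, and pairing steps listed above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not PtyPy code: correct_frame and zip_with_positions are hypothetical helpers, the flat field is assumed to be a relative gain map normalized to about 1, and hot_threshold is an arbitrary cutoff chosen for the example.

```python
import numpy as np


def correct_frame(raw, dark, flat, hot_threshold):
    """Dark/flat-field correction and masking of a single detector frame.

    Assumes `flat` is a relative gain map (about 1 for a normal pixel) and
    that pixels with non-positive gain are dead. `hot_threshold` is a
    hypothetical cutoff above which raw counts are treated as invalid.
    Returns the corrected frame and a mask (1 = valid, 0 = invalid).
    """
    raw = np.asarray(raw, dtype=float)
    # Remove the detector offset; clip negative values caused by noise.
    signal = np.clip(raw - dark, 0.0, None)
    dead = flat <= 0
    # Divide by the gain, using 1 for dead pixels to avoid division by zero.
    corrected = signal / np.where(dead, 1.0, flat)
    valid = ~(dead | (raw > hot_threshold))
    corrected[~valid] = 0.0
    return corrected, valid.astype(np.uint8)


def zip_with_positions(frames, positions):
    """Pair each prepared frame with its scan index and (y, x) position."""
    if len(frames) != len(positions):
        raise ValueError("number of frames and positions differ")
    return [(i, frame, tuple(pos))
            for i, (frame, pos) in enumerate(zip(frames, positions))]


# Usage with a synthetic 4x4 frame containing one dead and one hot pixel.
raw = np.full((4, 4), 50.0)
raw[1, 1] = 1e6                 # hot pixel
dark = np.full((4, 4), 5.0)
flat = np.ones((4, 4))
flat[0, 0] = 0.0                # dead pixel
corrected, mask = correct_frame(raw, dark, flat, hot_threshold=1e5)
pairs = zip_with_positions([corrected, corrected], [(0.0, 0.0), (0.0, 4.1562e-04)])
print(mask.sum(), corrected[2, 2], pairs[1][2])
```

Returning the mask separately, instead of only zeroing pixels, keeps the information about which pixels are invalid, which is what a weight array is meant to carry.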

(C) Saving the prepared data or feeding it into the reconstruction process.

In this step, the user needs to save the data in a suitable format or provide it directly to the reconstruction engine.

Data management in PtyPy deals with (B) and (C), since a ptychography reconstruction software naturally cannot provide actual experimental data. Nevertheless, the treatment of raw data is usually very similar for every experiment. Consequently, PtyPy provides an abstract base class, called PtyScan, which aims to help with steps (B) and (C). In order to adapt PtyPy to a specific experimental setup, we simply subclass PtyScan and reimplement only the subset of its methods that is affected by the specifics of the experimental setup (see Tutorial : Subclassing PtyScan).

The PtyScan class

PtyScan is the abstract base class in PtyPy that manages raw input data.

A PtyScan instance is constructed from a set of generic parameters, see scan.data in the ptypy parameter tree.

It provides the following features:

Parallelization

When PtyPy is run across several MPI processes, PtyScan takes care of distributing the scan-point indices among the processes such that each process only loads the data it will later use in the reconstruction. Hence, the load on the network does not depend on the number of processes. The parallel behavior of PtyScan is controlled by the parameter scan.data.load_parallel. For the distribution, PtyScan uses the LoadManager.
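The core idea, that each scan point is loaded by exactly one process, can be illustrated with a small hypothetical helper that splits the indices into contiguous blocks. split_indices is not PtyPy code, and LoadManager does not necessarily assign indices this way; in a real run, the assignment follows which data each process will use in the reconstruction.

```python
def split_indices(num_frames, num_procs):
    """Split scan-point indices 0..num_frames-1 into contiguous blocks.

    One block per process; block sizes differ by at most one, and every
    index is assigned to exactly one process.
    """
    base, extra = divmod(num_frames, num_procs)
    blocks, start = [], 0
    for rank in range(num_procs):
        size = base + (1 if rank < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


# 116 frames (the size of the tutorial scan) shared by 3 processes.
blocks = split_indices(116, 3)
for rank, idx in enumerate(blocks):
    print(f"rank {rank}: {len(idx)} frames, indices {idx[0]}..{idx[-1]}")
```

Because the blocks are disjoint and cover all indices, the total amount of data read from disk is the same regardless of how many processes take part.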

Preparation

PtyScan can handle a few of the preparation steps mentioned above.

  • Selection of a region of interest from the raw detector image. This selection is controlled by the parameters scan.data.auto_center, scan.data.shape and scan.data.center.

  • Switching of orientation and rebinning are controlled by scan.data.orientation and scan.data.rebin.

  • Finding a suitable mask or weight for pixel correction is left to the user, as this is a setup-specific implementation. See load_weight(), load_common(), load() and correct() for detailed explanations.
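The cropping, rebinning and orientation steps can be pictured with the following sketch. crop_roi, rebin and orient are hypothetical helpers, not PtyScan methods. The orientation codes follow the scan.data.orientation documentation listed in the tutorial (1 flips columns, 2 flips rows, 4 transposes, and sums combine them), but applying the transpose first is an assumption, as is the order crop, rebin, orient used here.

```python
import numpy as np


def crop_roi(frame, center, shape):
    """Cut a window of size shape=(sy, sx) centred on center=(cy, cx)."""
    cy, cx = (int(round(c)) for c in center)
    sy, sx = shape
    y0, x0 = cy - sy // 2, cx - sx // 2
    if y0 < 0 or x0 < 0 or y0 + sy > frame.shape[0] or x0 + sx > frame.shape[1]:
        raise ValueError("region of interest exceeds the frame")
    return frame[y0:y0 + sy, x0:x0 + sx]


def rebin(frame, factor):
    """Sum factor x factor pixel blocks; None or 1 means no binning."""
    if factor in (None, 1):
        return frame
    h, w = frame.shape
    if h % factor or w % factor:
        raise ValueError("frame shape is not divisible by the rebin factor")
    return frame.reshape(h // factor, factor, w // factor, factor).sum(axis=(1, 3))


def orient(frame, code):
    """Apply an integer orientation code 0..7.

    4 = transpose, 2 = flip rows (flipud), 1 = flip columns (fliplr);
    applying the transpose first is an assumption of this sketch.
    """
    if not code:
        return frame
    if code & 4:
        frame = frame.T
    if code & 2:
        frame = np.flipud(frame)
    if code & 1:
        frame = np.fliplr(frame)
    return frame


frame = np.arange(64).reshape(8, 8)
roi = crop_roi(frame, center=(4, 4), shape=(4, 4))   # rows and columns 2..5
binned = rebin(roi, 2)                                # 2x2 result
print(roi[0, 0], binned.tolist(), orient(binned, 1).tolist())
```

Summing rather than averaging during rebinning keeps the values in photon counts, which matters for counting-statistics-based reconstruction.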

Packaging

PtyScan packs the prepared data together with the scan point indices used, the scan positions, a weight (= mask) and geometric meta information. This package is requested by the managing ModelManager instance when it calls new_data().

Because data acquisition and preparation can happen during a reconstruction, it is possible to specify the minimum number of data frames passed to each process per call of new_data() by setting the value of scan.data.min_frames. The total number of frames processed for a scan is set by scan.data.num_frames.

If not extracted from other files, the user may set the photon energy with scan.data.energy, the propagation distance from sample to detector with scan.data.distance and the detector pixel size with scan.data.psize.

Storage

PtyScan and its subclasses are capable of storing the data in an HDF5-compatible [HDF] file format. The data file names have a custom suffix: .ptyd.

A detailed overview of the .ptyd data file tree is given below in the section Ptyd file format.

The parameters scan.data.save and scan.data.chunk_format control the way PtyScan saves the processed data.

Note

Although h5py [h5py] supports parallel writes, this feature is not used in ptypy. At the moment, all MPI nodes send their prepared data to the master node, which writes the data to a file.

Usage scenarios

The PtyScan class of PtyPy provides support for three use cases.

Beamline integrated use.

In this use case, the researcher has integrated PtyPy into the beamline end-station or experimental setup with the help of a custom subclass of PtyScan that we call UserScan. This subclass has its own methods to extract many of the generic parameters of scan.data and also provides defaults for specific custom parameters, for instance file paths or file name patterns (for a detailed introduction on how to subclass PtyScan, see Tutorial : Subclassing PtyScan). Once the experiment is completed, the researcher can start a reconstruction directly from raw data with a standard reconstruction script.


Fig. 20 Integrated use case of PtyScan.

A custom subclass UserScan serves as a translator between PtyPy’s generic parameters and data types on one side and the raw image data and metadata from the experiment on the other. Typically, the experiment has to be completed before a reconstruction is started, but with some effort it is even possible to start the reconstruction immediately after acquisition of the first frame. As data preparation is blended in with the reconstruction process, the reconstruction pauses while new data is prepared. Optionally, the prepared data is saved to a .ptyd file to avoid having to run the preparation steps for subsequent reconstruction runs.

Post-preparation use.

In this use case, the experiment is long past, and the researcher has used either a custom subclass of PtyScan or any other script that generates a compatible .hdf5 file (see Ptyd file format) to save the prepared data of that experiment. Reconstruction should then work simply by passing the data file path in the parameter tree.

Only the input file path needs to be passed, either with source or with dfile when source takes the value 'file'. In the latter case, secondary processing and saving to another file are not supported, while they are allowed in the first case. While the latter case seems unfavorable due to the lack of secondary preparation options, it is meant as a user-friendly transition switch from the first reconstruction at the experiment to post-experiment analysis. Only the source parameter needs to be changed in the script, from <..>.data.source=<recipe> to <..>.data.source='file', while the rest of the parameters are ignored and may remain untouched.


Fig. 21 Standard supported use case of PtyScan.

If a structure-compatible (see Ptyd file format) *.hdf5 file is available, PtyPy can be used without writing a custom subclass of PtyScan. It will use the shipped subclass PtydScan to read in the (prepared) raw data.

Preparation and reconstruction on-the-fly with data acquisition.

This use case is for even tighter beamline integration and on-the-fly scans. The researcher has already written a suitable subclass UserScan to prepare data from the setup. Now, the preparation happens in a separate process while image frames are acquired. This process runs a Python script in which the subclass UserScan prepares the data using the auto() method. The save parameter is set to ‘link’ in order to create a separate file for each data chunk and to avoid write access to the source file. The chunk files are linked back into the main source .ptyd file.

All reconstruction processes may access the prepared data without overhead or notable pauses in the reconstruction. For PtyPy, the linked chunk files are indistinguishable from a single source file (a feature of [HDF]).


Fig. 22 On-the-fly or demon-like use case of PtyScan.

A separate process prepares the data chunks and saves them in separate files, which are linked back into the source data file. This process may run silently as a “demon” in the background. Reconstructions can start immediately and run without delays or pauses due to data preparation.
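The linked-file layout relies on HDF5 external links. The sketch below mimics the idea behind save='link' with h5py.ExternalLink: each chunk is written to its own file, and the master file only stores links under chunks/. The helper write_linked_chunk is hypothetical, and the chunk file naming (master file name plus chunk_format) is an assumption modelled on the documented default '.chunk%02d'.

```python
import os
import tempfile

import h5py
import numpy as np


def write_linked_chunk(master_path, chunk_id, data, positions,
                       chunk_format=".chunk%02d"):
    """Write one chunk to its own file and link it into the master file.

    The chunk file is named master_path + chunk_format % chunk_id (an
    assumed naming scheme), and the link stores the file name relative
    to the master file's directory.
    """
    chunk_path = master_path + chunk_format % chunk_id
    with h5py.File(chunk_path, "w") as cf:
        cf["data"] = data
        cf["positions"] = positions
    with h5py.File(master_path, "a") as mf:
        chunks = mf.require_group("chunks")
        # The external link points to the root group of the chunk file.
        chunks[str(chunk_id)] = h5py.ExternalLink(os.path.basename(chunk_path), "/")
    return chunk_path


with tempfile.TemporaryDirectory() as tmp:
    master = os.path.join(tmp, "scan.ptyd")
    for cid in range(2):
        frames = np.full((3, 8, 8), float(cid))
        write_linked_chunk(master, cid, frames, np.zeros((3, 2)))
    # A reader opens only the master file; h5py follows the links.
    with h5py.File(master, "r") as f:
        chunk_names = sorted(f["chunks"].keys())
        second_value = float(f["chunks/1/data"][0, 0, 0])
        stored_shape = f["chunks/1/data"].shape
    print(chunk_names, second_value, stored_shape)
```

Since the writer never reopens existing chunk files, a reader can keep using earlier chunks while new ones are being added.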

Ptyd file format

Ptypy uses the Python module h5py [h5py] to store and load data in the Hierarchical Data Format [HDF]. HDF closely resembles the directory/file tree of today’s operating systems, where the “files” are (multidimensional) datasets.

Ptypy stores and loads the (prepared) experimental data in a file with extension .ptyd, which is an HDF5 file with a very simple data tree. Comparable to tagged image file formats like .edf or .tiff, the ptyd data file separates meta information (stored in meta/) from the actual data payload (stored in chunks/). A schematic overview of the data tree is depicted below.

*.ptyd/

      meta/

         [general parameters; optional but very useful]
         version     : str
         num_frames  : int
         label       : str

         [geometric parameters; all optional]
         shape       : int or (int,int)
         energy      : float, optional
         distance    : float, optional
         center      : (float,float) or None, optional
         psize       : float or (float,float), optional
         propagation : "farfield" or "nearfield", optional
         ...

      chunks/

         0/
           data      : array(M,N,N) of float
           indices   : array(M) of int, optional
           positions : array(M,2) of float
           weights   : same shape as data or empty
         1/
           ...
         2/
           ...
         ...

All parameters of meta/ are a subset of scan.data. Omitting any of these parameters or setting the value of the dataset to 'None' has the same effect.
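For a script that prepares data without PtyScan (see the post-preparation use case), a minimal writer for this layout could look as follows. write_ptyd is a hypothetical helper: it writes a single chunk and skips meta entries whose value is None. It has not been checked against PtydScan, so entries such as version or the data type may need adjusting.

```python
import os
import tempfile

import h5py
import numpy as np


def write_ptyd(path, data, positions, meta=None, indices=None, weights=None):
    """Write a single-chunk file with the meta/ and chunks/0/ layout above.

    Meta entries whose value is None are skipped, since omitting an entry
    has the same effect as setting it to None.
    """
    data = np.asarray(data, dtype=float)
    positions = np.asarray(positions, dtype=float)
    if data.ndim != 3 or positions.shape != (data.shape[0], 2):
        raise ValueError("expected data of shape (M,N,N) and positions (M,2)")
    with h5py.File(path, "w") as f:
        meta_grp = f.create_group("meta")
        for key, value in (meta or {}).items():
            if value is not None:
                meta_grp[key] = value
        chunk = f.create_group("chunks").create_group("0")
        chunk["data"] = data
        chunk["positions"] = positions
        if indices is not None:
            chunk["indices"] = np.asarray(indices, dtype=int)
        # weights: same shape as data, or an empty array meaning "no mask"
        chunk["weights"] = np.empty(0) if weights is None else np.asarray(weights)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.ptyd")
    frames = np.ones((4, 64, 64))
    pos = np.array([[0.0, 0.0], [0.0, 4.1562e-04],
                    [3.9528e-04, 1.2844e-04], [2.4430e-04, -3.3625e-04]])
    write_ptyd(path, frames, pos, indices=range(4),
               meta={"num_frames": 4, "energy": 0.0023305, "distance": 0.15,
                     "psize": 2.4e-05, "shape": 64, "center": None})
    with h5py.File(path, "r") as f:
        meta_keys = sorted(f["meta"].keys())
        data_shape = f["chunks/0/data"].shape
        n_weights = f["chunks/0/weights"].size
    print(meta_keys, data_shape, n_weights)
```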

The first set of parameters

version     : str
num_frames  : int
label       : str

are general (optional) parameters.

  • version is the ptypy version this dataset was prepared with (current version is 0.8.1.dev6f61b9d9, see version).

  • label is a custom user label. Choose a unique label to your liking.

  • num_frames indicates how many diffraction image frames are expected in the dataset (see num_frames). It is important to set this parameter when the data acquisition is not finished but the reconstruction has already started. If the dataset is complete, the loading class PtydScan retrieves the total number of frames from the payload chunks/.

The next set of optional parameters

shape       : int or (int,int)
energy      : float
distance    : float
center      : (float,float)
psize       : float or (float,float)
propagation : "farfield" or "nearfield"

which refer to the experimental scanning geometry.

  • shape (see scan.data.shape)

  • energy (see scan.data.energy or scan.geometry.energy)

  • distance (see scan.data.distance)

  • center : (float,float) (see scan.data.center)

  • psize : float or (float,float) (see scan.data.psize)

  • propagation : “farfield” or “nearfield” (see scan.data.propagation)

Finally, these parameters are digested by the geometry module in order to provide a suitable propagator.
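To give a feeling for what these parameters determine, the standard far-field relations below convert the photon energy (in keV) to a wavelength and compute the resulting pixel size in the sample plane, λz/(N·psize). This is a textbook sketch, not PtyPy's geometry code, and the helper names are hypothetical. With the values from the tutorial's geometry.txt, it gives a wavelength of about 532 nm and a sample-plane pixel of about 13 µm.

```python
# Planck constant times speed of light, h*c = 1.23984198e-9 keV*m,
# so the wavelength in meters is HC_KEV_M / (photon energy in keV).
HC_KEV_M = 1.23984198e-9


def wavelength(energy_kev):
    """Photon wavelength in meters for an energy given in keV."""
    return HC_KEV_M / energy_kev


def farfield_sample_pixel(energy_kev, distance, psize, shape):
    """Sample-plane pixel size (m) of a far-field geometry: lambda*z/(N*psize)."""
    return wavelength(energy_kev) * distance / (shape * psize)


# Values from the tutorial's geometry.txt.
lam = wavelength(0.0023305)
dx = farfield_sample_pixel(0.0023305, 0.15, 2.4e-05, 256)
print(f"wavelength {lam:.4e} m, sample pixel {dx:.4e} m")
```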

Note

As you may have already noted, there are three ways to specify the geometry of the experiment.


As walking the data tree and extracting the data from an HDF5 file is a bit cumbersome with h5py, there are a few convenience functions in the ptypy.io.h5rw module.
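For comparison, walking the tree with plain h5py takes only a short recursive function. group_to_dict is a hypothetical helper, not part of ptypy.io.h5rw; it reads every dataset fully into memory, which is only sensible for small files.

```python
import os
import tempfile

import h5py
import numpy as np


def group_to_dict(group):
    """Recursively convert an h5py Group into nested dicts of numpy values.

    Items are accessed through the group, so h5py resolves soft and external
    links on the way (for example chunks stored with save='link').
    """
    out = {}
    for key, item in group.items():
        if isinstance(item, h5py.Group):
            out[key] = group_to_dict(item)
        else:
            out[key] = item[()]  # read the full dataset into memory
    return out


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny.ptyd")
    with h5py.File(path, "w") as f:
        f.create_group("meta")["energy"] = 7.2
        f.create_group("chunks").create_group("0")["data"] = np.zeros((2, 8, 8))
    with h5py.File(path, "r") as f:
        tree = group_to_dict(f)
print(tree["meta"]["energy"], tree["chunks"]["0"]["data"].shape)
```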

Tutorial : Subclassing PtyScan

Note

This tutorial was generated from the python source [ptypy_root]/tutorial/subclassptyscan.py using ptypy/doc/script2rst.py. You are encouraged to modify the parameters and rerun the tutorial with:

$ python [ptypy_root]/tutorial/subclassptyscan.py

In this tutorial, we learn how to subclass PtyScan to make ptypy work with any experimental setup.

This tutorial can be used as a direct follow-up to Tutorial: Modeling the experiment - Pod, Geometry if the section Storing the simulation was completed.

Again, the imports first.

>>> import numpy as np
>>> from ptypy.core.data import PtyScan
>>> from ptypy import utils as u

For this tutorial, we assume that the data and meta information are in this path:

>>> save_path = '/tmp/ptypy/sim/'

Furthermore, we assume that a file about the experimental geometry is located at

>>> geofilepath = save_path + 'geometry.txt'
>>> print(geofilepath)
/tmp/ptypy/sim/geometry.txt

and has contents of the following form:

>>> print(''.join([line for line in open(geofilepath, 'r')]))
distance 1.5000e-01
energy 2.3305e-03
psize 2.4000e-05
shape 256

The scanning positions are in

>>> positionpath = save_path + 'positions.txt'
>>> print(positionpath)
/tmp/ptypy/sim/positions.txt

which lists, line by line, the image frame file from the “camera” followed by its vertical and horizontal position:

>>> print(''.join([line for line in open(positionpath, 'r')][:6])+'....')
ccd/diffraction_0000.npy 0.0000e+00 0.0000e+00
ccd/diffraction_0001.npy 0.0000e+00 4.1562e-04
ccd/diffraction_0002.npy 3.9528e-04 1.2844e-04
ccd/diffraction_0003.npy 2.4430e-04 -3.3625e-04
ccd/diffraction_0004.npy -2.4430e-04 -3.3625e-04
ccd/diffraction_0005.npy -3.9528e-04 1.2844e-04
....

Writing a subclass

The simplest subclass of PtyScan would look like this

>>> class NumpyScan(PtyScan):
>>>     """
>>>     A PtyScan subclass to extract data from a numpy array.
>>>     """
>>>
>>>     def __init__(self, pars=None, **kwargs):
>>>         # In init we need to call the parent.
>>>         super(NumpyScan, self).__init__(pars, **kwargs)
>>>

Of course, this class does nothing beyond what PtyScan already does. As it is, the class also cannot be used as a real PtyScan instance because its defaults are not properly managed. For this, PtyPy provides a powerful self-documenting tool called a “descriptor”, which can be applied to any new class using a decorator. The tree of all valid ptypy parameters is located here. To manage the default parameters of our subclass and document its existence, we would need to write:

>>> from ptypy import defaults_tree
>>> @defaults_tree.parse_doc('scandata.numpyscan')
>>> class NumpyScan(PtyScan):
>>>     """
>>>     A PtyScan subclass to extract data from a numpy array.
>>>     """
>>>
>>>     def __init__(self, pars=None, **kwargs):
>>>         # In init we need to call the parent.
>>>         super(NumpyScan, self).__init__(pars, **kwargs)
>>>

The decorator extracts information from the docstring of the subclass and parent classes about the expected input parameters. Currently the docstring of NumpyScan does not contain anything special, thus the only parameters registered are those of the parent class, PtyScan:

>>> print(defaults_tree['scandata.numpyscan'].to_string())
[name]
default = PtyScan
help =
type = str

[dfile]
default = None
help = File path where prepared data will be saved in the ``ptyd`` format.
type = file
userlevel = 0

[chunk_format]
default = .chunk%02d
help = Appendix to saved files if save == 'link'
type = str
doc =
userlevel = 2

[save]
default = None
help = Saving mode
type = str
doc = Mode to use to save data to file.
     <newline>
     - ``None``: No saving
     - ``'merge'``: attemts to merge data in single chunk **[not implemented]**
     - ``'append'``: appends each chunk in master \*.ptyd file
     - ``'link'``: appends external links in master \*.ptyd file and stores chunks separately
     <newline>
     in the path given by the link. Links file paths are relative to master file.
userlevel = 1

[auto_center]
default = None
help = Determine if center in data is calculated automatically
type = bool
doc =
     - ``False``, no automatic centering
     - ``None``, only if :py:data:`center` is ``None``
     - ``True``, it will be enforced
userlevel = 0

[load_parallel]
default = data
help = Determines what will be loaded in parallel
type = str
doc = Choose from ``None``, ``'data'``, ``'common'``, ``'all'``
choices = ['data', 'common', 'all']

[rebin]
default = None
help = Rebinning factor
type = int
doc = Rebinning factor for the raw data frames. ``'None'`` or ``1`` both mean *no binning*
userlevel = 1
lowlim = 1
uplim = 32

[orientation]
default = None
help = Data frame orientation
type = int, tuple, list
doc = Choose
     <newline>
     - ``None`` or ``0``: correct orientation
     - ``1``: invert columns (numpy.flip_lr)
     - ``2``: invert rows  (numpy.flip_ud)
     - ``3``: invert columns, invert rows
     - ``4``: transpose (numpy.transpose)
     - ``4+i``: tranpose + other operations from above
     <newline>
     Alternatively, a 3-tuple of booleans may be provided ``(do_transpose,
     do_flipud, do_fliplr)``
choices = [0, 1, 2, 3, 4, 5, 6, 7]
userlevel = 1

[min_frames]
default = 1
help = Minimum number of frames loaded by each node
type = int
doc =
userlevel = 2
lowlim = 1

[positions_theory]
default = None
help = Theoretical positions for this scan
type = ndarray
doc = If provided, experimental positions from :py:class:`PtyScan` subclass will be ignored. If data
     preparation is called from Ptycho instance, the calculated positions from the
     :py:func:`ptypy.core.xy.from_pars` dict will be inserted here
userlevel = 2

[num_frames]
default = None
help = Maximum number of frames to be prepared
type = int
doc = If `positions_theory` are provided, num_frames will be ovverriden with the number of
     positions available
userlevel = 1

[label]
default = None
help = The scan label
type = str
doc = Unique string identifying the scan
userlevel = 1

[experimentID]
default = None
help = Name of the experiment
type = str
doc = If None, a default value will be provided by the recipe. **unused**
userlevel = 2

[version]
default = 0.1
help = TODO: Explain this and decide if it is a user parameter.
type = float
doc =
userlevel = 2

[shape]
default = 256
help = Shape of the region of interest cropped from the raw data.
type = int, tuple
doc = Cropping dimension of the diffraction frame
     Can be None, (dimx, dimy), or dim. In the latter case shape will be (dim, dim).
userlevel = 1

[center]
default = 'fftshift'
help = Center (pixel) of the optical axes in raw data
type = list, tuple, str
doc = If ``None``, this parameter will be set by :py:data:`~.scan.data.auto_center` or elsewhere
userlevel = 1

[psize]
default = 0.000172
help = Detector pixel size
type = float, tuple
doc = Dimensions of the detector pixels (in meters)
userlevel = 0
lowlim = 0

[distance]
default = 7.19
help = Sample to detector distance
type = float
doc = In meters.
userlevel = 0
lowlim = 0

[energy]
default = 7.2
help = Photon energy of the incident radiation in keV
type = float
doc =
userlevel = 0
lowlim = 0

[add_poisson_noise]
default = False
help = Decides whether the scan should have poisson noise or not
type = bool

As you can see, many parameters are already documented in the PtyScan class. For each parameter, the most important entries are the type, default value and help string. The decorator does more than collect this information: it also generates from it a class variable called DEFAULT, which stores all defaults:

>>> print(u.verbose.report(NumpyScan.DEFAULT, noheader=True))
* id3V4ANI238G           : ptypy.utils.parameters.Param(20)
  * name                 : PtyScan
  * dfile                : None
  * chunk_format         : .chunk%02d
  * save                 : None
  * auto_center          : None
  * load_parallel        : data
  * rebin                : None
  * orientation          : None
  * min_frames           : 1
  * positions_theory     : None
  * num_frames           : None
  * label                : None
  * experimentID         : None
  * version              : 0.1
  * shape                : 256
  * center               : fftshift
  * psize                : 0.000172
  * distance             : 7.19
  * energy               : 7.2
  * add_poisson_noise    : False

Now we are ready to add functionality to our subclass. A first step of initialisation would be to retrieve the geometric information that we stored in geofilepath and update the input parameters with it.

We write a tiny file parser.

>>> def extract_geo(base_path):
>>>     out = {}
>>>     with open(base_path+'geometry.txt') as f:
>>>         for line in f:
>>>             key, value = line.strip().split()
>>>             out[key] = eval(value)
>>>     return out
>>>

We test it.

>>> print(extract_geo(save_path))
{'distance': 0.15, 'energy': 0.0023305, 'psize': 2.4e-05, 'shape': 256}

That seems to work. We can integrate this parser into the initialisation as we assume that this small access can be done by all MPI nodes without data access problems. Hence, our subclass becomes

>>> @defaults_tree.parse_doc('scandata.numpyscan')
>>> class NumpyScan(PtyScan):
>>>     """
>>>     A PtyScan subclass to extract data from a numpy array.
>>>
>>>     Defaults:
>>>
>>>     [name]
>>>     type = str
>>>     default = numpyscan
>>>     help =
>>>
>>>     [base_path]
>>>     type = str
>>>     default = './'
>>>     help = Base path to extract data files from.
>>>     """
>>>
>>>     def __init__(self, pars=None, **kwargs):
>>>         p = self.DEFAULT.copy(depth=2)
>>>         p.update(pars)
>>>
>>>         with open(p.base_path+'geometry.txt') as f:
>>>             for line in f:
>>>                 key, value = line.strip().split()
>>>                 # we only replace Nones or missing keys
>>>                 if p.get(key) is None:
>>>                     p[key] = eval(value)
>>>
>>>         super(NumpyScan, self).__init__(p, **kwargs)
>>>

We now need a new input parameter called base_path, so we document it in the docstring after the section header “Defaults:”.

>>> print(defaults_tree['scandata.numpyscan.base_path'])
[base_path]
default = './'
help = Base path to extract data files from.
type = str

As you can see, the first step in __init__ is to build a default parameter structure to ensure that all input parameters are available. The next line updates this structure to overwrite the entries specified by the user.
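Both extract_geo and the __init__ above call eval() on the contents of geometry.txt, which executes arbitrary expressions from the file. Where that is a concern, ast.literal_eval is a drop-in alternative for numeric values like these. The function parse_geometry below is a hypothetical variant for illustration; unlike the tutorial code, it joins paths with os.path.join, so base_path needs no trailing slash.

```python
import ast
import os
import tempfile


def parse_geometry(base_path):
    """Parse 'key value' lines of geometry.txt without calling eval().

    ast.literal_eval only accepts Python literals (numbers, strings, tuples,
    ...), so arbitrary code in the file cannot be executed.
    """
    out = {}
    with open(os.path.join(base_path, 'geometry.txt')) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            key, value = line.split()
            out[key] = ast.literal_eval(value)
    return out


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'geometry.txt'), 'w') as f:
        f.write("distance 1.5000e-01\nenergy 2.3305e-03\n"
                "psize 2.4000e-05\nshape 256\n")
    geo = parse_geometry(tmp)
print(geo)
```

Integer-valued entries such as shape stay integers, exactly as with eval().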

Good! Next, we need to implement how the class finds out about the positions in the scan. The method load_positions() can be used for this purpose.

>>> print(PtyScan.load_positions.__doc__)

        **Override in subclass for custom implementation**

        *Called in* :py:meth:`initialize`

        Loads all positions for all diffraction patterns in this scan.
        The positions loaded here will be available by all processes
        through the attribute ``self.positions``. If you specify position
        on a per frame basis in :py:meth:`load` , this function has no
        effect.

        If theoretical positions :py:data:`positions_theory` are
        provided in the initial parameter set :py:data:`DEFAULT`,
        specifying positions here has NO effect and will be ignored.

        The purpose of this function is to avoid reloading and parallel
        reads on files that may require intense parsing to retrieve the
        information, e.g. long SPEC log files. If parallel reads or
        log file parsing for each set of frames is not a time critical
        issue of the subclass, reimplementing this function can be ignored
        and it is recommended to only reimplement the :py:meth:`load`
        method.

        If `load_parallel` is set to `all` or common`, this function is
        executed by all nodes, otherwise the master node executes this
        function and broadcasts the results to other nodes.

        Returns
        -------
        positions : ndarray
            A (N,2)-array where *N* is the number of positions.

        Note
        ----
        Be aware that this method sets attribute :py:attr:`num_frames`
        in the following manner.

        * If ``num_frames == None`` : ``num_frames = N``.
        * If ``num_frames < N`` , no effect.
        * If ``num_frames > N`` : ``num_frames = N``.

The parser for the positions file would look like this.

>>> def extract_pos(base_path):
>>>     pos = []
>>>     files = []
>>>     with open(base_path+'positions.txt') as f:
>>>         for line in f:
>>>             fname, y, x = line.strip().split()
>>>             pos.append((eval(y), eval(x)))
>>>             files.append(fname)
>>>     return files, pos
>>>

And the test:

>>> files, pos = extract_pos(save_path)
>>> print(files[:2])
['ccd/diffraction_0000.npy', 'ccd/diffraction_0001.npy']

>>> print(pos[:2])
[(0.0, 0.0), (0.0, 0.00041562)]

We now move this parser into the load_positions() method of our subclass:

>>> @defaults_tree.parse_doc('scandata.numpyscan')
>>> class NumpyScan(PtyScan):
>>>     """
>>>     A PtyScan subclass to extract data from a numpy array.
>>>
>>>     Defaults:
>>>
>>>     [name]
>>>     type = str
>>>     default = numpyscan
>>>     help =
>>>
>>>     [base_path]
>>>     type = str
>>>     default = /tmp/ptypy/sim/
>>>     help = Base path to extract data files from.
>>>     """
>>>
>>>     def __init__(self, pars=None, **kwargs):
>>>         p = self.DEFAULT.copy(depth=2)
>>>         p.update(pars)
>>>
>>>         with open(p.base_path+'geometry.txt') as f:
>>>             for line in f:
>>>                 key, value = line.strip().split()
>>>                 # we only replace Nones or missing keys
>>>                 if p.get(key) is None:
>>>                     p[key] = eval(value)
>>>
>>>         super(NumpyScan, self).__init__(p, **kwargs)
>>>
>>>     def load_positions(self):
>>>         # the base path is now stored in self.info
>>>         base_path = self.info.base_path
>>>         pos = []
>>>         with open(base_path+'positions.txt') as f:
>>>             for line in f:
>>>                 # the file name column is not needed here
>>>                 fname, y, x = line.strip().split()
>>>                 pos.append((eval(y), eval(x)))
>>>         return np.asarray(pos)
>>>

One nice thing about overriding self.load_positions is that the maximum number of frames will be set, so we do not need to adapt check() manually.

The last step is to override the actual loading of data. Loading happens (MPI-compatible) in load():

>>> print(PtyScan.load.__doc__)

        **Override in subclass for custom implementation**

        Loads data according to node specific scanpoint indices that have
        been determined by :py:class:`LoadManager` or otherwise.

        Returns
        -------
        raw, positions, weight : dict
            Dictionaries whose keys are the given scan point `indices`
            and whose values are the respective frame / position according
            to the scan point index. `weight` and `positions` may be empty

        Note
        ----
        This is the *most* important method to change when subclassing
        :py:class:`PtyScan`. Most often it suffices to override the constructor
        and this method to create a subclass suited for a specific
        experiment.

load() looks a bit more complex than self.load_positions because of its return values. However, we can opt out of providing weights (masks) and positions, as we have already implemented self.load_positions and there were no bad pixels in the (linear) detector.
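To make the return contract of load() concrete, the hypothetical helper below builds the three dictionaries for a subset of scan-point indices, this time including per-frame weights. It only mirrors the docstring above; the saturation rule is an assumption, and how PtyScan combines such weights with load_weight() or load_common() is not shown.

```python
import numpy as np


def load_frames(frames, indices, saturation=None):
    """Build the (raw, positions, weights) dicts that load() is documented to return.

    frames: indexable by scan-point index, yielding 2D detector frames.
    positions stays empty because load_positions() already supplies them.
    If saturation is given, pixels at or above it get weight 0 (an assumed
    masking rule); otherwise weights stays empty as well.
    """
    raw, positions, weights = {}, {}, {}
    for ii in indices:
        frame = np.asarray(frames[ii])
        raw[ii] = frame
        if saturation is not None:
            weights[ii] = (frame < saturation).astype(np.float32)
    return raw, positions, weights


stack = np.zeros((4, 8, 8))
stack[2, 0, 0] = 70000           # a saturated pixel in frame 2
raw, positions, weights = load_frames(stack, [1, 2], saturation=65535)
print(sorted(raw), positions, weights[2][0, 0], weights[1].sum())
```

Keying every dictionary by the scan-point index is what lets each MPI process return only the frames assigned to it.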

The final subclass looks like this. We override two defaults from PtyScan, auto_center and dfile:

>>> @defaults_tree.parse_doc('scandata.numpyscan')
>>> class NumpyScan(PtyScan):
>>>     """
>>>     A PtyScan subclass to extract data from a numpy array.
>>>
>>>     Defaults:
>>>
>>>     [name]
>>>     type = str
>>>     default = numpyscan
>>>     help =
>>>
>>>     [base_path]
>>>     type = str
>>>     default = /tmp/ptypy/sim/
>>>     help = Base path to extract data files from.
>>>
>>>     [auto_center]
>>>     default = False
>>>
>>>     [dfile]
>>>     default = /tmp/ptypy/sim/npy.ptyd
>>>     """
>>>
>>>     def __init__(self, pars=None, **kwargs):
>>>         p = self.DEFAULT.copy(depth=2)
>>>         p.update(pars)
>>>
>>>         with open(p.base_path+'geometry.txt') as f:
>>>             for line in f:
>>>                 key, value = line.strip().split()
>>>                 # we only replace Nones or missing keys
>>>                 if p.get(key) is None:
>>>                     p[key] = eval(value)
>>>
>>>         super(NumpyScan, self).__init__(p, **kwargs)
>>>
>>>     def load_positions(self):
>>>         # the base path is now stored in self.info
>>>         base_path = self.info.base_path
>>>         pos = []
>>>         with open(base_path+'positions.txt') as f:
>>>             for line in f:
>>>                 # the file name column is not needed here
>>>                 fname, y, x = line.strip().split()
>>>                 pos.append((eval(y), eval(x)))
>>>         return np.asarray(pos)
>>>
>>>     def load(self, indices):
>>>         raw = {}
>>>         bp = self.info.base_path
>>>         for ii in indices:
>>>             raw[ii] = np.load(bp+'ccd/diffraction_%04d.npy' % ii)
>>>         return raw, {}, {}
>>>

Loading the data

With the subclass, we create a scan using only defaults:

>>> NPS = NumpyScan()
>>> NPS.initialize()

To prepare the data, we call auto() with the chunk size as argument. It returns a data chunk that we can inspect with ptypy.utils.verbose.report(). The report shortens long entries, but the length of iterables and dicts is always indicated in parentheses.

>>> print(u.verbose.report(NPS.auto(80), noheader=True))
* id3V4AQEBE20           : dict(3)
  * common               : ptypy.utils.parameters.Param(8)
    * version            : 0.1
    * num_frames         : 116
    * label              : None
    * shape              : [array = [256 256]]
    * psize              : [array = [0.000172 0.000172]]
    * energy             : 7.2
    * center             : [array = [128. 128.]]
    * distance           : 7.19
  * chunk                : ptypy.utils.parameters.Param(6)
    * indices            : list(80)
      * id2M979S98S8     : 0
      * id2M979S98T8     : 1
      * id2M979S98U8     : 2
      * id2M979S98V8     : 3
      * id2M979S9908     : 4
      * ...              : ....
    * indices_node       : list(80)
      * id2M979S98S8     : 0
      * id2M979S98T8     : 1
      * id2M979S98U8     : 2
      * id2M979S98V8     : 3
      * id2M979S9908     : 4
      * ...              : ....
    * num                : 0
    * data               : dict(80)
      * 0                : [256x256 int32 array]
      * 1                : [256x256 int32 array]
      * 2                : [256x256 int32 array]
      * 3                : [256x256 int32 array]
      * 4                : [256x256 int32 array]
      * 5                : [256x256 int32 array]
      * 6                : [256x256 int32 array]
      * 7                : [256x256 int32 array]
      * 8                : [256x256 int32 array]
      * 9                : [256x256 int32 array]
      * 10               : [256x256 int32 array]
      * 11               : [256x256 int32 array]
      * 12               : [256x256 int32 array]
      * 13               : [256x256 int32 array]
      * 14               : [256x256 int32 array]
      * 15               : [256x256 int32 array]
      * 16               : [256x256 int32 array]
      * 17               : [256x256 int32 array]
      * 18               : [256x256 int32 array]
      * 19               : [256x256 int32 array]
      * 20               : [256x256 int32 array]
      * 21               : [256x256 int32 array]
      * 22               : [256x256 int32 array]
      * 23               : [256x256 int32 array]
      * 24               : [256x256 int32 array]
      * 25               : [256x256 int32 array]
      * 26               : [256x256 int32 array]
      * 27               : [256x256 int32 array]
      * 28               : [256x256 int32 array]
      * 29               : [256x256 int32 array]
      * 30               : [256x256 int32 array]
      * 31               : [256x256 int32 array]
      * 32               : [256x256 int32 array]
      * 33               : [256x256 int32 array]
      * 34               : [256x256 int32 array]
      * 35               : [256x256 int32 array]
      * 36               : [256x256 int32 array]
      * 37               : [256x256 int32 array]
      * 38               : [256x256 int32 array]
      * 39               : [256x256 int32 array]
      * 40               : [256x256 int32 array]
      * 41               : [256x256 int32 array]
      * 42               : [256x256 int32 array]
      * 43               : [256x256 int32 array]
      * 44               : [256x256 int32 array]
      * 45               : [256x256 int32 array]
      * 46               : [256x256 int32 array]
      * 47               : [256x256 int32 array]
      * 48               : [256x256 int32 array]
      * 49               : [256x256 int32 array]
      * 50               : [256x256 int32 array]
      * 51               : [256x256 int32 array]
      * 52               : [256x256 int32 array]
      * 53               : [256x256 int32 array]
      * 54               : [256x256 int32 array]
      * 55               : [256x256 int32 array]
      * 56               : [256x256 int32 array]
      * 57               : [256x256 int32 array]
      * 58               : [256x256 int32 array]
      * 59               : [256x256 int32 array]
      * 60               : [256x256 int32 array]
      * 61               : [256x256 int32 array]
      * 62               : [256x256 int32 array]
      * 63               : [256x256 int32 array]
      * 64               : [256x256 int32 array]
      * 65               : [256x256 int32 array]
      * 66               : [256x256 int32 array]
      * 67               : [256x256 int32 array]
      * 68               : [256x256 int32 array]
      * 69               : [256x256 int32 array]
      * 70               : [256x256 int32 array]
      * 71               : [256x256 int32 array]
      * 72               : [256x256 int32 array]
      * 73               : [256x256 int32 array]
      * 74               : [256x256 int32 array]
      * 75               : [256x256 int32 array]
      * 76               : [256x256 int32 array]
      * 77               : [256x256 int32 array]
      * 78               : [256x256 int32 array]
      * 79               : [256x256 int32 array]
    * weights            : dict(80)
      * 0                : [256x256 bool array]
      * 1                : [256x256 bool array]
      * 2                : [256x256 bool array]
      * 3                : [256x256 bool array]
      * 4                : [256x256 bool array]
      * 5                : [256x256 bool array]
      * 6                : [256x256 bool array]
      * 7                : [256x256 bool array]
      * 8                : [256x256 bool array]
      * 9                : [256x256 bool array]
      * 10               : [256x256 bool array]
      * 11               : [256x256 bool array]
      * 12               : [256x256 bool array]
      * 13               : [256x256 bool array]
      * 14               : [256x256 bool array]
      * 15               : [256x256 bool array]
      * 16               : [256x256 bool array]
      * 17               : [256x256 bool array]
      * 18               : [256x256 bool array]
      * 19               : [256x256 bool array]
      * 20               : [256x256 bool array]
      * 21               : [256x256 bool array]
      * 22               : [256x256 bool array]
      * 23               : [256x256 bool array]
      * 24               : [256x256 bool array]
      * 25               : [256x256 bool array]
      * 26               : [256x256 bool array]
      * 27               : [256x256 bool array]
      * 28               : [256x256 bool array]
      * 29               : [256x256 bool array]
      * 30               : [256x256 bool array]
      * 31               : [256x256 bool array]
      * 32               : [256x256 bool array]
      * 33               : [256x256 bool array]
      * 34               : [256x256 bool array]
      * 35               : [256x256 bool array]
      * 36               : [256x256 bool array]
      * 37               : [256x256 bool array]
      * 38               : [256x256 bool array]
      * 39               : [256x256 bool array]
      * 40               : [256x256 bool array]
      * 41               : [256x256 bool array]
      * 42               : [256x256 bool array]
      * 43               : [256x256 bool array]
      * 44               : [256x256 bool array]
      * 45               : [256x256 bool array]
      * 46               : [256x256 bool array]
      * 47               : [256x256 bool array]
      * 48               : [256x256 bool array]
      * 49               : [256x256 bool array]
      * 50               : [256x256 bool array]
      * 51               : [256x256 bool array]
      * 52               : [256x256 bool array]
      * 53               : [256x256 bool array]
      * 54               : [256x256 bool array]
      * 55               : [256x256 bool array]
      * 56               : [256x256 bool array]
      * 57               : [256x256 bool array]
      * 58               : [256x256 bool array]
      * 59               : [256x256 bool array]
      * 60               : [256x256 bool array]
      * 61               : [256x256 bool array]
      * 62               : [256x256 bool array]
      * 63               : [256x256 bool array]
      * 64               : [256x256 bool array]
      * 65               : [256x256 bool array]
      * 66               : [256x256 bool array]
      * 67               : [256x256 bool array]
      * 68               : [256x256 bool array]
      * 69               : [256x256 bool array]
      * 70               : [256x256 bool array]
      * 71               : [256x256 bool array]
      * 72               : [256x256 bool array]
      * 73               : [256x256 bool array]
      * 74               : [256x256 bool array]
      * 75               : [256x256 bool array]
      * 76               : [256x256 bool array]
      * 77               : [256x256 bool array]
      * 78               : [256x256 bool array]
      * 79               : [256x256 bool array]
    * positions          : [80x2 float64 array]
  * iterable             : list(80)
    * id3V4AQCNFO0       : dict(4)
      * index            : 0
      * data             : [256x256 int32 array]
      * position         : [array = [0. 0.]]
      * mask             : [256x256 bool array]
    * id3V4ANG90M0       : dict(4)
      * index            : 1
      * data             : [256x256 int32 array]
      * position         : [array = [0.         0.00041562]]
      * mask             : [256x256 bool array]
    * id3V4ANH7UC0       : dict(4)
      * index            : 2
      * data             : [256x256 int32 array]
      * position         : [array = [0.00039528 0.00012844]]
      * mask             : [256x256 bool array]
    * id3V4ANI69A0       : dict(4)
      * index            : 3
      * data             : [256x256 int32 array]
      * position         : [array = [ 0.0002443  -0.00033625]]
      * mask             : [256x256 bool array]
    * id3V4ANI6B60       : dict(4)
      * index            : 4
      * data             : [256x256 int32 array]
      * position         : [array = [-0.0002443  -0.00033625]]
      * mask             : [256x256 bool array]
    * ...                : ....


>>> print(u.verbose.report(NPS.auto(80), noheader=True))
* id3V4ANI6C80           : dict(3)
  * common               : ptypy.utils.parameters.Param(8)
    * version            : 0.1
    * num_frames         : 116
    * label              : None
    * shape              : [array = [256 256]]
    * psize              : [array = [0.000172 0.000172]]
    * energy             : 7.2
    * center             : [array = [128. 128.]]
    * distance           : 7.19
  * chunk                : ptypy.utils.parameters.Param(6)
    * indices            : list(36)
      * id2M979S9BC8     : 80
      * id2M979S9BD8     : 81
      * id2M979S9BE8     : 82
      * id2M979S9BF8     : 83
      * id2M979S9BG8     : 84
      * ...              : ....
    * indices_node       : list(36)
      * id2M979S9BC8     : 80
      * id2M979S9BD8     : 81
      * id2M979S9BE8     : 82
      * id2M979S9BF8     : 83
      * id2M979S9BG8     : 84
      * ...              : ....
    * num                : 1
    * data               : dict(36)
      * 80               : [256x256 int32 array]
      * 81               : [256x256 int32 array]
      * 82               : [256x256 int32 array]
      * 83               : [256x256 int32 array]
      * 84               : [256x256 int32 array]
      * 85               : [256x256 int32 array]
      * 86               : [256x256 int32 array]
      * 87               : [256x256 int32 array]
      * 88               : [256x256 int32 array]
      * 89               : [256x256 int32 array]
      * 90               : [256x256 int32 array]
      * 91               : [256x256 int32 array]
      * 92               : [256x256 int32 array]
      * 93               : [256x256 int32 array]
      * 94               : [256x256 int32 array]
      * 95               : [256x256 int32 array]
      * 96               : [256x256 int32 array]
      * 97               : [256x256 int32 array]
      * 98               : [256x256 int32 array]
      * 99               : [256x256 int32 array]
      * 100              : [256x256 int32 array]
      * 101              : [256x256 int32 array]
      * 102              : [256x256 int32 array]
      * 103              : [256x256 int32 array]
      * 104              : [256x256 int32 array]
      * 105              : [256x256 int32 array]
      * 106              : [256x256 int32 array]
      * 107              : [256x256 int32 array]
      * 108              : [256x256 int32 array]
      * 109              : [256x256 int32 array]
      * 110              : [256x256 int32 array]
      * 111              : [256x256 int32 array]
      * 112              : [256x256 int32 array]
      * 113              : [256x256 int32 array]
      * 114              : [256x256 int32 array]
      * 115              : [256x256 int32 array]
    * weights            : dict(36)
      * 80               : [256x256 bool array]
      * 81               : [256x256 bool array]
      * 82               : [256x256 bool array]
      * 83               : [256x256 bool array]
      * 84               : [256x256 bool array]
      * 85               : [256x256 bool array]
      * 86               : [256x256 bool array]
      * 87               : [256x256 bool array]
      * 88               : [256x256 bool array]
      * 89               : [256x256 bool array]
      * 90               : [256x256 bool array]
      * 91               : [256x256 bool array]
      * 92               : [256x256 bool array]
      * 93               : [256x256 bool array]
      * 94               : [256x256 bool array]
      * 95               : [256x256 bool array]
      * 96               : [256x256 bool array]
      * 97               : [256x256 bool array]
      * 98               : [256x256 bool array]
      * 99               : [256x256 bool array]
      * 100              : [256x256 bool array]
      * 101              : [256x256 bool array]
      * 102              : [256x256 bool array]
      * 103              : [256x256 bool array]
      * 104              : [256x256 bool array]
      * 105              : [256x256 bool array]
      * 106              : [256x256 bool array]
      * 107              : [256x256 bool array]
      * 108              : [256x256 bool array]
      * 109              : [256x256 bool array]
      * 110              : [256x256 bool array]
      * 111              : [256x256 bool array]
      * 112              : [256x256 bool array]
      * 113              : [256x256 bool array]
      * 114              : [256x256 bool array]
      * 115              : [256x256 bool array]
    * positions          : [36x2 float64 array]
  * iterable             : list(36)
    * id3V4ANI6BQ0       : dict(4)
      * index            : 80
      * data             : [256x256 int32 array]
      * position         : [array = [0.0018532 0.0016686]]
      * mask             : [256x256 bool array]
    * id3V4ANGAKA0       : dict(4)
      * index            : 81
      * data             : [256x256 int32 array]
      * position         : [array = [0.0021597 0.0012469]]
      * mask             : [256x256 bool array]
    * id3V4ANI6D60       : dict(4)
      * index            : 82
      * data             : [256x256 int32 array]
      * position         : [array = [0.0023717  0.00077061]]
      * mask             : [256x256 bool array]
    * id3V4AQEBFK0       : dict(4)
      * index            : 83
      * data             : [256x256 int32 array]
      * position         : [array = [0.0024801  0.00026067]]
      * mask             : [256x256 bool array]
    * id3V4ANG7O20       : dict(4)
      * index            : 84
      * data             : [256x256 int32 array]
      * position         : [array = [ 0.0024801  -0.00026067]]
      * mask             : [256x256 bool array]
    * ...                : ....

We observe that the second chunk is not 80 frames deep but only 36, since the scan contains just 116 frames in total (80 + 36 = 116).
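The chunk sizes follow from simple integer arithmetic. With `num_frames` frames and a requested chunk depth `n`, every chunk holds `n` frames except the last, which holds the remainder. Here is a minimal sketch in plain Python. The helper name `chunk_sizes` is hypothetical and not part of PtyPy.

```python
def chunk_sizes(num_frames, depth):
    """Return the number of frames in each chunk when `num_frames`
    frames are delivered in chunks of at most `depth` frames."""
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    full, rest = divmod(num_frames, depth)
    # `full` complete chunks, plus one partial chunk if frames remain
    return [depth] * full + ([rest] if rest else [])


print(chunk_sizes(116, 80))  # [80, 36], as in the two reports above
print(chunk_sizes(116, 20))  # [20, 20, 20, 20, 20, 16]
```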

So where is the .ptyd data file? By default, PtyScan does not actually save any data. Saving has to be activated manually in the input parameters.

>>> data = NPS.DEFAULT.copy(depth=2)
>>> data.save = 'append'
>>> NPS = NumpyScan(pars=data)
>>> NPS.initialize()
>>> for i in range(50):
...     msg = NPS.auto(20)
...     if msg == NPS.EOS:
...         break
...
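The loop above relies on `auto()` returning the end-of-scan marker `NPS.EOS` once all frames have been delivered. The `range(50)` bound appears to serve only as a safety limit against an endless loop.

The toy class below mimics that contract together with `save = 'append'`, where each new chunk is added to the store as soon as it is produced. It is purely illustrative and is not PtyPy code. `ToyScan`, its `saved` attribute and the chunk dictionary layout are assumptions, loosely modelled on the output shown in this section.

```python
class ToyScan:
    """Illustrative stand-in for a PtyScan-like producer (not PtyPy code)."""

    EOS = "end-of-scan"  # sentinel returned once every frame was delivered

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.next_frame = 0
        self.saved = {}  # chunk number -> chunk, filled as in 'append' mode

    def auto(self, n):
        """Return the next chunk of at most n frames, or EOS when done."""
        if self.next_frame >= self.num_frames:
            return self.EOS
        stop = min(self.next_frame + n, self.num_frames)
        chunk = {"indices": list(range(self.next_frame, stop))}
        self.saved[len(self.saved)] = chunk  # append the new chunk to the store
        self.next_frame = stop
        return chunk


scan = ToyScan(num_frames=116)
for _ in range(50):  # upper bound, as in the session above
    msg = scan.auto(20)
    if msg == scan.EOS:
        break

print(len(scan.saved))                # 6
print(len(scan.saved[5]["indices"]))  # 16
```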

We can inspect the saved file npy.ptyd with h5info():

>>> from ptypy.io import h5info
>>> print(h5info(NPS.info.dfile))
File created : Mon Mar 11 09:53:29 2024
 * chunks [dict 6]:
     * 0 [dict 4]:
         * data [20x256x256 int32 array]
         * indices [list = [0.000000, 1.000000, 2.000000, 3.000000,  ...]]
         * positions [20x2 float64 array]
         * weights [20x256x256 bool array]
     * 1 [dict 4]:
         * data [20x256x256 int32 array]
         * indices [list = [20.000000, 21.000000, 22.000000, 23.000000,  ...]]
         * positions [20x2 float64 array]
         * weights [20x256x256 bool array]
     * 2 [dict 4]:
         * data [20x256x256 int32 array]
         * indices [list = [40.000000, 41.000000, 42.000000, 43.000000,  ...]]
         * positions [20x2 float64 array]
         * weights [20x256x256 bool array]
     * 3 [dict 4]:
         * data [20x256x256 int32 array]
         * indices [list = [60.000000, 61.000000, 62.000000, 63.000000,  ...]]
         * positions [20x2 float64 array]
         * weights [20x256x256 bool array]
     * 4 [dict 4]:
         * data [20x256x256 int32 array]
         * indices [list = [80.000000, 81.000000, 82.000000, 83.000000,  ...]]
         * positions [20x2 float64 array]
         * weights [20x256x256 bool array]
     * 5 [dict 4]:
         * data [16x256x256 int32 array]
         * indices [list = [100.000000, 101.000000, 102.000000, 103.000000,  ...]]
         * positions [16x2 float64 array]
         * weights [16x256x256 bool array]
 * info [dict 23]:
     * add_poisson_noise [scalar = False]
     * auto_center [scalar = False]
     * base_path [string = "b'/tmp/ptypy/sim/'"]
     * center [array = [128. 128.]]
     * chunk_format [string = "b'.chunk%02d'"]
     * dfile [string = "b'/tmp/ptypy/sim/npy.ptyd'"]
     * distance [scalar = 7.19]
     * energy [scalar = 7.2]
     * experimentID [None]
     * label [None]
     * load_parallel [string = "b'data'"]
     * min_frames [scalar = 1]
     * name [string = "b'numpyscan'"]
     * num_frames [None]
     * orientation [None]
     * positions_scan [116x2 float64 array]
     * positions_theory [None]
     * psize [scalar = 0.000172]
     * rebin [scalar = 1]
     * save [string = "b'append'"]
     * shape [array = [256 256]]
     * version [scalar = 0.1]
     * weight2d [scalar = True]
 * meta [dict 8]:
     * center [array = [128. 128.]]
     * distance [scalar = 7.19]
     * energy [scalar = 7.2]
     * label [None]
     * num_frames [scalar = 116]
     * psize [array = [0.000172 0.000172]]
     * shape [array = [256 256]]
     * version [scalar = 0.1]

None
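The file stores the frames chunk by chunk, but downstream scripts often need a single frame stack. This sketch concatenates the chunks in order with NumPy.

It assumes the `chunks` group has already been loaded into a nested dictionary with the layout shown by `h5info()` above. How it is loaded (for example with h5py or PtyPy's own I/O helpers) is left open here. The sketch uses tiny dummy arrays instead of real 256x256 frames, and the helper `merge_chunks` is hypothetical.

```python
import numpy as np


def merge_chunks(chunks):
    """Concatenate per-chunk arrays into one stack, ordered by chunk number."""
    keys = sorted(chunks, key=int)  # chunk keys may be ints or numeric strings
    data = np.concatenate([chunks[k]["data"] for k in keys])
    positions = np.concatenate([chunks[k]["positions"] for k in keys])
    weights = np.concatenate([chunks[k]["weights"] for k in keys])
    indices = np.concatenate([np.asarray(chunks[k]["indices"]) for k in keys])
    return data, positions, weights, indices


# Dummy chunks mimicking the saved layout: 5 chunks of 20 frames, 1 of 16
sizes = [20, 20, 20, 20, 20, 16]
chunks, start = {}, 0
for num, size in enumerate(sizes):
    chunks[num] = {
        "data": np.zeros((size, 4, 4), dtype=np.int32),
        "positions": np.zeros((size, 2)),
        "weights": np.ones((size, 4, 4), dtype=bool),
        "indices": list(range(start, start + size)),
    }
    start += size

data, positions, weights, indices = merge_chunks(chunks)
print(data.shape, positions.shape)  # (116, 4, 4) (116, 2)
```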

Listing the new subclass

To make the subclass available in your local PtyPy installation, navigate to [ptypy_root]/ptypy/experiment and paste the listing of the subclass into a new file user.py:

$ touch [ptypy_root]/ptypy/experiment/user.py

Append the following lines to [ptypy_root]/ptypy/experiment/__init__.py:

from .user import NumpyScan
PtyScanTypes.update({'numpy':NumpyScan})

Now your new subclass will be used whenever you pass 'numpy' as the scan.data.source parameter. All parameters specific to the class should be passed via the dict scan.data.recipe.
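The `PtyScanTypes.update(...)` call registers the class in a lookup table keyed by source name. The mechanism can be sketched in isolation as follows. The classes, the `scan_types` table, the `make_scan` helper and the recipe key are hypothetical placeholders, not PtyPy's actual implementation.

```python
class PtyScanStub:
    """Placeholder base class (hypothetical, not PtyPy's PtyScan)."""

    def __init__(self, recipe=None):
        # Class-specific options arrive through the recipe dictionary
        self.recipe = dict(recipe or {})


class NumpyScanStub(PtyScanStub):
    """Placeholder for a user-defined subclass."""


# Lookup table from the value of scan.data.source to a class
scan_types = {}
scan_types.update({"numpy": NumpyScanStub})


def make_scan(data_pars):
    """Pick the class named by data_pars['source'] and pass it the recipe."""
    source = data_pars["source"]
    try:
        cls = scan_types[source]
    except KeyError:
        raise ValueError(f"unknown data source {source!r}") from None
    return cls(recipe=data_pars.get("recipe"))


scan = make_scan({"source": "numpy", "recipe": {"base_path": "/tmp/ptypy/sim/"}})
print(type(scan).__name__, scan.recipe["base_path"])
```

Keeping the table as a plain dictionary means new sources can be added without touching the code that builds the scan.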

[h5py] http://www.h5py.org/

[HDF] Hierarchical Data Format, http://www.hdfgroup.org/HDF5/