.. _ptypy_data: *************** Data management *************** .. note:: In this chapter, we refer to the *raw input data* as *data* and not to data stored in computer memory by :any:`Storage` instances. With the term *preparation* we refer to all data processing steps prior to the reconstruction, and avoid the ambiguous term *processing* although it may be more familiar to the reader. Consider the following generic steps which every ptychographer has to complete prior to a successful image reconstruction. **(A)** *Conducting a scanning diffraction experiment.* During or after the experiment, the researcher is left with *raw images* acquired from the detector and *meta data* which, in general, consists of scanning positions along with geometric information about the setup, e.g. photon *energy*, propagation *distance*, *detector pixel size* etc. **(B)** *Preparing the data.* In this step, the user performs a subset of the following actions: * select the appropriate region of the detector where the scattering events were counted, * apply possible *pixel corrections* to convert the detector counts of the chosen diffraction frame into photon counts, e.g. flat-field and dark-field correction, * switch the image orientation to match the coordinate system of the reconstruction algorithms, * assign a suitable mask to exclude invalid pixel data (hot or dead pixels, overexposure), * and/or simply rebin the data. Finally, the user needs to pair the diffraction frames with the scanning positions. **(C)** *Saving the processed data or feeding it into the reconstruction process.* In this step the user needs to save the data in a suitable format or provide it directly to the reconstruction engine. **Data management** in |ptypy| deals with **(B)** and **(C)**, as a ptychography reconstruction software naturally **cannot** provide actual experimental data. Nevertheless, the treatment of raw data is usually very similar for every experiment. Consequently, |ptypy| provides an abstract base class, called :py:class:`PtyScan`, which aims to help with steps (B) and (C). In order to adapt |ptypy| to a specific experimental setup, we simply subclass :py:class:`PtyScan` and reimplement only the subset of its methods affected by the specifics of the experimental setup (see :ref:`subclassptyscan`). .. _sec_ptyscan: The PtyScan class ================= :py:class:`PtyScan` is the abstract base class in |ptypy| that manages raw input data. A PtyScan instance is constructed from a set of generic parameters, see :py:data:`.scan.data` in the ptypy parameter tree. It provides the following features: **Parallelization** When |ptypy| is run across several MPI processes, PtyScan takes care of distributing the scan-point indices among processes such that each process only loads the data it will later use in the reconstruction. Hence, the load on the network is not affected by the number of processes. The parallel behavior of :py:class:`PtyScan` is controlled by the parameter :py:data:`.scan.data.load_parallel` and relies on the :py:class:`~ptypy.utils.parallel.LoadManager` class. **Preparation** PtyScan can handle a few of the raw processing steps mentioned above. * Selection of a region-of-interest from the raw detector image. This selection is controlled by the parameters :py:data:`.scan.data.auto_center`, :py:data:`.scan.data.shape` and :py:data:`.scan.data.center`. * Switching the orientation and rebinning are controlled by :py:data:`.scan.data.orientation` and :py:data:`.scan.data.rebin`.
* Finding a suitable mask or weight for pixel correction is left to the user, as this is a setup-specific implementation. See :py:meth:`~ptypy.core.data.PtyScan.load_weight`, :py:meth:`~ptypy.core.data.PtyScan.load_common`, :py:meth:`~ptypy.core.data.PtyScan.load` and :py:meth:`~ptypy.core.data.PtyScan.correct` for detailed explanations. **Packaging** PtyScan packs the prepared *data* together with the scan point *indices* used, the scan *positions*, a *weight* (= mask) and geometric *meta* information. This package is requested by the managing instance :py:class:`~ptypy.core.manager.ModelManager` through a call to :py:meth:`~ptypy.core.manager.ModelManager.new_data`. Because data acquisition and preparation can happen during a reconstruction process, it is possible to specify the minimum number of data frames passed to each process per *new_data()* call by setting the value of :py:data:`.scan.data.min_frames`. The total number of frames processed for a scan is set by :py:data:`.scan.data.num_frames`. If not extracted from other files, the user may set the photon energy with :py:data:`.scan.data.energy`, the propagation distance from sample to detector with :py:data:`.scan.data.distance` and the detector pixel size with :py:data:`.scan.data.psize`. **Storage** PtyScan and its subclasses are capable of storing the data in an *hdf5*-compatible [HDF]_ file format. The data file names have a custom suffix: ``.ptyd``. A detailed overview of the *.ptyd* data file tree is given below in the section :ref:`ptyd_file`. The parameters :py:data:`.scan.data.save` and :py:data:`.scan.data.chunk_format` control the way PtyScan saves the processed data. .. note:: Although *h5py* [h5py]_ supports parallel writes, this feature is not used in ptypy. At the moment, all MPI nodes send their prepared data to the master node, which writes the data to a file. .. _ptyd_scenarios: Usage scenarios =============== The PtyScan class of |ptypy| provides support for three use cases. **Beamline integrated use.** In this use case, the researcher has integrated |ptypy| into the beamline end-station or experimental setup with the help of a custom subclass of :py:class:`PtyScan` that we call ``UserScan``. This subclass has its own methods to extract many of the generic parameters of :py:data:`.scan.data` and also provides defaults for specific custom parameters, for instance file paths or file name patterns (for a detailed introduction on how to subclass PtyScan, see :ref:`subclassptyscan`). Once the experiment is completed, the researcher can initiate a reconstruction directly from raw data with a standard reconstruction script. .. figure:: ../img/data_case_integrated.png :width: 70 % :figclass: highlights :name: case_integrated Integrated use case of :py:class:`PtyScan`. A custom subclass ``UserScan`` serves as a translator between |ptypy|'s generic parameters and data types and the raw image data and meta data from the experiment. Typically the experiment has to be completed before a reconstruction is started, but with some effort it is even possible to have the reconstruction start immediately after acquisition of the first frame. As data preparation is blended in with the reconstruction process, the reconstruction pauses while new data is prepared. Optionally, the prepared data is saved to a ``.ptyd`` file to avoid rerunning the preparation steps in subsequent reconstruction runs.
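For orientation, the following sketch shows how the generic preparation parameters of :py:data:`.scan.data` discussed above might appear in such a standard reconstruction script. It is only a sketch: the exact location of the ``data`` branch in the parameter tree (assumed here to be ``p.scans.scan00.data``) and the appropriate ``source`` value depend on the ptypy version and on the :py:class:`PtyScan` subclass in use.

::

    from ptypy import utils as u

    p = u.Param()                       # root of the parameter tree
    p.scans = u.Param()
    p.scans.scan00 = u.Param()
    p.scans.scan00.data = u.Param()     # the generic .scan.data branch

    d = p.scans.scan00.data
    # d.source = ...                    # selects the PtyScan subclass; depends on how UserScan is registered
    d.shape = 256                       # region of interest cropped from the raw frames
    d.auto_center = True                # let PtyScan determine the optical axis
    d.rebin = 2                         # rebin the raw frames by a factor of 2
    d.orientation = 0                   # no flips or transpose
    d.min_frames = 10                   # minimum frames per new_data() call and process
    d.save = 'append'                   # store prepared chunks in a single .ptyd file
    d.dfile = '/tmp/ptypy/scan00.ptyd'  # hypothetical output path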
**Post preparation use.** In this use case, the experiment is long past and the researcher has either used a custom subclass of PtyScan or *any other script* that generates a compatible *.hdf5* file (see :ref:`ptyd_file`) to save the prepared data of that experiment. Reconstruction then works by simply passing the data file path in the parameter tree. The input file path is passed either with :py:data:`~.scan.data.source`, or with :py:data:`~.scan.data.dfile` when :py:data:`~.scan.data.source` takes the value ``'file'``. In the latter case, secondary processing and saving to another file are not supported, while they are allowed in the first case. While the latter case seems unfavorable due to the lack of secondary preparation options, it is meant as a user-friendly transition switch from the first reconstruction at the experiment to post-experiment analysis. Only the :py:data:`~.scan.data.source` parameter needs to be altered in the script from ``<..>.data.source=`` to ``<..>.data.source='file'`` while the rest of the parameters are ignored and may remain untouched. .. figure:: ../img/data_case_prepared.png :width: 70 % :figclass: highlights :name: case_prepared Standard supported use case of :py:class:`PtyScan`. If a structure-compatible (see :ref:`ptyd_file`) ``*.hdf5``-file is available, |ptypy| can be used without customizing a subclass of :py:class:`PtyScan`. It will use the shipped subclass :py:class:`PtydScan` to read in the (prepared) raw data. **Preparation and reconstruction on-the-fly with data acquisition.** This use case is for even tighter beamline integration and on-the-fly scans. The researcher has already implemented a suitable subclass ``UserScan`` to prepare data from the setup. Now, the preparation happens in a separate process while image frames are acquired. This process runs a python script where the subclass ``UserScan`` prepares the data using the :py:meth:`~ptypy.core.data.PtyScan.auto` method. The :py:data:`~.scan.data.save` parameter is set to ``'link'`` in order to create a separate file for each data chunk and to avoid write access on the source file. The chunk files are linked back into the main source ``.ptyd`` file. All reconstruction processes may access the prepared data without overhead or notable pauses in the reconstruction. For |ptypy| there is no difference compared to a single source file (a feature of [HDF]_\ ). .. figure:: ../img/data_case_flyscan.png :width: 70 % :figclass: highlights :name: case_flyscan On-the-fly or daemon-like use case of :py:class:`PtyScan`. A separate process prepares the data *chunks* and saves them in separate files which are linked back into the source data file. This process may run silently as a daemon in the background. Reconstructions can start immediately and run without delays or pauses due to data preparation. .. _ptyd_file: Ptyd file format ================ Ptypy uses the python module **h5py** [h5py]_ to store and load data in the **H**\ ierarchical **D**\ ata **F**\ ormat [HDF]_ . HDF closely resembles the directory/file tree of today's operating systems, where the "files" are (multidimensional) datasets. Ptypy stores and loads the (processed) experimental data in a file with extension *.ptyd*, which is an hdf5 file with a very simple data tree. Comparable to tagged image file formats like *.edf* or *.tiff*, the ``ptyd`` data file separates meta information (stored in ``meta/``) from the actual data payload (stored in ``chunks/``). A schematic overview of the data tree is depicted below.
:: *.ptyd/ meta/ [general parameters; optional but very useful] version : str num_frames : int label : str [geometric parameters; all optional] shape : int or (int,int) energy : float, optional distance : float, optional center : (float,float) or None, optional psize : float or (float,float), optional propagation : "farfield" or "nearfield", optional ... chunks/ 0/ data : array(M,N,N) of float indices : array(M) of int, optional positions : array(M,2) of float weights : same shape as data or empty 1/ ... 2/ ... ... All parameters of ``meta/`` are a subset of :py:data:`.scan.data`\ . Omitting any of these parameters or setting the value of the dataset to ``'None'`` has the same effect. The first set of parameters :: version : str num_frames : int label : str are general (optional) parameters. * ``version`` is the ptypy version this dataset was prepared with (current version is |version|, see :py:data:`~.scan.data.version`). * ``label`` is a custom user label. Choose a unique label to your liking. * ``num_frames`` indicates how many diffraction image frames are expected in the dataset (see :py:data:`~.scan.data.num_frames`). It is important to set this parameter when the data acquisition is not finished but the reconstruction has already started. If the dataset is complete, the loading class :py:class:`PtydScan` retrieves the total number of frames from the payload ``chunks/``. The next set of optional parameters are :: shape : int or (int,int) energy : float distance : float center : (float,float) psize : float or (float,float) propagation : "farfield" or "nearfield" which refer to the experimental scanning geometry. * ``shape`` (see :py:data:`.scan.data.shape`) * ``energy`` (see :py:data:`.scan.data.energy` or :py:data:`.scan.geometry.energy`) * ``distance`` (see :py:data:`.scan.data.distance`) * ``center`` : (float,float) (see :py:data:`.scan.data.center`) * ``psize`` : float or (float,float) (see :py:data:`.scan.data.psize`) * ``propagation`` : "farfield" or "nearfield" (see :py:data:`.scan.data.propagation`) Finally, these parameters are digested by the :py:mod:`~ptypy.core.geometry` module in order to provide a suitable propagator. .. note:: As you may have already noted, there are three ways to specify the geometry of the experiment: in the ``meta/`` section of a *.ptyd* data file, via the :py:data:`.scan.data` parameters, or via the :py:data:`.scan.geometry` parameters. As walking the data tree and extracting the data from the *hdf5* file is a bit cumbersome with h5py, there are a few convenience functions in the :py:mod:`ptypy.io.h5rw` module. .. _subclassptyscan: Tutorial: Subclassing PtyScan ============================== .. note:: This tutorial was generated from the python source :file:`[ptypy_root]/tutorial/subclassptyscan.py` using :file:`ptypy/doc/script2rst.py`. You are encouraged to modify the parameters and rerun the tutorial with:: $ python [ptypy_root]/tutorial/subclassptyscan.py In this tutorial, we learn how to subclass :py:class:`PtyScan` to make ptypy work with any experimental setup. This tutorial can be used as a direct follow-up to :ref:`simupod` if section :ref:`store` was completed. Again, the imports come first.
:: >>> import numpy as np >>> from ptypy.core.data import PtyScan >>> from ptypy import utils as u For this tutorial we assume that the data and meta information are in this path: :: >>> save_path = '/tmp/ptypy/sim/' Furthermore, we assume that a file describing the experimental geometry is located at :: >>> geofilepath = save_path + 'geometry.txt' >>> print(geofilepath) /tmp/ptypy/sim/geometry.txt and has contents of the following form :: >>> print(''.join([line for line in open(geofilepath, 'r')])) distance 1.5000e-01 energy 2.3305e-03 psize 2.4000e-05 shape 256 The scanning positions are in :: >>> positionpath = save_path + 'positions.txt' >>> print(positionpath) /tmp/ptypy/sim/positions.txt with a list of positions for vertical and horizontal movement and the corresponding image frame from the "camera" :: >>> print(''.join([line for line in open(positionpath, 'r')][:6])+'....') ccd/diffraction_0000.npy 0.0000e+00 0.0000e+00 ccd/diffraction_0001.npy 0.0000e+00 4.1562e-04 ccd/diffraction_0002.npy 3.9528e-04 1.2844e-04 ccd/diffraction_0003.npy 2.4430e-04 -3.3625e-04 ccd/diffraction_0004.npy -2.4430e-04 -3.3625e-04 ccd/diffraction_0005.npy -3.9528e-04 1.2844e-04 .... Writing a subclass ------------------ The simplest subclass of PtyScan would look like this :: >>> class NumpyScan(PtyScan): >>> """ >>> A PtyScan subclass to extract data from a numpy array. >>> """ >>> >>> def __init__(self, pars=None, **kwargs): >>> # In init we need to call the parent. >>> super(NumpyScan, self).__init__(pars, **kwargs) >>> Of course this class does nothing special beyond PtyScan. As it is, the class also cannot be used as a real PtyScan instance because its defaults are not properly managed. For this, Ptypy provides a powerful self-documenting tool called a "descriptor", which can be applied to any new class using a decorator. The tree of all valid ptypy parameters is located at :ref:`here `. To manage the default parameters of our subclass and document its existence, we would need to write :: >>> from ptypy import defaults_tree :: >>> @defaults_tree.parse_doc('scandata.numpyscan') >>> class NumpyScan(PtyScan): >>> """ >>> A PtyScan subclass to extract data from a numpy array. >>> """ >>> >>> def __init__(self, pars=None, **kwargs): >>> # In init we need to call the parent. >>> super(NumpyScan, self).__init__(pars, **kwargs) >>> The decorator extracts information from the docstring of the subclass and parent classes about the expected input parameters. Currently the docstring of `NumpyScan` does not contain anything special, thus the only parameters registered are those of the parent class, `PtyScan`: :: >>> print(defaults_tree['scandata.numpyscan'].to_string()) [name] default = PtyScan help = type = str [dfile] default = None help = File path where prepared data will be saved in the ``ptyd`` format. type = file userlevel = 0 [chunk_format] default = .chunk%02d help = Appendix to saved files if save == 'link' type = str doc = userlevel = 2 [save] default = None help = Saving mode type = str doc = Mode to use to save data to file. - ``None``: No saving - ``'merge'``: attemts to merge data in single chunk **[not implemented]** - ``'append'``: appends each chunk in master \*.ptyd file - ``'link'``: appends external links in master \*.ptyd file and stores chunks separately in the path given by the link. Links file paths are relative to master file.
userlevel = 1 [auto_center] default = None help = Determine if center in data is calculated automatically type = bool doc = - ``False``, no automatic centering - ``None``, only if :py:data:`center` is ``None`` - ``True``, it will be enforced userlevel = 0 [load_parallel] default = data help = Determines what will be loaded in parallel type = str doc = Choose from ``None``, ``'data'``, ``'common'``, ``'all'`` choices = ['data', 'common', 'all'] [rebin] default = None help = Rebinning factor type = int doc = Rebinning factor for the raw data frames. ``'None'`` or ``1`` both mean *no binning* userlevel = 1 lowlim = 1 uplim = 32 [orientation] default = None help = Data frame orientation type = int, tuple, list doc = Choose - ``None`` or ``0``: correct orientation - ``1``: invert columns (numpy.flip_lr) - ``2``: invert rows (numpy.flip_ud) - ``3``: invert columns, invert rows - ``4``: transpose (numpy.transpose) - ``4+i``: tranpose + other operations from above Alternatively, a 3-tuple of booleans may be provided ``(do_transpose, do_flipud, do_fliplr)`` choices = [0, 1, 2, 3, 4, 5, 6, 7] userlevel = 1 [min_frames] default = 1 help = Minimum number of frames loaded by each node type = int doc = userlevel = 2 lowlim = 1 [positions_theory] default = None help = Theoretical positions for this scan type = ndarray doc = If provided, experimental positions from :py:class:`PtyScan` subclass will be ignored. If data preparation is called from Ptycho instance, the calculated positions from the :py:func:`ptypy.core.xy.from_pars` dict will be inserted here userlevel = 2 [num_frames] default = None help = Maximum number of frames to be prepared type = int doc = If `positions_theory` are provided, num_frames will be ovverriden with the number of positions available userlevel = 1 [label] default = None help = The scan label type = str doc = Unique string identifying the scan userlevel = 1 [experimentID] default = None help = Name of the experiment type = str doc = If None, a default value will be provided by the recipe. **unused** userlevel = 2 [version] default = 0.1 help = TODO: Explain this and decide if it is a user parameter. type = float doc = userlevel = 2 [shape] default = 256 help = Shape of the region of interest cropped from the raw data. type = int, tuple doc = Cropping dimension of the diffraction frame Can be None, (dimx, dimy), or dim. In the latter case shape will be (dim, dim). userlevel = 1 [center] default = 'fftshift' help = Center (pixel) of the optical axes in raw data type = list, tuple, str doc = If ``None``, this parameter will be set by :py:data:`~.scan.data.auto_center` or elsewhere userlevel = 1 [psize] default = 0.000172 help = Detector pixel size type = float, tuple doc = Dimensions of the detector pixels (in meters) userlevel = 0 lowlim = 0 [distance] default = 7.19 help = Sample to detector distance type = float doc = In meters. userlevel = 0 lowlim = 0 [energy] default = 7.2 help = Photon energy of the incident radiation in keV type = float doc = userlevel = 0 lowlim = 0 [add_poisson_noise] default = False help = Decides whether the scan should have poisson noise or not type = bool As you can see, there are already many parameters documented in `PtyScan`'s class. For each parameter, most important are the *type*, *default* value and *help* string. 
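The entries of this listing can also be queried programmatically from ``defaults_tree``, using the dotted-path lookup shown above. The following is a minimal sketch; the ``default`` and ``help`` attributes on the returned descriptor entry are an assumption and may differ between ptypy versions.

::

    from ptypy import defaults_tree

    # look up a single parameter entry of our subclass by its dotted path
    desc = defaults_tree['scandata.numpyscan.shape']

    # assumed attributes on the descriptor entry
    print(desc.default)   # e.g. 256
    print(desc.help)      # the one-line help string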
The decorator does more than collect this information: it also generates from it a class variable called `DEFAULT`, which stores all defaults: :: >>> print(u.verbose.report(NumpyScan.DEFAULT, noheader=True)) * id3V4ANI238G : ptypy.utils.parameters.Param(20) * name : PtyScan * dfile : None * chunk_format : .chunk%02d * save : None * auto_center : None * load_parallel : data * rebin : None * orientation : None * min_frames : 1 * positions_theory : None * num_frames : None * label : None * experimentID : None * version : 0.1 * shape : 256 * center : fftshift * psize : 0.000172 * distance : 7.19 * energy : 7.2 * add_poisson_noise : False Now we are ready to add functionality to our subclass. A first step of initialisation would be to retrieve the geometric information that we stored in ``geofilepath`` and update the input parameters with it. We write a tiny file parser. :: >>> def extract_geo(base_path): >>> out = {} >>> with open(base_path+'geometry.txt') as f: >>> for line in f: >>> key, value = line.strip().split() >>> out[key] = eval(value) >>> return out >>> We test it. :: >>> print(extract_geo(save_path)) {'distance': 0.15, 'energy': 0.0023305, 'psize': 2.4e-05, 'shape': 256} That seems to work. We can integrate this parser into the initialisation as we assume that this small access can be done by all MPI nodes without data access problems. Hence, our subclass becomes :: >>> @defaults_tree.parse_doc('scandata.numpyscan') >>> class NumpyScan(PtyScan): >>> """ >>> A PtyScan subclass to extract data from a numpy array. >>> >>> Defaults: >>> >>> [name] >>> type = str >>> default = numpyscan >>> help = >>> >>> [base_path] >>> type = str >>> default = './' >>> help = Base path to extract data files from. >>> """ >>> >>> def __init__(self, pars=None, **kwargs): >>> p = self.DEFAULT.copy(depth=2) >>> p.update(pars) >>> >>> with open(p.base_path+'geometry.txt') as f: >>> for line in f: >>> key, value = line.strip().split() >>> # we only replace Nones or missing keys >>> if p.get(key) is None: >>> p[key] = eval(value) >>> >>> super(NumpyScan, self).__init__(p, **kwargs) >>> We now need a new input parameter called `base_path`, so we documented it in the docstring after the section header "Defaults:". :: >>> print(defaults_tree['scandata.numpyscan.base_path']) [base_path] default = './' help = Base path to extract data files from. type = str As you can see, the first step in `__init__` is to build a default parameter structure to ensure that all input parameters are available. The next line updates this structure to overwrite the entries specified by the user. Good! Next, we need to implement how the class finds out about the positions in the scan. The method :py:meth:`~ptypy.core.data.PtyScan.load_positions` can be used for this purpose. :: >>> print(PtyScan.load_positions.__doc__) **Override in subclass for custom implementation** *Called in* :py:meth:`initialize` Loads all positions for all diffraction patterns in this scan. The positions loaded here will be available by all processes through the attribute ``self.positions``. If you specify position on a per frame basis in :py:meth:`load` , this function has no effect. If theoretical positions :py:data:`positions_theory` are provided in the initial parameter set :py:data:`DEFAULT`, specifying positions here has NO effect and will be ignored. The purpose of this function is to avoid reloading and parallel reads on files that may require intense parsing to retrieve the information, e.g. long SPEC log files. 
If parallel reads or log file parsing for each set of frames is not a time critical issue of the subclass, reimplementing this function can be ignored and it is recommended to only reimplement the :py:meth:`load` method. If `load_parallel` is set to `all` or common`, this function is executed by all nodes, otherwise the master node executes this function and broadcasts the results to other nodes. Returns ------- positions : ndarray A (N,2)-array where *N* is the number of positions. Note ---- Be aware that this method sets attribute :py:attr:`num_frames` in the following manner. * If ``num_frames == None`` : ``num_frames = N``. * If ``num_frames < N`` , no effect. * If ``num_frames > N`` : ``num_frames = N``. The parser for the positions file would look like this. :: >>> def extract_pos(base_path): >>> pos = [] >>> files = [] >>> with open(base_path+'positions.txt') as f: >>> for line in f: >>> fname, y, x = line.strip().split() >>> pos.append((eval(y), eval(x))) >>> files.append(fname) >>> return files, pos >>> And the test: :: >>> files, pos = extract_pos(save_path) >>> print(files[:2]) ['ccd/diffraction_0000.npy', 'ccd/diffraction_0001.npy'] >>> print(pos[:2]) [(0.0, 0.0), (0.0, 0.00041562)] :: >>> @defaults_tree.parse_doc('scandata.numpyscan') >>> class NumpyScan(PtyScan): >>> """ >>> A PtyScan subclass to extract data from a numpy array. >>> >>> Defaults: >>> >>> [name] >>> type = str >>> default = numpyscan >>> help = >>> >>> [base_path] >>> type = str >>> default = /tmp/ptypy/sim/ >>> help = Base path to extract data files from. >>> """ >>> >>> def __init__(self, pars=None, **kwargs): >>> p = self.DEFAULT.copy(depth=2) >>> p.update(pars) >>> >>> with open(p.base_path+'geometry.txt') as f: >>> for line in f: >>> key, value = line.strip().split() >>> # we only replace Nones or missing keys >>> if p.get(key) is None: >>> p[key] = eval(value) >>> >>> super(NumpyScan, self).__init__(p, **kwargs) >>> >>> def load_positions(self): >>> # the base path is now stored in self.info >>> base_path = self.info.base_path >>> pos = [] >>> with open(base_path+'positions.txt') as f: >>> for line in f: >>> fname, y, x = line.strip().split() >>> pos.append((eval(y), eval(x))) >>> return np.asarray(pos) >>> One nice thing about rewriting ``self.load_positions`` is that the maximum number of frames will be set automatically, and we do not need to manually adapt :py:meth:`~ptypy.core.data.PtyScan.check`. The last step is to overwrite the actual loading of data. Loading happens in an MPI-compatible manner in :py:meth:`~ptypy.core.data.PtyScan.load` :: >>> print(PtyScan.load.__doc__) **Override in subclass for custom implementation** Loads data according to node specific scanpoint indices that have been determined by :py:class:`LoadManager` or otherwise. Returns ------- raw, positions, weight : dict Dictionaries whose keys are the given scan point `indices` and whose values are the respective frame / position according to the scan point index. `weight` and `positions` may be empty Note ---- This is the *most* important method to change when subclassing :py:class:`PtyScan`. Most often it suffices to override the constructor and this method to create a subclass suited for a specific experiment. ``load`` seems a bit more complex than ``self.load_positions`` because of its return values. However, we can opt out of providing weights (masks) and positions, as we have already adapted ``self.load_positions`` and there were no bad pixels in the (linear) detector. The final subclass looks like this.
We overwrite two defaults from `PtyScan`: :: >>> @defaults_tree.parse_doc('scandata.numpyscan') >>> class NumpyScan(PtyScan): >>> """ >>> A PtyScan subclass to extract data from a numpy array. >>> >>> Defaults: >>> >>> [name] >>> type = str >>> default = numpyscan >>> help = >>> >>> [base_path] >>> type = str >>> default = /tmp/ptypy/sim/ >>> help = Base path to extract data files from. >>> >>> [auto_center] >>> default = False >>> >>> [dfile] >>> default = /tmp/ptypy/sim/npy.ptyd >>> """ >>> >>> def __init__(self, pars=None, **kwargs): >>> p = self.DEFAULT.copy(depth=2) >>> p.update(pars) >>> >>> with open(p.base_path+'geometry.txt') as f: >>> for line in f: >>> key, value = line.strip().split() >>> # we only replace Nones or missing keys >>> if p.get(key) is None: >>> p[key] = eval(value) >>> >>> super(NumpyScan, self).__init__(p, **kwargs) >>> >>> def load_positions(self): >>> # the base path is now stored in self.info >>> base_path = self.info.base_path >>> pos = [] >>> with open(base_path+'positions.txt') as f: >>> for line in f: >>> fname, y, x = line.strip().split() >>> pos.append((eval(y), eval(x))) >>> return np.asarray(pos) >>> >>> def load(self, indices): >>> raw = {} >>> bp = self.info.base_path >>> for ii in indices: >>> raw[ii] = np.load(bp+'ccd/diffraction_%04d.npy' % ii) >>> return raw, {}, {} >>> Loading the data ---------------- With the subclass, we create a scan using only the defaults :: >>> NPS = NumpyScan() >>> NPS.initialize() In order to process the data, we need to call :py:meth:`~ptypy.core.data.PtyScan.auto` with the chunk size as argument. It returns a data chunk that we can inspect with :py:func:`ptypy.utils.verbose.report`. The information is concatenated, but the length of iterables or dicts is always indicated in parentheses. :: >>> print(u.verbose.report(NPS.auto(80), noheader=True)) * id3V4AQEBE20 : dict(3) * common : ptypy.utils.parameters.Param(8) * version : 0.1 * num_frames : 116 * label : None * shape : [array = [256 256]] * psize : [array = [0.000172 0.000172]] * energy : 7.2 * center : [array = [128. 128.]] * distance : 7.19 * chunk : ptypy.utils.parameters.Param(6) * indices : list(80) * id2M979S98S8 : 0 * id2M979S98T8 : 1 * id2M979S98U8 : 2 * id2M979S98V8 : 3 * id2M979S9908 : 4 * ... : .... * indices_node : list(80) * id2M979S98S8 : 0 * id2M979S98T8 : 1 * id2M979S98U8 : 2 * id2M979S98V8 : 3 * id2M979S9908 : 4 * ... : ....
* num : 0 * data : dict(80) * 0 : [256x256 int32 array] * 1 : [256x256 int32 array] * 2 : [256x256 int32 array] * 3 : [256x256 int32 array] * 4 : [256x256 int32 array] * 5 : [256x256 int32 array] * 6 : [256x256 int32 array] * 7 : [256x256 int32 array] * 8 : [256x256 int32 array] * 9 : [256x256 int32 array] * 10 : [256x256 int32 array] * 11 : [256x256 int32 array] * 12 : [256x256 int32 array] * 13 : [256x256 int32 array] * 14 : [256x256 int32 array] * 15 : [256x256 int32 array] * 16 : [256x256 int32 array] * 17 : [256x256 int32 array] * 18 : [256x256 int32 array] * 19 : [256x256 int32 array] * 20 : [256x256 int32 array] * 21 : [256x256 int32 array] * 22 : [256x256 int32 array] * 23 : [256x256 int32 array] * 24 : [256x256 int32 array] * 25 : [256x256 int32 array] * 26 : [256x256 int32 array] * 27 : [256x256 int32 array] * 28 : [256x256 int32 array] * 29 : [256x256 int32 array] * 30 : [256x256 int32 array] * 31 : [256x256 int32 array] * 32 : [256x256 int32 array] * 33 : [256x256 int32 array] * 34 : [256x256 int32 array] * 35 : [256x256 int32 array] * 36 : [256x256 int32 array] * 37 : [256x256 int32 array] * 38 : [256x256 int32 array] * 39 : [256x256 int32 array] * 40 : [256x256 int32 array] * 41 : [256x256 int32 array] * 42 : [256x256 int32 array] * 43 : [256x256 int32 array] * 44 : [256x256 int32 array] * 45 : [256x256 int32 array] * 46 : [256x256 int32 array] * 47 : [256x256 int32 array] * 48 : [256x256 int32 array] * 49 : [256x256 int32 array] * 50 : [256x256 int32 array] * 51 : [256x256 int32 array] * 52 : [256x256 int32 array] * 53 : [256x256 int32 array] * 54 : [256x256 int32 array] * 55 : [256x256 int32 array] * 56 : [256x256 int32 array] * 57 : [256x256 int32 array] * 58 : [256x256 int32 array] * 59 : [256x256 int32 array] * 60 : [256x256 int32 array] * 61 : [256x256 int32 array] * 62 : [256x256 int32 array] * 63 : [256x256 int32 array] * 64 : [256x256 int32 array] * 65 : [256x256 int32 array] * 66 : [256x256 int32 array] * 67 : [256x256 int32 array] * 68 : [256x256 int32 array] * 69 : [256x256 int32 array] * 70 : [256x256 int32 array] * 71 : [256x256 int32 array] * 72 : [256x256 int32 array] * 73 : [256x256 int32 array] * 74 : [256x256 int32 array] * 75 : [256x256 int32 array] * 76 : [256x256 int32 array] * 77 : [256x256 int32 array] * 78 : [256x256 int32 array] * 79 : [256x256 int32 array] * weights : dict(80) * 0 : [256x256 bool array] * 1 : [256x256 bool array] * 2 : [256x256 bool array] * 3 : [256x256 bool array] * 4 : [256x256 bool array] * 5 : [256x256 bool array] * 6 : [256x256 bool array] * 7 : [256x256 bool array] * 8 : [256x256 bool array] * 9 : [256x256 bool array] * 10 : [256x256 bool array] * 11 : [256x256 bool array] * 12 : [256x256 bool array] * 13 : [256x256 bool array] * 14 : [256x256 bool array] * 15 : [256x256 bool array] * 16 : [256x256 bool array] * 17 : [256x256 bool array] * 18 : [256x256 bool array] * 19 : [256x256 bool array] * 20 : [256x256 bool array] * 21 : [256x256 bool array] * 22 : [256x256 bool array] * 23 : [256x256 bool array] * 24 : [256x256 bool array] * 25 : [256x256 bool array] * 26 : [256x256 bool array] * 27 : [256x256 bool array] * 28 : [256x256 bool array] * 29 : [256x256 bool array] * 30 : [256x256 bool array] * 31 : [256x256 bool array] * 32 : [256x256 bool array] * 33 : [256x256 bool array] * 34 : [256x256 bool array] * 35 : [256x256 bool array] * 36 : [256x256 bool array] * 37 : [256x256 bool array] * 38 : [256x256 bool array] * 39 : [256x256 bool array] * 40 : [256x256 bool array] * 41 : [256x256 bool array] * 42 : [256x256 bool array] * 
43 : [256x256 bool array] * 44 : [256x256 bool array] * 45 : [256x256 bool array] * 46 : [256x256 bool array] * 47 : [256x256 bool array] * 48 : [256x256 bool array] * 49 : [256x256 bool array] * 50 : [256x256 bool array] * 51 : [256x256 bool array] * 52 : [256x256 bool array] * 53 : [256x256 bool array] * 54 : [256x256 bool array] * 55 : [256x256 bool array] * 56 : [256x256 bool array] * 57 : [256x256 bool array] * 58 : [256x256 bool array] * 59 : [256x256 bool array] * 60 : [256x256 bool array] * 61 : [256x256 bool array] * 62 : [256x256 bool array] * 63 : [256x256 bool array] * 64 : [256x256 bool array] * 65 : [256x256 bool array] * 66 : [256x256 bool array] * 67 : [256x256 bool array] * 68 : [256x256 bool array] * 69 : [256x256 bool array] * 70 : [256x256 bool array] * 71 : [256x256 bool array] * 72 : [256x256 bool array] * 73 : [256x256 bool array] * 74 : [256x256 bool array] * 75 : [256x256 bool array] * 76 : [256x256 bool array] * 77 : [256x256 bool array] * 78 : [256x256 bool array] * 79 : [256x256 bool array] * positions : [80x2 float64 array] * iterable : list(80) * id3V4AQCNFO0 : dict(4) * index : 0 * data : [256x256 int32 array] * position : [array = [0. 0.]] * mask : [256x256 bool array] * id3V4ANG90M0 : dict(4) * index : 1 * data : [256x256 int32 array] * position : [array = [0. 0.00041562]] * mask : [256x256 bool array] * id3V4ANH7UC0 : dict(4) * index : 2 * data : [256x256 int32 array] * position : [array = [0.00039528 0.00012844]] * mask : [256x256 bool array] * id3V4ANI69A0 : dict(4) * index : 3 * data : [256x256 int32 array] * position : [array = [ 0.0002443 -0.00033625]] * mask : [256x256 bool array] * id3V4ANI6B60 : dict(4) * index : 4 * data : [256x256 int32 array] * position : [array = [-0.0002443 -0.00033625]] * mask : [256x256 bool array] * ... : .... >>> print(u.verbose.report(NPS.auto(80), noheader=True)) * id3V4ANI6C80 : dict(3) * common : ptypy.utils.parameters.Param(8) * version : 0.1 * num_frames : 116 * label : None * shape : [array = [256 256]] * psize : [array = [0.000172 0.000172]] * energy : 7.2 * center : [array = [128. 128.]] * distance : 7.19 * chunk : ptypy.utils.parameters.Param(6) * indices : list(36) * id2M979S9BC8 : 80 * id2M979S9BD8 : 81 * id2M979S9BE8 : 82 * id2M979S9BF8 : 83 * id2M979S9BG8 : 84 * ... : .... * indices_node : list(36) * id2M979S9BC8 : 80 * id2M979S9BD8 : 81 * id2M979S9BE8 : 82 * id2M979S9BF8 : 83 * id2M979S9BG8 : 84 * ... : .... 
* num : 1 * data : dict(36) * 80 : [256x256 int32 array] * 81 : [256x256 int32 array] * 82 : [256x256 int32 array] * 83 : [256x256 int32 array] * 84 : [256x256 int32 array] * 85 : [256x256 int32 array] * 86 : [256x256 int32 array] * 87 : [256x256 int32 array] * 88 : [256x256 int32 array] * 89 : [256x256 int32 array] * 90 : [256x256 int32 array] * 91 : [256x256 int32 array] * 92 : [256x256 int32 array] * 93 : [256x256 int32 array] * 94 : [256x256 int32 array] * 95 : [256x256 int32 array] * 96 : [256x256 int32 array] * 97 : [256x256 int32 array] * 98 : [256x256 int32 array] * 99 : [256x256 int32 array] * 100 : [256x256 int32 array] * 101 : [256x256 int32 array] * 102 : [256x256 int32 array] * 103 : [256x256 int32 array] * 104 : [256x256 int32 array] * 105 : [256x256 int32 array] * 106 : [256x256 int32 array] * 107 : [256x256 int32 array] * 108 : [256x256 int32 array] * 109 : [256x256 int32 array] * 110 : [256x256 int32 array] * 111 : [256x256 int32 array] * 112 : [256x256 int32 array] * 113 : [256x256 int32 array] * 114 : [256x256 int32 array] * 115 : [256x256 int32 array] * weights : dict(36) * 80 : [256x256 bool array] * 81 : [256x256 bool array] * 82 : [256x256 bool array] * 83 : [256x256 bool array] * 84 : [256x256 bool array] * 85 : [256x256 bool array] * 86 : [256x256 bool array] * 87 : [256x256 bool array] * 88 : [256x256 bool array] * 89 : [256x256 bool array] * 90 : [256x256 bool array] * 91 : [256x256 bool array] * 92 : [256x256 bool array] * 93 : [256x256 bool array] * 94 : [256x256 bool array] * 95 : [256x256 bool array] * 96 : [256x256 bool array] * 97 : [256x256 bool array] * 98 : [256x256 bool array] * 99 : [256x256 bool array] * 100 : [256x256 bool array] * 101 : [256x256 bool array] * 102 : [256x256 bool array] * 103 : [256x256 bool array] * 104 : [256x256 bool array] * 105 : [256x256 bool array] * 106 : [256x256 bool array] * 107 : [256x256 bool array] * 108 : [256x256 bool array] * 109 : [256x256 bool array] * 110 : [256x256 bool array] * 111 : [256x256 bool array] * 112 : [256x256 bool array] * 113 : [256x256 bool array] * 114 : [256x256 bool array] * 115 : [256x256 bool array] * positions : [36x2 float64 array] * iterable : list(36) * id3V4ANI6BQ0 : dict(4) * index : 80 * data : [256x256 int32 array] * position : [array = [0.0018532 0.0016686]] * mask : [256x256 bool array] * id3V4ANGAKA0 : dict(4) * index : 81 * data : [256x256 int32 array] * position : [array = [0.0021597 0.0012469]] * mask : [256x256 bool array] * id3V4ANI6D60 : dict(4) * index : 82 * data : [256x256 int32 array] * position : [array = [0.0023717 0.00077061]] * mask : [256x256 bool array] * id3V4AQEBFK0 : dict(4) * index : 83 * data : [256x256 int32 array] * position : [array = [0.0024801 0.00026067]] * mask : [256x256 bool array] * id3V4ANG7O20 : dict(4) * index : 84 * data : [256x256 int32 array] * position : [array = [ 0.0024801 -0.00026067]] * mask : [256x256 bool array] * ... : .... We observe that the second chunk was not 80 frames deep but only 36, as we only had 116 frames of data. So where is the *.ptyd* data file? By default, PtyScan does not actually save data. We have to activate saving manually in the input parameters.
:: >>> data = NPS.DEFAULT.copy(depth=2) >>> data.save = 'append' >>> NPS = NumpyScan(pars=data) >>> NPS.initialize() :: >>> for i in range(50): >>> msg = NPS.auto(20) >>> if msg == NPS.EOS: >>> break >>> We can analyse the saved ``npy.ptyd`` with :py:func:`~ptypy.io.h5IO.h5info` :: >>> from ptypy.io import h5info >>> print(h5info(NPS.info.dfile)) File created : Mon Mar 11 09:53:29 2024 * chunks [dict 6]: * 0 [dict 4]: * data [20x256x256 int32 array] * indices [list = [0.000000, 1.000000, 2.000000, 3.000000, ...]] * positions [20x2 float64 array] * weights [20x256x256 bool array] * 1 [dict 4]: * data [20x256x256 int32 array] * indices [list = [20.000000, 21.000000, 22.000000, 23.000000, ...]] * positions [20x2 float64 array] * weights [20x256x256 bool array] * 2 [dict 4]: * data [20x256x256 int32 array] * indices [list = [40.000000, 41.000000, 42.000000, 43.000000, ...]] * positions [20x2 float64 array] * weights [20x256x256 bool array] * 3 [dict 4]: * data [20x256x256 int32 array] * indices [list = [60.000000, 61.000000, 62.000000, 63.000000, ...]] * positions [20x2 float64 array] * weights [20x256x256 bool array] * 4 [dict 4]: * data [20x256x256 int32 array] * indices [list = [80.000000, 81.000000, 82.000000, 83.000000, ...]] * positions [20x2 float64 array] * weights [20x256x256 bool array] * 5 [dict 4]: * data [16x256x256 int32 array] * indices [list = [100.000000, 101.000000, 102.000000, 103.000000, ...]] * positions [16x2 float64 array] * weights [16x256x256 bool array] * info [dict 23]: * add_poisson_noise [scalar = False] * auto_center [scalar = False] * base_path [string = "b'/tmp/ptypy/sim/'"] * center [array = [128. 128.]] * chunk_format [string = "b'.chunk%02d'"] * dfile [string = "b'/tmp/ptypy/sim/npy.ptyd'"] * distance [scalar = 7.19] * energy [scalar = 7.2] * experimentID [None] * label [None] * load_parallel [string = "b'data'"] * min_frames [scalar = 1] * name [string = "b'numpyscan'"] * num_frames [None] * orientation [None] * positions_scan [116x2 float64 array] * positions_theory [None] * psize [scalar = 0.000172] * rebin [scalar = 1] * save [string = "b'append'"] * shape [array = [256 256]] * version [scalar = 0.1] * weight2d [scalar = True] * meta [dict 8]: * center [array = [128. 128.]] * distance [scalar = 7.19] * energy [scalar = 7.2] * label [None] * num_frames [scalar = 116] * psize [array = [0.000172 0.000172]] * shape [array = [256 256]] * version [scalar = 0.1] None Listing the new subclass ------------------------ In order to make the subclass available in your local |ptypy|, navigate to ``[ptypy_root]/ptypy/experiment`` and paste the content into a new file ``user.py``:: $ touch [ptypy_root]/ptypy/experiment/user.py Append the following lines to ``[ptypy_root]/ptypy/experiment/__init__.py``:: from .user import NumpyScan PtyScanTypes.update({'numpy':NumpyScan}) Now, your new subclass will be used whenever you pass ``'numpy'`` for the :py:data:`.scan.data.source` parameter. All special parameters of the class should be passed via the dict :py:data:`.scan.data.recipe`. .. [h5py] http://www.h5py.org/ .. [HDF] **H**\ ierarchical **D**\ ata **F**\ ormat, https://www.hdfgroup.org/
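Since a ``*.ptyd`` file is an ordinary *hdf5* file, its content can also be inspected directly with *h5py* [h5py]_. The following is a minimal sketch; it assumes the file written in this tutorial, ``/tmp/ptypy/sim/npy.ptyd``, and the ``meta/`` / ``chunks/`` layout described in :ref:`ptyd_file`.

::

    import h5py

    # open the prepared data file written in this tutorial
    with h5py.File('/tmp/ptypy/sim/npy.ptyd', 'r') as f:
        print(list(f['meta'].keys()))      # general and geometric parameters
        print(list(f['chunks'].keys()))    # one group per prepared chunk
        data = f['chunks/0/data']
        print(data.shape, data.dtype)      # e.g. (20, 256, 256) int32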