Data management¶
Note
In this chapter, we refer to the raw input data as data, not
to data stored in computer memory by Storage
instances.
With the term preparation we refer to all data processing
steps prior to the reconstruction, avoiding the more ambiguous term
processing although it may be more familiar to the reader.
Consider the following generic steps which every ptychographer has to complete prior to a successful image reconstruction.
- (A) Conducting a scanning diffraction experiment.
During or after the experiment, the researcher is left with raw images acquired from the detector and meta data which, in general, consists of scanning positions along with geometric information about the setup, e.g. photon energy, propagation distance, detector pixel size etc.
- (B) Preparing the data.
In this step, the user performs a subset of the following actions:
- select the appropriate region of the detector where the scattering events were counted,
- apply pixel corrections to convert the detector counts of the chosen diffraction frames into photon counts, e.g. flat-field and dark-field correction,
- switch the image orientation to match the coordinate system of the reconstruction algorithms,
- assign a suitable mask to exclude invalid pixel data (hot or dead pixels, overexposure),
- and/or simply rebin the data.
Finally, the user needs to zip the diffraction frames together with the scanning positions.
- (C) Saving the processed data or feeding it into the reconstruction process.
In this step the user either saves the data in a suitable format or passes it directly to the reconstruction engine.
Data management in PtyPy deals with (B) and (C) as a ptychography
reconstruction software naturally cannot provide actual experimental
data. Nevertheless, the treatment of raw data is usually very similar for
every experiment. Consequently, PtyPy provides an abstract base class,
called PtyScan
, which aims to help with steps (B) and (C). In order
to adapt PtyPy for a specific experimental setup, we simply
subclass PtyScan
and reimplement only that subset of its methods which are
affected by the specifics of the experimental setup
(see Tutorial : Subclassing PtyScan).
The PtyScan class¶
PtyScan
is the abstract base class in PtyPy that manages raw input
data.
A PtyScan instance is constructed from a set of generic parameters,
see scan.data
in the ptypy parameter tree.
It provides the following features:
- Parallelization
When PtyPy is run across several MPI processes, PtyScan takes care of distributing the scan-point indices among processes such that each process only loads the data it will later use in the reconstruction. Hence, the load on the network is not affected by the number of processes. The parallel behavior of
PtyScan
is controlled by the parameter scan.data.load_parallel
and is implemented internally with the LoadManager class.
- Preparation
PtyScan can handle a few of the raw processing steps mentioned above.
Selection of a region-of-interest from the raw detector image is controlled by the parameters
scan.data.auto_center, scan.data.shape and scan.data.center.
Switching of orientation and rebinning are controlled by
scan.data.orientation and scan.data.rebin.
Finding a suitable mask or weight for pixel correction is left to the user, as this is setup-specific. See
load_weight(), load_common(), load() and correct()
for detailed explanations.
- Packaging
PtyScan packs the prepared data together with the scan-point indices used, the scan positions, a weight (i.e. mask) and geometric meta information. This package is requested by the managing instance
ModelManager with a call to new_data().
Because data acquisition and preparation can happen during a reconstruction, it is possible to specify the minimum number of data frames passed to each process per new_data() call by setting
scan.data.min_frames. The total number of frames processed for a scan is set by scan.data.num_frames.
If not extracted from other files, the user may set the photon energy with
scan.data.energy, the propagation distance from sample to detector with scan.data.distance
and the detector pixel size with scan.data.psize.
.- Storage
PtyScan and its subclasses are capable of storing the data in an hdf5-compatible [HDF] file format. The data files carry the custom suffix
.ptyd.
A detailed overview of the .ptyd data file tree is given below in the section Ptyd file format.
The parameters
scan.data.save and scan.data.chunk_format
control the way PtyScan saves the processed data.
Note
Although h5py [h5py] supports parallel writes, this feature is not used in ptypy. At the moment, all MPI nodes send their prepared data to the master node, which writes the data to a file.
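As an illustration of the parameters mentioned above, a scan.data branch of a reconstruction script could be filled in roughly as follows. This is a minimal sketch with example values; the exact location of the branch in your parameter tree depends on your script.
from ptypy import utils as u

# Hypothetical scan.data settings; adapt the values to your own experiment.
data = u.Param()
data.shape = 256           # region of interest cropped from the raw frames
data.auto_center = True    # let PtyScan determine the optical axis
data.center = None         # or give the optical axis position in raw-frame pixels
data.orientation = None    # raw frames already match the reconstruction frame
data.rebin = None          # no rebinning
data.min_frames = 1        # minimum frames delivered per new_data() call
data.num_frames = None     # take all available frames
data.energy = 7.2          # photon energy in keV (example value)
data.distance = 7.19       # sample-detector distance in m (example value)
data.psize = 172e-6        # detector pixel size in m (example value)
data.load_parallel = 'data'
data.save = 'append'       # append prepared chunks to a single .ptyd file
data.dfile = '/tmp/ptypy/prepared.ptyd'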
Usage scenarios¶
The PtyScan class of PtyPy provides support for three use cases.
Beamline integrated use.
In this use case, the researcher has integrated PtyPy into the beamline end-station or experimental setup with the help of a custom subclass of
PtyScan
that we call UserScan. This subclass has its own methods to extract many of the generic parameters of scan.data
and also provides defaults for specific custom parameters, for instance file paths or file name patterns (for a detailed introduction on how to subclass PtyScan, see Tutorial : Subclassing PtyScan). Once the experiment is completed, the researcher can initiate a reconstruction directly from the raw data with a standard reconstruction script.
Post-preparation use.
In this use case, the experiment is long over and the researcher has either used a custom subclass of PtyScan or any other script that generates a compatible .hdf5 file (see here) to save the prepared data of that experiment. Reconstruction should then work simply by passing the data file path in the parameter tree.
Only the input file path needs to be passed, either with
source
or with dfile when source takes the value 'file'. In the latter case, secondary processing and saving to another file is not supported, while it is allowed in the first case. Although the latter case seems unfavorable due to the lack of secondary preparation options, it is meant as a user-friendly transition switch from the first reconstruction at the experiment to post-experiment analysis. Only the source
parameter needs to be altered in the script from <..>.data.source=<recipe>
to <..>.data.source='file'
while the rest of the parameters are ignored and may remain untouched (see the sketch after this list).
Preparation and reconstruction on-the-fly with data acquisition.
This use case is for even tighter beamline integration and on-the-fly scans. The researcher has written a suitable subclass
UserScan to prepare data from the setup. The preparation happens in a separate process while image frames are acquired. This process runs a python script in which the subclass UserScan
prepares the data using the auto() method. The save
parameter is set to 'link' in order to create a separate file for each data chunk and to avoid write access on the source file. The chunk files are linked back into the main source .ptyd
file. All reconstruction processes may access the prepared data without overhead or notable pauses in the reconstruction. For PtyPy there is no difference compared to a single source file (a feature of [HDF]).
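As an illustration of the post-preparation use case, switching a reconstruction script from raw-data preparation to loading an existing .ptyd file could look like the following sketch. The scan entry name scan00 and the file path are assumptions; only the lines shown need to change, the remaining scan.data parameters are ignored in this mode.
from ptypy import utils as u

# Hypothetical excerpt of a reconstruction script's parameter tree.
p = u.Param()
p.scans = u.Param()
p.scans.scan00 = u.Param()
p.scans.scan00.data = u.Param()
p.scans.scan00.data.source = 'file'   # was: the recipe / subclass identifier
p.scans.scan00.data.dfile = '/tmp/ptypy/prepared.ptyd'  # prepared data from an earlier run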
Ptyd file format¶
Ptypy uses the python module h5py [h5py] to store and load data in the Hierarchical Data Format [HDF]. HDF closely resembles the directory/file tree of today's operating systems, where the "files" are (multidimensional) datasets.
Ptypy stores and loads the (processed) experimental data in a file with the extension
.ptyd, which is an hdf5 file with a very simple data tree.
Comparable to tagged image file formats like .edf or .tiff, the ptyd
data file separates
meta information (stored in meta/
) from the actual data payload
(stored in chunks/
). A schematic overview of the data tree is depicted below.
*.ptyd/
meta/
[general parameters; optional but very useful]
version : str
num_frames : int
label : str
[geometric parameters; all optional]
shape : int or (int,int)
energy : float, optional
distance : float, optional
center : (float,float) or None, optional
psize : float or (float,float), optional
propagation : "farfield" or "nearfield", optional
...
chunks/
0/
data : array(M,N,N) of float
indices : array(M) of int, optional
positions : array(M ,2) of float
weights : same shape as data or empty
1/
...
2/
...
...
All parameters of meta/
are a subset of scan.data
.
Omitting any of these parameters and setting the value of the dataset to
'None'
have the same effect.
The first set of parameters
version : str
num_frames : int
label : str
are general (optional) parameters.
version
is the ptypy version this dataset was prepared with (current version is 0.8.1.dev6f61b9d9, see version).
label
is a custom user label. Choose a unique label to your liking.
num_frames
indicates how many diffraction image frames are expected in the dataset (see num_frames). It is important to set this parameter when the data acquisition is not finished but the reconstruction has already started. If the dataset is complete, the loading class PtydScan
retrieves the total number of frames from the payload chunks/.
The next set of optional parameters are
shape : int or (int,int)
energy : float
distance : float
center : (float,float)
psize : float or (float,float)
propagation : "farfield" or "nearfield"
which refer to the experimental scanning geometry.
shape (see scan.data.shape)
energy (see scan.data.energy or scan.geometry.energy)
distance (see scan.data.distance)
center : (float,float) (see scan.data.center)
psize : float or (float,float) (see scan.data.psize)
propagation : "farfield" or "nearfield" (see scan.data.propagation)
Finally, these parameters are digested by the
geometry
module in order to provide a suitable propagator.
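As a rough illustration of this step, a propagator can be constructed from exactly these values. The sketch below assumes the Geo class of ptypy.core.geometry and reuses the example values from the geometry file of the tutorial further down; see the geometry documentation for the authoritative interface.
from ptypy import utils as u
from ptypy.core import geometry

# Sketch: build a geometry (and thus a propagator) from ptyd-style meta values.
g = u.Param()
g.energy = 2.33e-3          # keV, as stored in meta/energy
g.distance = 0.15           # m, sample-detector distance
g.psize = 24e-6             # m, detector pixel size
g.shape = 256               # frame shape in pixels
g.propagation = 'farfield'
geo = geometry.Geo(pars=g)
print(geo.resolution)       # resulting pixel size in the sample plane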
Note
As you may have already noted, there are three places where the geometry of the experiment can be specified: the scan.data branch of the parameter tree, the scan.geometry branch, and the meta/ section of a .ptyd data file.
As walking the data tree and extracting the data from the hdf5 file
is a bit cumbersome with h5py, there are a few convenience functions in the
ptypy.io.h5rw
module.
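For instance, the meta section and the first data chunk of a .ptyd file can be inspected directly with h5py, as in the following sketch (the file path is an example):
import h5py

# Sketch: walk a prepared .ptyd file by hand.
with h5py.File('/tmp/ptypy/prepared.ptyd', 'r') as f:
    print(list(f['meta'].keys()))          # e.g. ['center', 'distance', 'energy', ...]
    print(f['meta/energy'][...])           # scalar dataset
    chunk0 = f['chunks/0']
    data = chunk0['data'][...]             # (M, N, N) array of diffraction frames
    positions = chunk0['positions'][...]   # (M, 2) array of scan positions
    print(data.shape, positions.shape)
The convenience functions of ptypy.io.h5rw wrap this kind of access behind a dictionary-based interface.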
Tutorial : Subclassing PtyScan¶
Note
This tutorial was generated from the python source
[ptypy_root]/tutorial/subclassptyscan.py
using ptypy/doc/script2rst.py
.
You are encouraged to modify the parameters and rerun the tutorial with:
$ python [ptypy_root]/tutorial/subclassptyscan.py
In this tutorial, we learn how to subclass PtyScan
to make
ptypy work with any experimental setup.
This tutorial can be used as a direct follow-up to Tutorial: Modeling the experiment - Pod, Geometry if the section Storing the simulation was completed.
Again, the imports first.
>>> import numpy as np
>>> from ptypy.core.data import PtyScan
>>> from ptypy import utils as u
For this tutorial we assume that the data and meta information is in this path:
>>> save_path = '/tmp/ptypy/sim/'
Furthermore, we assume that a file about the experimental geometry is located at
>>> geofilepath = save_path + 'geometry.txt'
>>> print(geofilepath)
/tmp/ptypy/sim/geometry.txt
and has contents of the following form
>>> print(''.join([line for line in open(geofilepath, 'r')]))
distance 1.5000e-01
energy 2.3305e-03
psize 2.4000e-05
shape 256
The scanning positions are in
>>> positionpath = save_path + 'positions.txt'
>>> print(positionpath)
/tmp/ptypy/sim/positions.txt
with a list of positions for vertical and horizontal movement and the corresponding image frame from the "camera"
>>> print(''.join([line for line in open(positionpath, 'r')][:6])+'....')
ccd/diffraction_0000.npy 0.0000e+00 0.0000e+00
ccd/diffraction_0001.npy 0.0000e+00 4.1562e-04
ccd/diffraction_0002.npy 3.9528e-04 1.2844e-04
ccd/diffraction_0003.npy 2.4430e-04 -3.3625e-04
ccd/diffraction_0004.npy -2.4430e-04 -3.3625e-04
ccd/diffraction_0005.npy -3.9528e-04 1.2844e-04
....
Writing a subclass¶
The simplest subclass of PtyScan would look like this:
>>> class NumpyScan(PtyScan):
>>> """
>>> A PtyScan subclass to extract data from a numpy array.
>>> """
>>>
>>> def __init__(self, pars=None, **kwargs):
>>> # In init we need to call the parent.
>>> super(NumpyScan, self).__init__(pars, **kwargs)
>>>
Of course this class does nothing special beyond PtyScan. As it is, the class also cannot be used as a real PtyScan instance because its defaults are not properly managed. For this, Ptypy provides a powerful self-documenting tool called a "descriptor", which can be applied to any new class using a decorator. The tree of all valid ptypy parameters is located here. To manage the default parameters of our subclass and document its existence, we would need to write:
>>> from ptypy import defaults_tree
>>> @defaults_tree.parse_doc('scandata.numpyscan')
>>> class NumpyScan(PtyScan):
>>> """
>>> A PtyScan subclass to extract data from a numpy array.
>>> """
>>>
>>> def __init__(self, pars=None, **kwargs):
>>> # In init we need to call the parent.
>>> super(NumpyScan, self).__init__(pars, **kwargs)
>>>
The decorator extracts information from the docstring of the subclass and parent classes about the expected input parameters. Currently the docstring of NumpyScan does not contain anything special, thus the only parameters registered are those of the parent class, PtyScan:
>>> print(defaults_tree['scandata.numpyscan'].to_string())
[name]
default = PtyScan
help =
type = str
[dfile]
default = None
help = File path where prepared data will be saved in the ``ptyd`` format.
type = file
userlevel = 0
[chunk_format]
default = .chunk%02d
help = Appendix to saved files if save == 'link'
type = str
doc =
userlevel = 2
[save]
default = None
help = Saving mode
type = str
doc = Mode to use to save data to file.
<newline>
- ``None``: No saving
- ``'merge'``: attemts to merge data in single chunk **[not implemented]**
- ``'append'``: appends each chunk in master \*.ptyd file
- ``'link'``: appends external links in master \*.ptyd file and stores chunks separately
<newline>
in the path given by the link. Links file paths are relative to master file.
userlevel = 1
[auto_center]
default = None
help = Determine if center in data is calculated automatically
type = bool
doc =
- ``False``, no automatic centering
- ``None``, only if :py:data:`center` is ``None``
- ``True``, it will be enforced
userlevel = 0
[load_parallel]
default = data
help = Determines what will be loaded in parallel
type = str
doc = Choose from ``None``, ``'data'``, ``'common'``, ``'all'``
choices = ['data', 'common', 'all']
[rebin]
default = None
help = Rebinning factor
type = int
doc = Rebinning factor for the raw data frames. ``'None'`` or ``1`` both mean *no binning*
userlevel = 1
lowlim = 1
uplim = 32
[orientation]
default = None
help = Data frame orientation
type = int, tuple, list
doc = Choose
<newline>
- ``None`` or ``0``: correct orientation
- ``1``: invert columns (numpy.flip_lr)
- ``2``: invert rows (numpy.flip_ud)
- ``3``: invert columns, invert rows
- ``4``: transpose (numpy.transpose)
- ``4+i``: tranpose + other operations from above
<newline>
Alternatively, a 3-tuple of booleans may be provided ``(do_transpose,
do_flipud, do_fliplr)``
choices = [0, 1, 2, 3, 4, 5, 6, 7]
userlevel = 1
[min_frames]
default = 1
help = Minimum number of frames loaded by each node
type = int
doc =
userlevel = 2
lowlim = 1
[positions_theory]
default = None
help = Theoretical positions for this scan
type = ndarray
doc = If provided, experimental positions from :py:class:`PtyScan` subclass will be ignored. If data
preparation is called from Ptycho instance, the calculated positions from the
:py:func:`ptypy.core.xy.from_pars` dict will be inserted here
userlevel = 2
[num_frames]
default = None
help = Maximum number of frames to be prepared
type = int
doc = If `positions_theory` are provided, num_frames will be ovverriden with the number of
positions available
userlevel = 1
[label]
default = None
help = The scan label
type = str
doc = Unique string identifying the scan
userlevel = 1
[experimentID]
default = None
help = Name of the experiment
type = str
doc = If None, a default value will be provided by the recipe. **unused**
userlevel = 2
[version]
default = 0.1
help = TODO: Explain this and decide if it is a user parameter.
type = float
doc =
userlevel = 2
[shape]
default = 256
help = Shape of the region of interest cropped from the raw data.
type = int, tuple
doc = Cropping dimension of the diffraction frame
Can be None, (dimx, dimy), or dim. In the latter case shape will be (dim, dim).
userlevel = 1
[center]
default = 'fftshift'
help = Center (pixel) of the optical axes in raw data
type = list, tuple, str
doc = If ``None``, this parameter will be set by :py:data:`~.scan.data.auto_center` or elsewhere
userlevel = 1
[psize]
default = 0.000172
help = Detector pixel size
type = float, tuple
doc = Dimensions of the detector pixels (in meters)
userlevel = 0
lowlim = 0
[distance]
default = 7.19
help = Sample to detector distance
type = float
doc = In meters.
userlevel = 0
lowlim = 0
[energy]
default = 7.2
help = Photon energy of the incident radiation in keV
type = float
doc =
userlevel = 0
lowlim = 0
[add_poisson_noise]
default = False
help = Decides whether the scan should have poisson noise or not
type = bool
As you can see, there are already many parameters documented in the PtyScan class. For each parameter, the most important entries are the type, the default value and the help string. The decorator does more than collect this information: it also generates from it a class variable called DEFAULT, which stores all defaults:
>>> print(u.verbose.report(NumpyScan.DEFAULT, noheader=True))
* id3V4ANI238G : ptypy.utils.parameters.Param(20)
* name : PtyScan
* dfile : None
* chunk_format : .chunk%02d
* save : None
* auto_center : None
* load_parallel : data
* rebin : None
* orientation : None
* min_frames : 1
* positions_theory : None
* num_frames : None
* label : None
* experimentID : None
* version : 0.1
* shape : 256
* center : fftshift
* psize : 0.000172
* distance : 7.19
* energy : 7.2
* add_poisson_noise : False
Now we are ready to add functionality to our subclass.
A first step of initialisation would be to retrieve
the geometric information that we stored in geofilepath
and update
the input parameters with it.
We write a tiny file parser.
>>> def extract_geo(base_path):
>>>     out = {}
>>>     with open(base_path+'geometry.txt') as f:
>>>         for line in f:
>>>             key, value = line.strip().split()
>>>             out[key] = eval(value)
>>>     return out
>>>
We test it.
>>> print(extract_geo(save_path))
{'distance': 0.15, 'energy': 0.0023305, 'psize': 2.4e-05, 'shape': 256}
That seems to work. We can integrate this parser into the initialisation, as we assume that this small file access can be performed by all MPI nodes without access conflicts. Hence, our subclass becomes:
>>> @defaults_tree.parse_doc('scandata.numpyscan')
>>> class NumpyScan(PtyScan):
>>> """
>>> A PtyScan subclass to extract data from a numpy array.
>>>
>>> Defaults:
>>>
>>> [name]
>>> type = str
>>> default = numpyscan
>>> help =
>>>
>>> [base_path]
>>> type = str
>>> default = './'
>>> help = Base path to extract data files from.
>>> """
>>>
>>> def __init__(self, pars=None, **kwargs):
>>> p = self.DEFAULT.copy(depth=2)
>>> p.update(pars)
>>>
>>> with open(p.base_path+'geometry.txt') as f:
>>> for line in f:
>>> key, value = line.strip().split()
>>> # we only replace Nones or missing keys
>>> if p.get(key) is None:
>>> p[key] = eval(value)
>>>
>>> super(NumpyScan, self).__init__(p, **kwargs)
>>>
We now need a new input parameter called base_path, so we documented it in the docstring after the section header “Defaults:”.
>>> print(defaults_tree['scandata.numpyscan.base_path'])
[base_path]
default = './'
help = Base path to extract data files from.
type = str
As you can see, the first step in __init__ is to build a default parameter structure to ensure that all input parameters are available. The next line updates this structure to overwrite the entries specified by the user.
Good! Next, we need to implement how the class finds out about
the positions in the scan. The method
load_positions()
can be used
for this purpose.
>>> print(PtyScan.load_positions.__doc__)
**Override in subclass for custom implementation**
*Called in* :py:meth:`initialize`
Loads all positions for all diffraction patterns in this scan.
The positions loaded here will be available by all processes
through the attribute ``self.positions``. If you specify position
on a per frame basis in :py:meth:`load` , this function has no
effect.
If theoretical positions :py:data:`positions_theory` are
provided in the initial parameter set :py:data:`DEFAULT`,
specifying positions here has NO effect and will be ignored.
The purpose of this function is to avoid reloading and parallel
reads on files that may require intense parsing to retrieve the
information, e.g. long SPEC log files. If parallel reads or
log file parsing for each set of frames is not a time critical
issue of the subclass, reimplementing this function can be ignored
and it is recommended to only reimplement the :py:meth:`load`
method.
If `load_parallel` is set to `all` or `common`, this function is
executed by all nodes, otherwise the master node executes this
function and broadcasts the results to other nodes.
Returns
-------
positions : ndarray
A (N,2)-array where *N* is the number of positions.
Note
----
Be aware that this method sets attribute :py:attr:`num_frames`
in the following manner.
* If ``num_frames == None`` : ``num_frames = N``.
* If ``num_frames < N`` , no effect.
* If ``num_frames > N`` : ``num_frames = N``.
The parser for the positions file would look like this.
>>> def extract_pos(base_path):
>>>     pos = []
>>>     files = []
>>>     with open(base_path+'positions.txt') as f:
>>>         for line in f:
>>>             fname, y, x = line.strip().split()
>>>             pos.append((eval(y), eval(x)))
>>>             files.append(fname)
>>>     return files, pos
>>>
And the test:
>>> files, pos = extract_pos(save_path)
>>> print(files[:2])
['ccd/diffraction_0000.npy', 'ccd/diffraction_0001.npy']
>>> print(pos[:2])
[(0.0, 0.0), (0.0, 0.00041562)]
>>> @defaults_tree.parse_doc('scandata.numpyscan')
>>> class NumpyScan(PtyScan):
>>> """
>>> A PtyScan subclass to extract data from a numpy array.
>>>
>>> Defaults:
>>>
>>> [name]
>>> type = str
>>> default = numpyscan
>>> help =
>>>
>>> [base_path]
>>> type = str
>>> default = /tmp/ptypy/sim/
>>> help = Base path to extract data files from.
>>> """
>>>
>>> def __init__(self, pars=None, **kwargs):
>>> p = self.DEFAULT.copy(depth=2)
>>> p.update(pars)
>>>
>>> with open(p.base_path+'geometry.txt') as f:
>>> for line in f:
>>> key, value = line.strip().split()
>>> # we only replace Nones or missing keys
>>> if p.get(key) is None:
>>> p[key] = eval(value)
>>>
>>> super(NumpyScan, self).__init__(p, **kwargs)
>>>
>>> def load_positions(self):
>>> # the base path is now stored in
>>> base_path = self.info.base_path
>>> pos = []
>>> with open(base_path+'positions.txt') as f:
>>> for line in f:
>>> fname, y, x = line.strip().split()
>>> pos.append((eval(y), eval(x)))
>>> files.append(fname)
>>> return np.asarray(pos)
>>>
One nice thing about overriding self.load_positions
is that
the maximum number of frames is set automatically, so we do not need to
manually adapt check().
The last step is to override the actual loading of data.
Loading happens (MPI-compatibly) in
load():
>>> print(PtyScan.load.__doc__)
**Override in subclass for custom implementation**
Loads data according to node specific scanpoint indices that have
been determined by :py:class:`LoadManager` or otherwise.
Returns
-------
raw, positions, weight : dict
Dictionaries whose keys are the given scan point `indices`
and whose values are the respective frame / position according
to the scan point index. `weight` and `positions` may be empty
Note
----
This is the *most* important method to change when subclassing
:py:class:`PtyScan`. Most often it suffices to override the constructor
and this method to create a subclass suited for a specific
experiment.
load() seems a bit more complex than self.load_positions
with respect to its
return values. However, we can opt out of providing weights (masks)
and positions, as we have already adapted self.load_positions
and there were no bad pixels in the (linear) detector.
The final subclass looks like this. We overwrite two defaults from PtyScan:
>>> @defaults_tree.parse_doc('scandata.numpyscan')
>>> class NumpyScan(PtyScan):
>>> """
>>> A PtyScan subclass to extract data from a numpy array.
>>>
>>> Defaults:
>>>
>>> [name]
>>> type = str
>>> default = numpyscan
>>> help =
>>>
>>> [base_path]
>>> type = str
>>> default = /tmp/ptypy/sim/
>>> help = Base path to extract data files from.
>>>
>>> [auto_center]
>>> default = False
>>>
>>> [dfile]
>>> default = /tmp/ptypy/sim/npy.ptyd
>>> """
>>>
>>> def __init__(self, pars=None, **kwargs):
>>> p = self.DEFAULT.copy(depth=2)
>>> p.update(pars)
>>>
>>> with open(p.base_path+'geometry.txt') as f:
>>> for line in f:
>>> key, value = line.strip().split()
>>> # we only replace Nones or missing keys
>>> if p.get(key) is None:
>>> p[key] = eval(value)
>>>
>>> super(NumpyScan, self).__init__(p, **kwargs)
>>>
>>> def load_positions(self):
>>> # the base path is now stored in
>>> base_path = self.info.base_path
>>> pos = []
>>> with open(base_path+'positions.txt') as f:
>>> for line in f:
>>> fname, y, x = line.strip().split()
>>> pos.append((eval(y), eval(x)))
>>> files.append(fname)
>>> return np.asarray(pos)
>>>
>>> def load(self, indices):
>>> raw = {}
>>> bp = self.info.base_path
>>> for ii in indices:
>>> raw[ii] = np.load(bp+'ccd/diffraction_%04d.npy' % ii)
>>> return raw, {}, {}
>>>
Loading the data¶
With the subclass we create a scan using only the defaults:
>>> NPS = NumpyScan()
>>> NPS.initialize()
In order to process the data, we need to call
auto()
with the chunk size
as argument. It returns a data chunk that we can inspect
with ptypy.utils.verbose.report()
. The information is
concatenated, but the length of iterables or dicts is always indicated
in parentheses.
>>> print(u.verbose.report(NPS.auto(80), noheader=True))
* id3V4AQEBE20 : dict(3)
* common : ptypy.utils.parameters.Param(8)
* version : 0.1
* num_frames : 116
* label : None
* shape : [array = [256 256]]
* psize : [array = [0.000172 0.000172]]
* energy : 7.2
* center : [array = [128. 128.]]
* distance : 7.19
* chunk : ptypy.utils.parameters.Param(6)
* indices : list(80)
* id2M979S98S8 : 0
* id2M979S98T8 : 1
* id2M979S98U8 : 2
* id2M979S98V8 : 3
* id2M979S9908 : 4
* ... : ....
* indices_node : list(80)
* id2M979S98S8 : 0
* id2M979S98T8 : 1
* id2M979S98U8 : 2
* id2M979S98V8 : 3
* id2M979S9908 : 4
* ... : ....
* num : 0
* data : dict(80)
* 0 : [256x256 int32 array]
* 1 : [256x256 int32 array]
* 2 : [256x256 int32 array]
* 3 : [256x256 int32 array]
* 4 : [256x256 int32 array]
* 5 : [256x256 int32 array]
* 6 : [256x256 int32 array]
* 7 : [256x256 int32 array]
* 8 : [256x256 int32 array]
* 9 : [256x256 int32 array]
* 10 : [256x256 int32 array]
* 11 : [256x256 int32 array]
* 12 : [256x256 int32 array]
* 13 : [256x256 int32 array]
* 14 : [256x256 int32 array]
* 15 : [256x256 int32 array]
* 16 : [256x256 int32 array]
* 17 : [256x256 int32 array]
* 18 : [256x256 int32 array]
* 19 : [256x256 int32 array]
* 20 : [256x256 int32 array]
* 21 : [256x256 int32 array]
* 22 : [256x256 int32 array]
* 23 : [256x256 int32 array]
* 24 : [256x256 int32 array]
* 25 : [256x256 int32 array]
* 26 : [256x256 int32 array]
* 27 : [256x256 int32 array]
* 28 : [256x256 int32 array]
* 29 : [256x256 int32 array]
* 30 : [256x256 int32 array]
* 31 : [256x256 int32 array]
* 32 : [256x256 int32 array]
* 33 : [256x256 int32 array]
* 34 : [256x256 int32 array]
* 35 : [256x256 int32 array]
* 36 : [256x256 int32 array]
* 37 : [256x256 int32 array]
* 38 : [256x256 int32 array]
* 39 : [256x256 int32 array]
* 40 : [256x256 int32 array]
* 41 : [256x256 int32 array]
* 42 : [256x256 int32 array]
* 43 : [256x256 int32 array]
* 44 : [256x256 int32 array]
* 45 : [256x256 int32 array]
* 46 : [256x256 int32 array]
* 47 : [256x256 int32 array]
* 48 : [256x256 int32 array]
* 49 : [256x256 int32 array]
* 50 : [256x256 int32 array]
* 51 : [256x256 int32 array]
* 52 : [256x256 int32 array]
* 53 : [256x256 int32 array]
* 54 : [256x256 int32 array]
* 55 : [256x256 int32 array]
* 56 : [256x256 int32 array]
* 57 : [256x256 int32 array]
* 58 : [256x256 int32 array]
* 59 : [256x256 int32 array]
* 60 : [256x256 int32 array]
* 61 : [256x256 int32 array]
* 62 : [256x256 int32 array]
* 63 : [256x256 int32 array]
* 64 : [256x256 int32 array]
* 65 : [256x256 int32 array]
* 66 : [256x256 int32 array]
* 67 : [256x256 int32 array]
* 68 : [256x256 int32 array]
* 69 : [256x256 int32 array]
* 70 : [256x256 int32 array]
* 71 : [256x256 int32 array]
* 72 : [256x256 int32 array]
* 73 : [256x256 int32 array]
* 74 : [256x256 int32 array]
* 75 : [256x256 int32 array]
* 76 : [256x256 int32 array]
* 77 : [256x256 int32 array]
* 78 : [256x256 int32 array]
* 79 : [256x256 int32 array]
* weights : dict(80)
* 0 : [256x256 bool array]
* 1 : [256x256 bool array]
* 2 : [256x256 bool array]
* 3 : [256x256 bool array]
* 4 : [256x256 bool array]
* 5 : [256x256 bool array]
* 6 : [256x256 bool array]
* 7 : [256x256 bool array]
* 8 : [256x256 bool array]
* 9 : [256x256 bool array]
* 10 : [256x256 bool array]
* 11 : [256x256 bool array]
* 12 : [256x256 bool array]
* 13 : [256x256 bool array]
* 14 : [256x256 bool array]
* 15 : [256x256 bool array]
* 16 : [256x256 bool array]
* 17 : [256x256 bool array]
* 18 : [256x256 bool array]
* 19 : [256x256 bool array]
* 20 : [256x256 bool array]
* 21 : [256x256 bool array]
* 22 : [256x256 bool array]
* 23 : [256x256 bool array]
* 24 : [256x256 bool array]
* 25 : [256x256 bool array]
* 26 : [256x256 bool array]
* 27 : [256x256 bool array]
* 28 : [256x256 bool array]
* 29 : [256x256 bool array]
* 30 : [256x256 bool array]
* 31 : [256x256 bool array]
* 32 : [256x256 bool array]
* 33 : [256x256 bool array]
* 34 : [256x256 bool array]
* 35 : [256x256 bool array]
* 36 : [256x256 bool array]
* 37 : [256x256 bool array]
* 38 : [256x256 bool array]
* 39 : [256x256 bool array]
* 40 : [256x256 bool array]
* 41 : [256x256 bool array]
* 42 : [256x256 bool array]
* 43 : [256x256 bool array]
* 44 : [256x256 bool array]
* 45 : [256x256 bool array]
* 46 : [256x256 bool array]
* 47 : [256x256 bool array]
* 48 : [256x256 bool array]
* 49 : [256x256 bool array]
* 50 : [256x256 bool array]
* 51 : [256x256 bool array]
* 52 : [256x256 bool array]
* 53 : [256x256 bool array]
* 54 : [256x256 bool array]
* 55 : [256x256 bool array]
* 56 : [256x256 bool array]
* 57 : [256x256 bool array]
* 58 : [256x256 bool array]
* 59 : [256x256 bool array]
* 60 : [256x256 bool array]
* 61 : [256x256 bool array]
* 62 : [256x256 bool array]
* 63 : [256x256 bool array]
* 64 : [256x256 bool array]
* 65 : [256x256 bool array]
* 66 : [256x256 bool array]
* 67 : [256x256 bool array]
* 68 : [256x256 bool array]
* 69 : [256x256 bool array]
* 70 : [256x256 bool array]
* 71 : [256x256 bool array]
* 72 : [256x256 bool array]
* 73 : [256x256 bool array]
* 74 : [256x256 bool array]
* 75 : [256x256 bool array]
* 76 : [256x256 bool array]
* 77 : [256x256 bool array]
* 78 : [256x256 bool array]
* 79 : [256x256 bool array]
* positions : [80x2 float64 array]
* iterable : list(80)
* id3V4AQCNFO0 : dict(4)
* index : 0
* data : [256x256 int32 array]
* position : [array = [0. 0.]]
* mask : [256x256 bool array]
* id3V4ANG90M0 : dict(4)
* index : 1
* data : [256x256 int32 array]
* position : [array = [0. 0.00041562]]
* mask : [256x256 bool array]
* id3V4ANH7UC0 : dict(4)
* index : 2
* data : [256x256 int32 array]
* position : [array = [0.00039528 0.00012844]]
* mask : [256x256 bool array]
* id3V4ANI69A0 : dict(4)
* index : 3
* data : [256x256 int32 array]
* position : [array = [ 0.0002443 -0.00033625]]
* mask : [256x256 bool array]
* id3V4ANI6B60 : dict(4)
* index : 4
* data : [256x256 int32 array]
* position : [array = [-0.0002443 -0.00033625]]
* mask : [256x256 bool array]
* ... : ....
>>> print(u.verbose.report(NPS.auto(80), noheader=True))
* id3V4ANI6C80 : dict(3)
* common : ptypy.utils.parameters.Param(8)
* version : 0.1
* num_frames : 116
* label : None
* shape : [array = [256 256]]
* psize : [array = [0.000172 0.000172]]
* energy : 7.2
* center : [array = [128. 128.]]
* distance : 7.19
* chunk : ptypy.utils.parameters.Param(6)
* indices : list(36)
* id2M979S9BC8 : 80
* id2M979S9BD8 : 81
* id2M979S9BE8 : 82
* id2M979S9BF8 : 83
* id2M979S9BG8 : 84
* ... : ....
* indices_node : list(36)
* id2M979S9BC8 : 80
* id2M979S9BD8 : 81
* id2M979S9BE8 : 82
* id2M979S9BF8 : 83
* id2M979S9BG8 : 84
* ... : ....
* num : 1
* data : dict(36)
* 80 : [256x256 int32 array]
* 81 : [256x256 int32 array]
* 82 : [256x256 int32 array]
* 83 : [256x256 int32 array]
* 84 : [256x256 int32 array]
* 85 : [256x256 int32 array]
* 86 : [256x256 int32 array]
* 87 : [256x256 int32 array]
* 88 : [256x256 int32 array]
* 89 : [256x256 int32 array]
* 90 : [256x256 int32 array]
* 91 : [256x256 int32 array]
* 92 : [256x256 int32 array]
* 93 : [256x256 int32 array]
* 94 : [256x256 int32 array]
* 95 : [256x256 int32 array]
* 96 : [256x256 int32 array]
* 97 : [256x256 int32 array]
* 98 : [256x256 int32 array]
* 99 : [256x256 int32 array]
* 100 : [256x256 int32 array]
* 101 : [256x256 int32 array]
* 102 : [256x256 int32 array]
* 103 : [256x256 int32 array]
* 104 : [256x256 int32 array]
* 105 : [256x256 int32 array]
* 106 : [256x256 int32 array]
* 107 : [256x256 int32 array]
* 108 : [256x256 int32 array]
* 109 : [256x256 int32 array]
* 110 : [256x256 int32 array]
* 111 : [256x256 int32 array]
* 112 : [256x256 int32 array]
* 113 : [256x256 int32 array]
* 114 : [256x256 int32 array]
* 115 : [256x256 int32 array]
* weights : dict(36)
* 80 : [256x256 bool array]
* 81 : [256x256 bool array]
* 82 : [256x256 bool array]
* 83 : [256x256 bool array]
* 84 : [256x256 bool array]
* 85 : [256x256 bool array]
* 86 : [256x256 bool array]
* 87 : [256x256 bool array]
* 88 : [256x256 bool array]
* 89 : [256x256 bool array]
* 90 : [256x256 bool array]
* 91 : [256x256 bool array]
* 92 : [256x256 bool array]
* 93 : [256x256 bool array]
* 94 : [256x256 bool array]
* 95 : [256x256 bool array]
* 96 : [256x256 bool array]
* 97 : [256x256 bool array]
* 98 : [256x256 bool array]
* 99 : [256x256 bool array]
* 100 : [256x256 bool array]
* 101 : [256x256 bool array]
* 102 : [256x256 bool array]
* 103 : [256x256 bool array]
* 104 : [256x256 bool array]
* 105 : [256x256 bool array]
* 106 : [256x256 bool array]
* 107 : [256x256 bool array]
* 108 : [256x256 bool array]
* 109 : [256x256 bool array]
* 110 : [256x256 bool array]
* 111 : [256x256 bool array]
* 112 : [256x256 bool array]
* 113 : [256x256 bool array]
* 114 : [256x256 bool array]
* 115 : [256x256 bool array]
* positions : [36x2 float64 array]
* iterable : list(36)
* id3V4ANI6BQ0 : dict(4)
* index : 80
* data : [256x256 int32 array]
* position : [array = [0.0018532 0.0016686]]
* mask : [256x256 bool array]
* id3V4ANGAKA0 : dict(4)
* index : 81
* data : [256x256 int32 array]
* position : [array = [0.0021597 0.0012469]]
* mask : [256x256 bool array]
* id3V4ANI6D60 : dict(4)
* index : 82
* data : [256x256 int32 array]
* position : [array = [0.0023717 0.00077061]]
* mask : [256x256 bool array]
* id3V4AQEBFK0 : dict(4)
* index : 83
* data : [256x256 int32 array]
* position : [array = [0.0024801 0.00026067]]
* mask : [256x256 bool array]
* id3V4ANG7O20 : dict(4)
* index : 84
* data : [256x256 int32 array]
* position : [array = [ 0.0024801 -0.00026067]]
* mask : [256x256 bool array]
* ... : ....
We observe that the second chunk was not 80 frames deep but only 36, as we had only 116 frames of data.
So where is the .ptyd data file? By default, PtyScan does not actually save data. We have to activate saving manually in the input parameters.
>>> data = NPS.DEFAULT.copy(depth=2)
>>> data.save = 'append'
>>> NPS = NumpyScan(pars=data)
>>> NPS.initialize()
>>> for i in range(50):
>>>     msg = NPS.auto(20)
>>>     if msg == NPS.EOS:
>>>         break
>>>
We can analyse the saved npy.ptyd
with
h5info()
>>> from ptypy.io import h5info
>>> print(h5info(NPS.info.dfile))
File created : Mon Mar 11 09:53:29 2024
* chunks [dict 6]:
* 0 [dict 4]:
* data [20x256x256 int32 array]
* indices [list = [0.000000, 1.000000, 2.000000, 3.000000, ...]]
* positions [20x2 float64 array]
* weights [20x256x256 bool array]
* 1 [dict 4]:
* data [20x256x256 int32 array]
* indices [list = [20.000000, 21.000000, 22.000000, 23.000000, ...]]
* positions [20x2 float64 array]
* weights [20x256x256 bool array]
* 2 [dict 4]:
* data [20x256x256 int32 array]
* indices [list = [40.000000, 41.000000, 42.000000, 43.000000, ...]]
* positions [20x2 float64 array]
* weights [20x256x256 bool array]
* 3 [dict 4]:
* data [20x256x256 int32 array]
* indices [list = [60.000000, 61.000000, 62.000000, 63.000000, ...]]
* positions [20x2 float64 array]
* weights [20x256x256 bool array]
* 4 [dict 4]:
* data [20x256x256 int32 array]
* indices [list = [80.000000, 81.000000, 82.000000, 83.000000, ...]]
* positions [20x2 float64 array]
* weights [20x256x256 bool array]
* 5 [dict 4]:
* data [16x256x256 int32 array]
* indices [list = [100.000000, 101.000000, 102.000000, 103.000000, ...]]
* positions [16x2 float64 array]
* weights [16x256x256 bool array]
* info [dict 23]:
* add_poisson_noise [scalar = False]
* auto_center [scalar = False]
* base_path [string = "b'/tmp/ptypy/sim/'"]
* center [array = [128. 128.]]
* chunk_format [string = "b'.chunk%02d'"]
* dfile [string = "b'/tmp/ptypy/sim/npy.ptyd'"]
* distance [scalar = 7.19]
* energy [scalar = 7.2]
* experimentID [None]
* label [None]
* load_parallel [string = "b'data'"]
* min_frames [scalar = 1]
* name [string = "b'numpyscan'"]
* num_frames [None]
* orientation [None]
* positions_scan [116x2 float64 array]
* positions_theory [None]
* psize [scalar = 0.000172]
* rebin [scalar = 1]
* save [string = "b'append'"]
* shape [array = [256 256]]
* version [scalar = 0.1]
* weight2d [scalar = True]
* meta [dict 8]:
* center [array = [128. 128.]]
* distance [scalar = 7.19]
* energy [scalar = 7.2]
* label [None]
* num_frames [scalar = 116]
* psize [array = [0.000172 0.000172]]
* shape [array = [256 256]]
* version [scalar = 0.1]
None
Listing the new subclass¶
In order to make the subclass available in your local PtyPy,
navigate to [ptypy_root]/ptypy/experiment
and paste the class code
into a new file user.py
:
$ touch [ptypy_root]/ptypy/experiment/user.py
Append the following lines to [ptypy_root]/ptypy/experiment/__init__.py
:
from .user import NumpyScan
PtyScanTypes.update({'numpy':NumpyScan})
Now, your new subclass will be used whenever you pass 'numpy'
for
the scan.data.source
parameter. All special parameters of the class
should be passed via the dict scan.data.recipe
.
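For illustration, a scan.data configuration that selects the new subclass could look like the following sketch. The scans entry name scan00 is an assumption, and base_path is the class-specific parameter defined in the tutorial above.
from ptypy import utils as u

# Hypothetical usage of the registered subclass in a reconstruction script.
p = u.Param()
p.scans = u.Param()
p.scans.scan00 = u.Param()
p.scans.scan00.data = u.Param()
p.scans.scan00.data.source = 'numpy'      # selects NumpyScan via PtyScanTypes
p.scans.scan00.data.recipe = u.Param()    # class-specific parameters go here
p.scans.scan00.data.recipe.base_path = '/tmp/ptypy/sim/'
p.scans.scan00.data.dfile = '/tmp/ptypy/sim/npy.ptyd'
p.scans.scan00.data.save = 'append'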