Applies to code in Steve's GitHub. Written by Aladdin, Visal, Steve.
This repository contains two versions of the code: the folder toeplitz_decomp
contains a double precision, CPU-only version of the code; the folder toeplitz_decomp_gpu
contains a single-precision version of code which can be run on CPUs only, or can utilize one or more GPUs.
To extract binned data, use extract_realData2.py
, which requires Python 2.7, NumPy, SciPy, and Matplotlib.
Extract data on a CITA machine or your personal computer, then move the extracted data to the system you wish to run the deconvolution routine on.
To extract, use the format:
$ python extract_realData2.py binnedDataFile numrows numcols offsetn offsetm n m
where binnedDataFile is the raw data you wish to extract (must be in the same directory as extract_realData2.py
). numrows and numcols refer to the raw data, and must be set correctly for proper extraction. numrows is the number of blocks (or the number of frequency bins) and numcols is the size of each block (or the number of time bins). The remaining arguments apply to the extracted dynamic spectrum, and can be set as desired. offsetn and offsetm are the lower bounds of the frequency and time bins, respectively. n and m are the total number of frequency and time bins desired (or, equivalently, the number of blocks and the size of each block, respectively)
For example, if your data has numrows = 16384, numcols = 660, and you want offsetn = 0, offsetm = 0, n = 4, m = 8, use:
python extract_realData2.py weighted_average_0_1_6.bin 16384 660 0 0 4 8
This will create the directory ./processedData/gate0_numblock_4_meff_16_offsetn_0_offsetm_0
Note that if a directory with this name already exists, the data therein will be overwritten without warning when extract_realData2.py
executes.
The extraction routine assumes that the input data is of type float32, binary format. If your input data is not a float32 binary file, modify the import step (e.g. change 'float32' to 'float64' in the np.memmap call; or if your data is a NumPy array, change the np.memmap call to np.load).
The format of the directory name is: gate0_numblock_(n)_meff_(mx2)_offsetn_(offsetn)_offsetm_(offsetm)
Note that the value of m is doubled in the directory name, but you must use the original value of m when you perform the decomposition.
Inside this folder, there will be a gate0_numblock_(n)_meff_(mx2)_offsetn_(offsetn)_offsetm_(offsetm)_toep.npy
file. There will also be n npy files. They each represent a block of the Toeplitz matrix. Only the files within this folder are required to perform the decomposition.
The decomposition can be performed locally, on SciNet, BGQ, or the SOSCIP GPU cluster. The following instructions are for running the double precision, CPU-only version of the code on the BGQ.
Please refer to SciNet BGQ wiki before continuing.
- Compress the processed data
$ cd processedData/
$ tar -zcvf processedData.tar.gz gate0_numblock_(n)_meff_(mx2)_offsetn_(offsetn)_offsetm_(offsetm)
- Move the compressed folder into BGQ using your login information. For example, using scp:
scp processedData.tar.gz (BGQusername)@bgqdev.scinet.utoronto.ca:~/
- ssh into BGQ
$ ssh (BGQusername)@bgqdev.scinet.utoronto.ca -X
- Clone the GitHub repository containing the source code:
git clone https://github.com/sufkes/scintillometry.git
- Move the compressed data into the folder containing the source code:
mv processedData.tar.gz scintillometry/toeplitz_decomp/processedData/
- Copy the source code and the extracted data to your scratch directory:
cp -r scintillometry/toeplitz_decomp/ $SCRATCH
- Move to your scratch directory and extract the data:
cd $SCRATCH/toeplitz_decomp/processedData/
tar -zxvf processedData.tar.gz
cd ..
- Write/edit a job script per the instructions below.
- Copy the template job script
jobscript_bgq_debugjob.sh
to a new name:
cp jobscript_bgq_debugjob.sh smalljob_name.sh
- Edit the copy
smalljob_name.sh
(e.g. with emacs, vi).
-
method is the decomposition scheme. yty2 is the method used in Nilou's report.
-
Set parameters offsetn, offsetm, n and m to the values that were used in
extract_realData2.py
. -
p is an integer parameter used in the decomposition (function currently unclear). It can be set to 2m, m, m/2, m/4. Fastest results reportedly occur for p = m/2 or p = m/4.
-
pad is a Boolean value which specifies whether or not to use padding (1 or 0).
-
bg_size is the number of nodes in the block. This is automatically set to 64 in a debugjob.
-
NP is the number of MPI processes. It must be set to 2n.
-
RPN is the number of MPI processes per node.
-
OMP_NUM_THREADS is the number of OpenMP threads per MPI process.
The following conditions must hold for the run to execute:
- NP = 2n
- NP ≤ (RPN * bg_size)
- RPN ≤ NP
- (RPN * OMP_NUM_THREADS) ≤ 64 = number of threads per node.
- Request a debug block:
debugjob
- Once inside the debug block, execute the job script:
./smalljob_name.sh
The run is timed using the time
function. Execution consists of 2n - 1 loops. Results are saved in ./results/
- Move results from scratch directory to desired location.
Results are stored in ./results/gate0_numblock_(n)_meff_(mx2)_offsetn_(offsetn)_offsetm_(offsetm)_uc.npy
.
- Copy the template job script
jobscript_bgq_large.sh
to a new name:
cp jobscript_bgq_large.sh largejob_name.sh
- Edit the copy
largejob_name.sh
.
- Follow the same instructions as for small jobs.
- The number of nodes must be specified. bg_size = 64, 128, 256, 512, 1024, 2048.
- Set sourcedir to the directory of the source code.
- Submit the job:
llsubmit ./largejob_name.sh
See SciNet BGQ wiki for instructions on monitoring progress.
- Move results from scratch directory to desired location.
Results are stored in ./results/gate0_numblock_(n)_meff_(mx2)_offsetn_(offsetn)_offsetm_(offsetm)_uc.npy
.
Please refer to SciNet SOSCIP GPU wiki before continuing.
To run the GPU version of the code on the SOSCIP GPU cluster, follow steps 1-8 for running the CPU-only version of the code on the BGQ, instead using the scripts in the toeplitz_decomp_gpu
folder.
- Copy the template job script
jobscript_soscip_gpu.sh
to a new name:
cp jobscript_soscip_gpu.sh gpujob_name.sh
- Edit the copy
gpujob_name.sh
(e.g. with emacs, vi).
- The parameters have the same meaning as in the BGQ debugjob job script described above.
- --nodes specifies the number of compute nodes to use. Each node has 2x10x8=160 CPU threads and 4 GPUs.
- --ntasks specifies the number of MPI processes. This must be set to 2n.
- --time specifies the wall clock limit.
- --gres=gpu:4 specifies to use 4 GPUs per node.
- PYTHON specifies the copy of Python to be used. The default Python installation on the SOSCIP GPU cluster cannot run the GPU version of the code, as ArrayFire is not installed on it.
- SCRIPT specifies the path of the Python script to run.
-
To select whether or not to use the GPUs, set the parameters use_gpu_Om2 and use_gpu_Om3 in
new_factorize_parallel.py
to True or False. If use_gpu_Om2 = True, all O(m^2) matrix operations will be performed on GPUs; if use_gpu_Om3 = True, all O(m^3) matrix operations will be performed on GPUs. It appears that the best performance is acheived when use_gpu_Om2 = False, and use_gpu_Om3 = True. -
Submit the job:
sbatch gpujob_name.sh
- Move results from scratch directory to desired location.
Results are stored in ./results/gate0_numblock_(n)_meff_(mx2)_offsetn_(offsetn)_offsetm_(offsetm)_uc.npy
.
The file ./results/gate0_numblock_(n)_meff_(mx2)_offsetn_(offsetn)_offsetm_(offsetm)_uc.npy
is a 1D NumPy array containing the flattened electric field in Fourier space. To convert this file to a 2D array, use the script unflatten_E.py
. The syntax is:
$ python unflatten_E.py offsetn offsetm n m
The correct way to reconstruct the electric field from the output of our deconvolution routine is unknown. The script unflatten_E.py
gets close to the correct answer in simple cases.
Currently, our decomposition can correctly compute the Cholesky factor of a block Toeplitz matrix. To see that this is true, set the option build_Inm to True in extract_realData2.py
. When this option is True, the extraction routine will extract as normal, but will also construct the entire block Toeplitz matrix Inm, and save it as Inm_input.npy
in the appropriate result folder. The extraction routine will also directly compute the Cholesky factor of the full block Toeplitz matrix Inm using np.linalg.cholesky
, and save it as L_input.npy
. Since the matrix Inm and its Cholesky factor are of size 4nm x 4nm, this is only possible for small n, m.
To reconstruct the entire Cholesky factor using our decomposition routine, set the option detailedSave to True in run_real_new.py
, then run the decomposition routine as normal. When this option is True, the decomposition routine will save full blocks of the Cholesky factor. Note that, in general, detailedSave should be False, as saving entire blocks slows down the code and consumes more disk space. After the decomposition routine completes, the full Cholesky factor can be reconstructed from the blocks using the script resconstruct_L.py
. The syntax is
$ python reconstruct_L.py offsetn offsetm n m
This will save our code's calculation of the Cholesky factor as L_result.npy
in the appropriate result folder. After these steps, you can compare the files L_input.npy
and L_result.npy
to check that our code computed the correct Cholesky factor.
While our code can correctly compute the Cholesky factor, we do not have a proven method to perform a 2D deconvolution. If this project is being continued, it is crucial that we achieive a proof of principle for our method. To do so, I strongly suggest the following: Create a script which generates an electric field in Fourier space E(tau,f_D); computes the corresponding intensity I(f,t); builds a complete block Toeplitz matrix Inm using this intensity; directly computes the Cholesky factor of Inm using, e.g., np.linalg.cholesky(); and retrieves the original electric field from the Cholesky factor.
There are a couples places in the method which are likely to be causing us problems:
-
We don't know exactly how the block Toeplitz matrix Inm should be constructed. It may be that we are making a mistake in the extraction routine which prevents the deconvolution from working correctly.
-
We don't know how the electric field should be reconstructed from the Cholesky factor. For example, the code currently constructs the electric field using the first row and column of the n right-most lower blocks of the Cholesky factor. Some tests suggested that the middle rows and columns of the n right-most blocks should be used, but using these does not fix the problem in general.