Baseline information #159
Comments
Hi @miguelcarcamov, I think it depends on how far you want to take it.
1. The easy, but probably less efficient approach
You could set up your datasets with the following:
datasets = xds_from_ms(ms, group_cols=["FIELD_ID", "DATA_DESC_ID", "ANTENNA1", "ANTENNA2"])
This will create a unique dataset per combination of FIELD_ID, DATA_DESC_ID and BASELINE. Unfortunately, Measurement Sets are frequently monotonically ordered in TIME, rather than ANTENNA1, ANTENNA2, so the resulting datasets will be backed by reads of non-contiguous rows, which results in inefficient disk read patterns. But it should be fairly easy to calculate the max baseline length per dataset as follows:
import dask
import dask.array as da
from daskms import xds_from_ms
datasets = xds_from_ms(ms, group_cols=["FIELD_ID", "DATA_DESC_ID", "ANTENNA1", "ANTENNA2"])
bl_lengths = []
for ds in datasets:
    # Each dataset holds one baseline; UVW already stores the per-row baseline vector
    uvw = ds.UVW.data
    # Euclidean norm per row, then the maximum over all rows of this baseline
    bl_lengths.append(da.sqrt((uvw**2).sum(axis=1)).max())
dask.compute(bl_lengths)
If you want to do more with the baseline length (i.e. process visibility data), then non-contiguous disk access will hurt performance.
2. The harder, but more efficient approach
A second approach requires (1) some knowledge of dask internals and (2) the ability to process your baseline data on a per-chunk basis.
from __future__ import print_function
import argparse
import dask
import dask.array as da
from daskms import xds_from_ms
import numpy as np
def create_parser():
    p = argparse.ArgumentParser()
    p.add_argument("ms")
    return p
def _process(ant1, ant2, uvw):
    uvw = uvw[0]  # Contraction over the uvw3 axis
    # Identify unique baselines in this chunk
    baselines = np.stack((ant1, ant2), axis=1)
    ubl, inv = np.unique(baselines, return_inverse=True, axis=0)
    # Determine their lengths
    bl_length = np.empty(ubl.shape[0], dtype=uvw.dtype)
    for i, (a1, a2) in enumerate(ubl):
        # Euclidean norm of each row's UVW vector, maximised over this baseline's rows
        bl_length[i] = np.sqrt((uvw[i == inv, :]**2).sum(axis=1)).max()
    print(bl_length)
    # Further processing required beyond this point
if __name__ == "__main__":
    args = create_parser().parse_args()
    ds = xds_from_ms(args.ms)
    ds = ds[0]  # Just demonstrate on the first dataset
    # Map _process function on input arrays to produce an output array
    # A good understanding of dask.array.blockwise is advised
    process = da.blockwise(_process, ("row",),
                           ds.ANTENNA1.data, ("row",),
                           ds.ANTENNA2.data, ("row",),
                           ds.UVW.data, ("row", "uvw3"),
                           concatenate=False,
                           meta=np.empty((), object))  # np.object was removed in NumPy 1.24
    dask.compute(process)
Conclusion
I suspect the approach you take will depend on whether you want to crunch the larger visibility data. What are your thoughts?
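The per-chunk step of the second approach can be tried without dask or a Measurement Set. The following is a minimal NumPy-only sketch, assuming small in-memory arrays standing in for one chunk of ANTENNA1, ANTENNA2 and UVW; the function name `max_length_per_baseline` and the sample values are hypothetical and not part of dask-ms.

```python
import numpy as np


def max_length_per_baseline(ant1, ant2, uvw):
    """Return (unique_baselines, max_length) for one chunk of rows.

    Hypothetical helper for illustration only; not part of dask-ms.
    """
    baselines = np.stack((ant1, ant2), axis=1)
    ubl, inv = np.unique(baselines, axis=0, return_inverse=True)
    inv = np.ravel(inv)  # keep the inverse index 1-D across NumPy versions
    # Euclidean length of each row's UVW vector
    lengths = np.sqrt((uvw ** 2).sum(axis=1))
    # Lengths are non-negative, so zero is a safe starting maximum
    max_len = np.zeros(ubl.shape[0], dtype=lengths.dtype)
    # Unbuffered in-place maximum, grouped by baseline index
    np.maximum.at(max_len, inv, lengths)
    return ubl, max_len


# Made-up chunk: four rows, three distinct baselines
ant1 = np.array([0, 0, 1, 0])
ant2 = np.array([1, 2, 2, 1])
uvw = np.array([[3.0, 4.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 0.0, 2.0],
                [6.0, 8.0, 0.0]])
ubl, max_len = max_length_per_baseline(ant1, ant2, uvw)
```

Using `np.maximum.at` avoids the Python loop over baselines; whether that matters in practice depends on the number of baselines per chunk.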
|
I ended up using itertools.combinations. Since I am very new to dask, it might be less efficient than your approach, so I would like to hear what you think.
antennas = xds_from_table(self.ms_name_dask + "ANTENNA", taql_where=taql_query)[0]
antenna_obj = Antenna(dataset=antennas)
Creating the Antenna object runs this:
self.max_diameter = 0.0 * u.m
self.min_diameter = 0.0 * u.m
if dataset is not None:
    self.max_diameter = self.dataset.DISH_DIAMETER.data.max().compute() * u.m
    self.min_diameter = self.dataset.DISH_DIAMETER.data.min().compute() * u.m
Then I run:
# Creating baseline object
baseline_obj = antenna_obj.create_baseline_dataset()
This function runs:
def create_baseline_dataset(self):
    ids = self.dataset.ROWID.data.compute()
    # All unordered antenna pairs (i < j)
    combs = np.array(list(combinations(ids, 2)))
    antenna1 = self.dataset.sel(row=combs[:, 0])
    antenna2 = self.dataset.sel(row=combs[:, 1])
    baseline = antenna1.POSITION - antenna2.POSITION
    baseline_length = xarrfunc.sqrt(
        xarrfunc.square(baseline[:, 0]) + xarrfunc.square(baseline[:, 1]) + xarrfunc.square(baseline[:, 2]))
    baseline_length = baseline_length.data.persist()
    row_id = np.arange(len(combs[:, 0]))
    ant1_id = da.from_array(combs[:, 0])
    ant2_id = da.from_array(combs[:, 1])
    row_id = da.from_array(row_id)
    ds = xarray.Dataset(
        data_vars=dict(
            ANTENNA1=(["row"], ant1_id),
            ANTENNA2=(["row"], ant2_id),
            BASELINE_LENGTH=(["row"], baseline_length)
        ),
        coords=dict(
            ROWID=(["row"], row_id)
        ))
    return Baseline(dataset=ds)
Since the baseline lengths are in an xarray dataset, we can get the maximum using:
self.max_baseline = self.dataset.BASELINE_LENGTH.max().data.compute() * u.m
self.min_baseline = self.dataset.BASELINE_LENGTH.min().data.compute() * u.m
Let me know if this is not efficient. I would like to use the blockwise function, though. Cheers |
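The pairwise computation above can be checked on a toy array before wiring it into xarray. This is a sketch under stated assumptions: antenna positions are held in a plain `(n_ant, 3)` NumPy array, and `baseline_lengths` plus the sample positions are hypothetical names and values introduced only for illustration.

```python
from itertools import combinations

import numpy as np


def baseline_lengths(positions):
    """Lengths of all antenna pairs (i < j) from an (n_ant, 3) position array.

    Hypothetical helper for illustration; positions stand in for ANTENNA::POSITION.
    """
    pairs = np.array(list(combinations(range(len(positions)), 2)))
    diff = positions[pairs[:, 0]] - positions[pairs[:, 1]]
    return pairs, np.linalg.norm(diff, axis=1)


# Three made-up antenna positions in metres
positions = np.array([[0.0, 0.0, 0.0],
                      [3.0, 4.0, 0.0],
                      [0.0, 0.0, 12.0]])
pairs, lengths = baseline_lengths(positions)
```

An antenna table typically has few rows, so doing this step eagerly in NumPy is probably acceptable; the number of pairs grows as n(n-1)/2.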
@sjperkins Ok, I have tested your code and the only downside is that the dask array returned from process is bigger than we should expect. For example, if we return an array with one entry per baseline, like (id, antenna1_id, antenna2_id), passing row as the first dimension gives a much bigger dask array. By the way, what do you mean by crunching the visibility data? I would like two things. The first, which I have seen raised as an issue, is to have antenna1 and antenna2 plus baseline_id as coordinates in the datasets. The second is to loop over my datasets per baseline and work on each one of them. My idea is to write a function that takes a non-gridded dataset and returns a gridded dataset. For that we need to do the gridding for each field, spw and baseline, so that all the ids in the main table fit. |
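One possible way to derive a baseline_id per row, and the row indices belonging to each baseline, is sketched below with NumPy only. It assumes ANTENNA1 and ANTENNA2 are available as in-memory integer arrays; `baseline_ids`, `rows_per_baseline` and the sample arrays are hypothetical names used for illustration, not dask-ms API.

```python
import numpy as np


def baseline_ids(ant1, ant2):
    """Map each row's (ANTENNA1, ANTENNA2) pair to a compact integer id."""
    pairs = np.stack((ant1, ant2), axis=1)
    unique_pairs, ids = np.unique(pairs, axis=0, return_inverse=True)
    return unique_pairs, np.ravel(ids)


# Made-up rows: baselines (0,1), (0,2), (1,2), (0,1), (1,2)
ant1 = np.array([0, 0, 1, 0, 1])
ant2 = np.array([1, 2, 2, 1, 2])
unique_pairs, ids = baseline_ids(ant1, ant2)

# Row indices per baseline, e.g. for looping over baselines one at a time
rows_per_baseline = {
    tuple(int(a) for a in pair): np.flatnonzero(ids == i)
    for i, pair in enumerate(unique_pairs)
}
```

The ids are only unique within the arrays passed in, so comparing ids across datasets would need a shared list of unique pairs.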
A follow-up to this @sjperkins: I've seen the documentation of CASA ngi, and I was wondering how they manage to order their data by baseline if the data is not contiguous by baseline... If you convert the data to zarr, do you avoid any problems ordering the data by baseline? |
I don't want to speak too much for the casangi team, but it looks like they enforce a The MSv2.0 (and Ms3.0) spec specifies a Thus, in my opinion, enforcing a I've also only mentioned the |
@sjperkins right, makes sense. Although I think that for self-calibration which can be considered as calibration+imaging ordering by However, what I want to do is this: let's say I calculate a baseline_id for each row in my dask-ms dataset which has already been grouped by
Cheers |
It may be possible to do this via xarray groupby but I'm wary of this approach since it'll create a dask graph for each group (baseline) with a lot of cross-communication between chunks. I think this'll work but will either require:
Having said that I haven't tried this approach in a long time, so the underlying functionality might have improved.
One can't really get around this issue, regardless of the storage backend: it's a matter of spatial locality. In database terminology, accessing data on the primary key is always more optimal than accessing data via a secondary key because data is usually ordered by primary key on disk. You might want to try reordering your MS as follows:
dask-ms convert ~/data/input.ms -g "FIELD_ID,DATA_DESC_ID,SCAN_NUMBER" -i "ANTENNA1,ANTENNA2,TIME,FEED1,FEED2" -o ~/data/output.ms --format ms --force
If you've created a |
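The locality argument can be illustrated with a small NumPy sketch. It assumes a time-ordered table of six made-up rows and computes the row permutation that an index on ANTENNA1, ANTENNA2, TIME would imply; `baseline_major_order` is a hypothetical name, and this says nothing about how dask-ms convert performs the reordering internally.

```python
import numpy as np


def baseline_major_order(ant1, ant2, time):
    """Row permutation sorting by ANTENNA1, then ANTENNA2, then TIME."""
    # np.lexsort uses the LAST key as the primary sort key
    return np.lexsort((time, ant2, ant1))


# Time-ordered rows: two timestamps, three baselines each
time = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
ant1 = np.array([0, 0, 1, 0, 0, 1])
ant2 = np.array([1, 2, 2, 1, 2, 2])
order = baseline_major_order(ant1, ant2, time)
```

In the time-ordered layout, baseline (0, 1) lives in rows 0 and 3, which are not adjacent; after applying `order` its rows become contiguous.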
Thank you very much @sjperkins. I will try what you have suggested and I will let you know. Last question - Is the convert function part of the dask-ms? That is, can I call it as a function from a python file? Cheers |
Note there were some fixes pushed to master this morning, but I don't think there would have been an issue with MS to MS conversion.
It's a class in |
@sjperkins Hi again! quick question - How can I use convert from a piece of code directly with the |
I'd just instantiate Convert with the relevant command line arguments and a python logger. Something like the following (I haven't run this!):
import logging
# Import path assumed from the installed module daskms/apps/convert.py
from daskms.apps.convert import Convert
log = logging.getLogger(__file__)
args = ["input.ms", "--output", "output.ms", "--group-cols", "FIELD_ID,DATA_DESC_ID", "--index-cols", "TIME,ANTENNA1,ANTENNA2"]
convert = Convert(args, log)
convert.execute() |
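An alternative that avoids depending on the internal class is to shell out to the command line tool. The sketch below only assembles the argument list from the flags that appear in this thread (`-g`, `-i`, `-o`, `--format ms`, `--force`); `build_convert_command` and `run_convert` are hypothetical helper names, and it assumes the `dask-ms` executable is on the PATH.

```python
import subprocess


def build_convert_command(input_ms, output_ms, group_cols, index_cols):
    """Assemble a `dask-ms convert` command using the flags quoted in this thread."""
    return ["dask-ms", "convert", input_ms,
            "-g", ",".join(group_cols),
            "-i", ",".join(index_cols),
            "-o", output_ms,
            "--format", "ms", "--force"]


def run_convert(cmd):
    """Run the conversion; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, check=True)


cmd = build_convert_command(
    "input.ms", "output.ms",
    group_cols=["FIELD_ID", "DATA_DESC_ID", "SCAN_NUMBER"],
    index_cols=["ANTENNA1", "ANTENNA2", "TIME", "FEED1", "FEED2"],
)
# run_convert(cmd)  # not executed here; requires dask-ms and a real MS
```

Passing the arguments as a list rather than a single string avoids shell quoting issues with the comma-separated column names.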
I'm getting this error when running the command @sjperkins:
2022-11-02 10:34:29,585 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21032324 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:29,592 - dask-ms - WARNING - Ignoring 'WEIGHT_SPECTRUM': Unable to infer shape of column 'WEIGHT_SPECTRUM' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21037373 of column WEIGHT_SPECTRUM in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f19'
2022-11-02 10:34:29,597 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21037373 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:29,602 - dask-ms - WARNING - Ignoring 'WEIGHT_SPECTRUM': Unable to infer shape of column 'WEIGHT_SPECTRUM' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21042422 of column WEIGHT_SPECTRUM in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f19'
2022-11-02 10:34:29,607 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21042422 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:29,613 - dask-ms - WARNING - Ignoring 'WEIGHT_SPECTRUM': Unable to infer shape of column 'WEIGHT_SPECTRUM' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21047471 of column WEIGHT_SPECTRUM in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f19'
2022-11-02 10:34:29,617 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21047471 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:29,624 - dask-ms - WARNING - Ignoring 'WEIGHT_SPECTRUM': Unable to infer shape of column 'WEIGHT_SPECTRUM' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21088424 of column WEIGHT_SPECTRUM in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f19'
2022-11-02 10:34:29,629 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21088424 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:29,634 - dask-ms - WARNING - Ignoring 'WEIGHT_SPECTRUM': Unable to infer shape of column 'WEIGHT_SPECTRUM' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21092912 of column WEIGHT_SPECTRUM in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f19'
2022-11-02 10:34:29,640 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21092912 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:29,646 - dask-ms - WARNING - Ignoring 'WEIGHT_SPECTRUM': Unable to infer shape of column 'WEIGHT_SPECTRUM' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21097400 of column WEIGHT_SPECTRUM in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f19'
2022-11-02 10:34:29,651 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21097400 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:29,657 - dask-ms - WARNING - Ignoring 'WEIGHT_SPECTRUM': Unable to infer shape of column 'WEIGHT_SPECTRUM' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21101888 of column WEIGHT_SPECTRUM in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f19'
2022-11-02 10:34:29,661 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21101888 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:29,666 - dask-ms - WARNING - Ignoring 'WEIGHT_SPECTRUM': Unable to infer shape of column 'WEIGHT_SPECTRUM' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21160232 of column WEIGHT_SPECTRUM in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f19'
2022-11-02 10:34:29,672 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21160232 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:29,678 - dask-ms - WARNING - Ignoring 'WEIGHT_SPECTRUM': Unable to infer shape of column 'WEIGHT_SPECTRUM' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21164720 of column WEIGHT_SPECTRUM in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f19'
2022-11-02 10:34:29,683 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21164720 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:29,689 - dask-ms - WARNING - Ignoring 'WEIGHT_SPECTRUM': Unable to infer shape of column 'WEIGHT_SPECTRUM' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21169208 of column WEIGHT_SPECTRUM in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f19'
2022-11-02 10:34:29,694 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21169208 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:29,699 - dask-ms - WARNING - Ignoring 'WEIGHT_SPECTRUM': Unable to infer shape of column 'WEIGHT_SPECTRUM' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21173696 of column WEIGHT_SPECTRUM in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f19'
2022-11-02 10:34:29,704 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21173696 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:29,711 - dask-ms - WARNING - Ignoring 'WEIGHT_SPECTRUM': Unable to infer shape of column 'WEIGHT_SPECTRUM' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21232040 of column WEIGHT_SPECTRUM in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f19'
2022-11-02 10:34:29,716 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21232040 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:29,721 - dask-ms - WARNING - Ignoring 'WEIGHT_SPECTRUM': Unable to infer shape of column 'WEIGHT_SPECTRUM' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21236528 of column WEIGHT_SPECTRUM in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f19'
2022-11-02 10:34:29,726 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21236528 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:29,732 - dask-ms - WARNING - Ignoring 'WEIGHT_SPECTRUM': Unable to infer shape of column 'WEIGHT_SPECTRUM' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21241016 of column WEIGHT_SPECTRUM in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f19'
2022-11-02 10:34:29,736 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21241016 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:29,743 - dask-ms - WARNING - Ignoring 'WEIGHT_SPECTRUM': Unable to infer shape of column 'WEIGHT_SPECTRUM' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21245504 of column WEIGHT_SPECTRUM in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f19'
2022-11-02 10:34:29,748 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21245504 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:29,753 - dask-ms - WARNING - Ignoring 'WEIGHT_SPECTRUM': Unable to infer shape of column 'WEIGHT_SPECTRUM' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21303848 of column WEIGHT_SPECTRUM in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f19'
2022-11-02 10:34:29,758 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21303848 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:29,764 - dask-ms - WARNING - Ignoring 'WEIGHT_SPECTRUM': Unable to infer shape of column 'WEIGHT_SPECTRUM' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21308336 of column WEIGHT_SPECTRUM in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f19'
2022-11-02 10:34:29,769 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21308336 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:29,775 - dask-ms - WARNING - Ignoring 'WEIGHT_SPECTRUM': Unable to infer shape of column 'WEIGHT_SPECTRUM' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21312824 of column WEIGHT_SPECTRUM in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f19'
2022-11-02 10:34:29,780 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21312824 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:29,786 - dask-ms - WARNING - Ignoring 'WEIGHT_SPECTRUM': Unable to infer shape of column 'WEIGHT_SPECTRUM' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21317312 of column WEIGHT_SPECTRUM in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f19'
2022-11-02 10:34:29,792 - dask-ms - WARNING - Ignoring 'FLAG_CATEGORY': Unable to infer shape of column 'FLAG_CATEGORY' due to:
'Table DataManager error: Invalid operation: TSM: no array in row 21317312 of column FLAG_CATEGORY in /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg/table.f18'
2022-11-02 10:34:31,247 - dask-ms - INFO - Input: 'measurementset' file:///home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg
2022-11-02 10:34:31,247 - dask-ms - INFO - Output: 'measurementset' file:///home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg_time
2022-11-02 10:35:21,005 - dask-ms - WARNING - The shape of column 'ASSOC_SPW_ID' is unconstrained (ndim == -1). Assuming shape is (31,) from exemplar
2022-11-02 10:35:21,011 - dask-ms - WARNING - The shape of column 'ASSOC_NATURE' is unconstrained (ndim == -1). Assuming shape is (31,) from exemplar
2022-11-02 10:35:21,015 - dask-ms - WARNING - The shape of column 'ASSOC_SPW_ID' is unconstrained (ndim == -1). Assuming shape is (31,) from exemplar
2022-11-02 10:35:21,021 - dask-ms - WARNING - The shape of column 'ASSOC_NATURE' is unconstrained (ndim == -1). Assuming shape is (31,) from exemplar
2022-11-02 10:35:21,025 - dask-ms - WARNING - The shape of column 'ASSOC_SPW_ID' is unconstrained (ndim == -1). Assuming shape is (31,) from exemplar
2022-11-02 10:35:21,031 - dask-ms - WARNING - The shape of column 'ASSOC_NATURE' is unconstrained (ndim == -1). Assuming shape is (31,) from exemplar
2022-11-02 10:35:21,036 - dask-ms - WARNING - The shape of column 'ASSOC_SPW_ID' is unconstrained (ndim == -1). Assuming shape is (31,) from exemplar
2022-11-02 10:35:21,042 - dask-ms - WARNING - The shape of column 'ASSOC_NATURE' is unconstrained (ndim == -1). Assuming shape is (31,) from exemplar
2022-11-02 10:35:21,521 - dask-ms - WARNING - Ignoring SOURCE
2022-11-02 10:35:21,525 - dask-ms - WARNING - Ignoring 'TARGET': Unable to infer shape of column 'TARGET' due to:
'TableProxy::getCell: no such row'
2022-11-02 10:35:21,526 - dask-ms - WARNING - Ignoring 'ENCODER': Unable to infer shape of column 'ENCODER' due to:
'TableProxy::getCell: no such row'
2022-11-02 10:35:21,527 - dask-ms - WARNING - Ignoring 'POINTING_OFFSET': Unable to infer shape of column 'POINTING_OFFSET' due to:
'TableProxy::getCell: no such row'
2022-11-02 10:35:21,527 - dask-ms - WARNING - Ignoring 'DIRECTION': Unable to infer shape of column 'DIRECTION' due to:
'TableProxy::getCell: no such row'
Traceback (most recent call last):
File "/home/vicente/anaconda3/envs/pyralysis2/bin/dask-ms", line 8, in <module>
sys.exit(main())
File "/home/vicente/anaconda3/envs/pyralysis2/lib/python3.8/site-packages/daskms/apps/entrypoint.py", line 9, in main
return EntryPoint(sys.argv[1:]).execute()
File "/home/vicente/anaconda3/envs/pyralysis2/lib/python3.8/site-packages/daskms/apps/entrypoint.py", line 33, in execute
cmd.execute()
File "/home/vicente/anaconda3/envs/pyralysis2/lib/python3.8/site-packages/daskms/apps/convert.py", line 415, in execute
writes = self.convert_table(self.args)
File "/home/vicente/anaconda3/envs/pyralysis2/lib/python3.8/site-packages/daskms/apps/convert.py", line 500, in convert_table
writes.append(writer(datasets, out_store))
File "/home/vicente/anaconda3/envs/pyralysis2/lib/python3.8/site-packages/daskms/dask_ms.py", line 102, in xds_to_table
out_ds = write_datasets(
File "/home/vicente/anaconda3/envs/pyralysis2/lib/python3.8/site-packages/daskms/writes.py", line 760, in write_datasets
tp = _updated_table(table, datasets, columns, descriptor)
File "/home/vicente/anaconda3/envs/pyralysis2/lib/python3.8/site-packages/daskms/writes.py", line 338, in _updated_table
table_proxy.addcols(_table_desc, dminfo=_dminfo).result()
File "/home/vicente/anaconda3/envs/pyralysis2/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/home/vicente/anaconda3/envs/pyralysis2/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/home/vicente/anaconda3/envs/pyralysis2/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/vicente/anaconda3/envs/pyralysis2/lib/python3.8/site-packages/daskms/table_proxy.py", line 114, in _impl
return getattr(table, method)(*args, **kwargs)
File "/home/vicente/anaconda3/envs/pyralysis2/lib/python3.8/site-packages/casacore/tables/table.py", line 1226, in addcols
self._addcols(tdesc, dminfo, addtoparent)
RuntimeError: Invalid Table operation: Data manager name StandardStMan is already used in table /home/vicente/Documentos/Ayudantia/complete_data/HLTau_B6cont.calavg_time/POINTING |
I'm worried that this is not working and that casangi can re-order their xarray dataset by (time, baseline). I have noticed that this has a very high impact, at least for self-calibration. |
Actually I think this ordering is possibly only good for calibration itself. For imaging one would need to repack by baseline x time instead (like wsclean does when it reorders by w or when ddfacet computes bda ordering). Typically imaging takes a lot longer than the calibration routines so I wonder if it should not be packed like that instead? |
@bennahugo Yes, ordering by time, baseline is only good for calibration. For imaging the best ordering is baseline, time. I agree. This is where self-cal comes in: it needs both orderings, time, baseline when calibrating and baseline, time when imaging. Given that I'm developing software that will do both, my idea would be to re-order the dataset according to what the code is doing (calibration, imaging, or self-cal, which needs both). However, the convert script is not able to do that, as you can see above, so I haven't been able to test anything yet. |
I can't tell exactly what's happening from your stack trace. Which command line arguments are you using? It looks like you're writing to an existing table due to the call to |
As discussed earlier in #159 (comment), we don't impose specific orderings on data because different applications benefit from different orderings. It's the user's responsibility to reorder their dataset into a format that is convenient for their application. This is possible via |
Maybe if I add the link to the ms here you can trace the error? The command line that I'm currently using is:
dask-ms convert HLTau_B6cont.calavg.tav300s -g "FIELD_ID,DATA_DESC_ID,SCAN_NUMBER" -i "ANTENNA1,ANTENNA2,TIME,FEED1,FEED2" -o output.ms --format ms --force
I'm not creating any folder before that. |
Thanks for the linked MS. I can reproduce this error on my side. I'll try block off some time to look at the issue this week. |
Description
Hello everyone, I would like to partition or group my ms dataset based on FIELD_ID, DATA_DESC_ID and BASELINE (which is not a column, but can be calculated using ANTENNA1 and ANTENNA2). Is it possible to do this? Also, I would like to get the length of each baseline. However, for that we would need to query the entire dataset instead of the list of partitions.
Does anyone know how to do this?
This library is awesome, keep up the good work. Best regards!