Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Direct Fourier transform prediction errors #34

Open
bennahugo opened this issue Sep 7, 2020 · 28 comments
Open

Direct Fourier transform prediction errors #34

bennahugo opened this issue Sep 7, 2020 · 28 comments

Comments

@bennahugo
Copy link
Contributor

bennahugo commented Sep 7, 2020

Hi @KimMcAlpine, @ludwigschwardt

I have done some preliminary comparisons with RARG's crystalball which is used in production continuum subtraction and has been verified against the Meqtrees simulator. I used your notebook invoking your DFT implementation to simulate model data and repack it into a measurement set column for storage and comparison. Notebook is available on request, but I will list the code below for reference discussion. The two simulations must match to machine precision, because they both use the SIN projection and UVW coordinate frame.

This simple comparison was needed after I spotted that the residual product formed between my CORRECTED_DATA and your limited model was offset and therefore not subtracting properly:
image
The minimum and maximum -7.2 and +6.9 Jy

To verify that this is not a model error I simplified the comparison to a single component. The two models I used are one to one matches with a single point source at the position of 1934, slightly off the pointing center (I think the 2019 September OPT coordinate frames still slightly off ?!) of an actual UHF observation.

# References:
# Flux model use wsclean format
# Fluxes and positions from B.Hugo model in Modelling of PKS B1934-638 (J193925.0-634245) as UHF and L-Band calibrator for MeerKAT
#
fake, radec, 19:39:25.02671, -63.42.45.6255, 544 1088 wsclean 5.0 [0 0 0 0 0] false 815867187.5

vs

Format = Name, Type, Ra, Dec, I, SpectralIndex, LogarithmicSI, ReferenceFrequency='815867187.5', MajorAxis, MinorAxis, Orientation
fake,POINT,19:39:25.02671,-63.42.45.6255,5.0,[0.0,0.0,0.0,0.0,0.0],False,815867187.5,,,

I then compare a number of samples from predicted visibilities. Something is clearly off int eh SP DFT (channel 1024 of 4096 and baseline m000&m005, flags from observed visibilities applied to both sets):
image
image

So as far as I can tell your DFT implementation is wrong even in the narrow field case and we will have to track down where the bug is within the software. The issue is, as far as I can tell not the models, although I should add that you should collapse the components of the central source into a single component for maximum precision as I requested per email initially, but that is a different issue. I can do that for you and upload it as another PR.

The notebook sections I list below for reference:

import katdal
import matplotlib.pyplot as plt
import numpy as np
import katpoint
from numba import jit, guvectorize
from katsdpcal.calprocs import get_bls_lookup, FluxDensityModel, K_ant, add_model_vis
from katsdpcal.calprocs_dask import wavg_full, bp_fit
from katsdpcal import pipelineprocs
import katsdpcal
import time
import dask.array as da
import glob
import os
from pyrap.tables import table as tbl
from pyrap.tables import taql
from pyrap.quanta import quantity
from datetime import datetime

f = katdal.open('1569150359_sdp_l0.full.rdb')

MSNAME = "1569150359_1934_XXYY_corrected_scan1.ms"
ms = tbl(MSNAME,readonly=False)
msant = tbl("{}::ANTENNA".format(MSNAME))
msspw = tbl("{}::SPECTRAL_WINDOW".format(MSNAME))
mspol = tbl("{}::POLARIZATION".format(MSNAME))

f.select()
print("Available targets:")
for t in f.catalogue.targets:
    print("\t {}".format(t.name))
print("Available tracking scans:")
tracking_scans = list(filter(lambda x: x[1] == "track", list(f.scans())))
for t in f.catalogue.targets:
    print("\t{}: ".format(t.name) + 
          ", ".join([str(s[0]) for s in filter(lambda x: x[2].name == t.name, tracking_scans)]))

fieldsel = "J1939-6342"
onlyscans = [1]#None # None for all

scansel = list(filter(lambda x: x[2].name == fieldsel, tracking_scans))

#MODEL_DIRECTORY = os.path.join(katsdpcal.__path__[0], "conf", "sky_models")
MODEL_DIRECTORY = "."
print("Available models:")
for m in glob.glob(os.path.join(MODEL_DIRECTORY,"*.txt")):
    print("\t{}".format(m))

# get model parameters, model params is a list of katpoint targets, or slightly modified katpoint targets
target = list(filter(lambda x: x.name == fieldsel, f.catalogue.targets))[0]
# The looks for a file which begins with the target name in the directory given in the 2nd arguement 
model_params, model_file = pipelineprocs.get_model(target, MODEL_DIRECTORY, 'u')
print("Using model file '{}'".format(model_file))

ms_scans = list(sorted(np.unique(ms.getcol("SCAN_NUMBER"))))
assert len(ms_scans) == len(scansel), "Number of scans in katdal database and ms must match"
katdal2ms_scan_map = dict(zip([s[0] for s in scansel], ms_scans))

def baseline_index(a1, a2, no_antennae):
    """
    Computes unique index of a baseline given antenna 1 and antenna 2
    (zero indexed) as input. The arrays may or may not contain
    auto-correlations.
    There is a quadratic series expression relating a1 and a2
    to a unique baseline index(can be found by the double difference
    method)
    Let slow_varying_index be S = min(a1, a2). The goal is to find
    the number of fast varying terms. As the slow
    varying terms increase these get fewer and fewer, because
    we only consider unique baselines and not the conjugate
    baselines)
    B = (-S ^ 2 + 2 * S *  # Ant + S) / 2 + diff between the
    slowest and fastest varying antenna
    :param a1: array of ANTENNA_1 ids
    :param a2: array of ANTENNA_2 ids
    :param no_antennae: number of antennae in the array
    :return: array of baseline ids
    """
    if a1.shape != a2.shape:
        raise ValueError("a1 and a2 must have the same shape!")
    slow_index = np.min(np.array([a1, a2]), axis=0)

    return (slow_index * (-slow_index + (2 * no_antennae + 1))) // 2 + \
            np.abs(a1 - a2)

def ants_from_baseline(bl, nant):
    """
    Given unique baseline index and number of antennae lookup the corresponding
    antenna indicies (zero-indexed).
    :param bl: unique baseline index
    :param nant: number of antennae
    :return: pair of antenna indicies corresponding to unique index
    """
    lmat = np.triu((np.cumsum(np.arange(nant)[None, :] >= np.arange(nant)[:, None]) - 1).reshape([nant, nant]))
    v = np.argwhere(lmat == bl)
    return v[0,0], v[0,1]

antennas = f.ants
antenna_names = [a.name for a in antennas]
array_position = katpoint.Antenna('', *antennas[0].ref_position_wgs84)
channel_freqs = f.freqs

antnames_ms = msant.getcol("NAME")
assert len(antenna_names) == len(antnames_ms) and antenna_names == antnames_ms, "katdal and ms antennas must match in ordering"
assert all(np.abs(msspw.getcol("CHAN_FREQ").ravel() - channel_freqs) < 1.0e-5), "katdal and ms channel ranges must match"
assert list(mspol.getcol("CORR_TYPE").ravel()) == [9,10,11,12], "MS must only have products XX, XY, YX, YY"
f.select()
for s in onlyscans if onlyscans is not None else [s[0] for s in scansel]:
    f.select(targets=fieldsel, scans=[s], pol='h')
    mssel = taql("select * from $ms where SCAN_NUMBER == {}".format(katdal2ms_scan_map[s]))

    timestamps = f.timestamps
    kd_delta_t = timestamps[1:] - timestamps[:-1]
    assert all(kd_delta_t > 0) and all(kd_delta_t - kd_delta_t[0] < 1e-4), \
           "Monotonically increasing katdal database needed"
    
    print("Verifying ms vs. katdal indexing for scan {} (ms scan {})... ".format(s, katdal2ms_scan_map[s]), end="")
    a1_ms = mssel.getcol("ANTENNA1")
    a2_ms = mssel.getcol("ANTENNA2")
    timestamps_ms = mssel.getcol("TIME")
    uvw_ms = mssel.getcol("UVW")
    flag_ms = mssel.getcol("FLAG")
    blindx_ms = baseline_index(a1_ms, a2_ms, len(antnames_ms))
    
    for bli in np.unique(blindx_ms):
        (bla1, bla2) = ants_from_baseline(bli, len(antnames_ms))
        blsel = np.logical_or(np.logical_and(a1_ms == bla1, a2_ms == bla2),
                              np.logical_and(a2_ms == bla1, a1_ms == bla2))
        ms_delta_t = timestamps_ms[blsel][1:] - timestamps_ms[blsel][:-1]
        assert all(ms_delta_t > 0) and all(ms_delta_t - ms_delta_t[0] < 1e-4), \
               "Monotonically increasing measurement set database needed"
        assert ms_delta_t[0] - kd_delta_t[0] < 1e-4, "katdal sampling rate must match ms sampling rate"
        mjd_0 = quantity("{}s".format(timestamps_ms[blsel][0])).to_unix_time()
        # starting time of scan must match that of ms, monotonically increasing at same rate
        assert np.abs(mjd_0 - timestamps[0]) < 1e-4, "katdal timestamps must match ms timestamps"
        assert timestamps.size == timestamps_ms[blsel].size, "katdal timestamps must match ms timestamps"
    print("<done>")
    
    print("Predicting scan {} (ms scan {})... ".format(s, katdal2ms_scan_map[s]), end="")

    bls_lookup = get_bls_lookup(antenna_names, f.corr_products)
    
    uvw = target.uvw(antennas, timestamps, array_position)
    uvw = np.array(uvw, np.float32)
    
    # set up model visibility    
    ntimes, nchans, nbls = f.vis.shape
    nants = len(f.ants)
    assert a1_ms.size == ntimes * nbls, "Number of baselines in ms and katdal dataset does not match"
    
    # currently model is the same for both polarisations
    # TODO: include polarisation in models
    k_ant = np.zeros((ntimes, nchans, nants), np.complex64)
    complexmodel = np.zeros((ntimes, nchans, nbls), np.complex64)
    complexmodel_ms = np.zeros((a1_ms.shape[0], nchans, 4), np.complex64)
    assert nbls == bls_lookup.shape[0]
    wl = katpoint.lightspeed / f.channel_freqs
    for ss in model_params:
        model_targets = katpoint.Target(','.join(ss.split(',')[:-1]))
        model_fluxes = FluxDensityModel(''.join(ss.split(',')[-1]))
        S=model_fluxes.flux_density(f.channel_freqs/1e6)

        l, m = target.sphere_to_plane(
             *model_targets.radec(), projection_type='SIN', coord_system='radec')
        k_ant = K_ant(uvw, l, m, wl, k_ant)
        complexmodel = add_model_vis(k_ant, bls_lookup[:, 0], bls_lookup[:, 1],
                                     S.astype(np.float32), complexmodel)
    
    for kda1, kda2 in np.unique(bls_lookup, axis=0):
        kdblsel = np.logical_or(np.logical_and(bls_lookup[:,0] == kda1,
                                               bls_lookup[:,1] == kda2),
                                np.logical_and(bls_lookup[:,0] == kda2,
                                               bls_lookup[:,1] == kda1))
        msblsel = np.logical_or(np.logical_and(a1_ms == kda1,
                                               a2_ms == kda2),
                                np.logical_and(a1_ms == kda2,
                                               a2_ms == kda1))
        # select all times of this baseline into row format 
        # both correlations the same, ie. I +/- Q:=0
        assert np.sum(kdblsel) == 1, "debug: one baseline must be selected"
        complexmodel_ms[msblsel,:,0] = complexmodel[:, :, kdblsel].reshape(ntimes, nchans)[:, :]
        complexmodel_ms[msblsel,:,3] = complexmodel[:, :, kdblsel].reshape(ntimes, nchans)[:, :]
    print("<done>")
    print("Writing model for ms scan {}... ".format(katdal2ms_scan_map[s]), end="")
    mssel.putcol("MODEL_DATA", complexmodel_ms)
    print("<done>")
@KimMcAlpine
Copy link
Contributor

KimMcAlpine commented Sep 7, 2020 via email

@bennahugo
Copy link
Contributor Author

Hi Kim:
com08:/data2/bhugo/projects/projects/calibrator_paper/UHF/1569150359_1934_XXYY_corrected_scan1.ms
com08:/data2/bhugo/projects/projects/calibrator_paper/UHF/Complexmodel.ipynb
CASA corrected visibilities are in DATA

@bennahugo
Copy link
Contributor Author

@marisageyer @sbuchner as discussed until this issue is resolved and the DFT is properly tested I highly recommend that the codepath be disabled in phaseups as it will induce decoherency by artificially shifting the beam around on both L and UHF band beamformer phaseup and cause,by the looks of it, astrometrical errors on calibrated imaging data produced by the SP pipeline.

@ludwigschwardt is there a way to quickly kill this code path until resolution?

@ludwigschwardt
Copy link
Contributor

@KimMcAlpine would know best - maybe go back to the old single component model which doesn't need a DFT?

@KimMcAlpine
Copy link
Contributor

KimMcAlpine commented Sep 8, 2020 via email

@marisageyer
Copy link

marisageyer commented Sep 10, 2020

Hi @ludwigschwardt and @KimMcAlpine - quick check: while the multi-component models were introduced at the end of June (28 June), do we know whether this DFT bug that @bennahugo describes necessarily would have been introduced at the same time? Or could it be that even pre the update multi-component models such a bug (leading to an offset between model and source) exists?

-- answering myself: from above comments its clear the single comp models don't need a DFT.

@KimMcAlpine
Copy link
Contributor

@bennahugo I think I have discovered the reason for the discrepancy in the plots above. It was a bit confusing initially because I thought you were plotting samples for one baseline, but I suspect that maybe you are plotting a subsample of the baselines at some timestamps.

I have compared my SP model with the 'corrected data column' in the indicated MS
com08:/data2/bhugo/projects/projects/calibrator_paper/UHF/1569150359_1934_XXYY_corrected_scan1.ms.

I think the problem is with your model string which you provide as being:
"fake, radec, 19:39:25.02671, -63.42.45.6255, 544 1088 wsclean 5.0 [0 0 0 0 0] false 815867187.5"
katpoint doesn't read the CASA declination format as can be evidenced by trying:

model_string = "fake, radec, 19:39:25.02671, -63.42.45.6255, 544 1088 wsclean 5.0 [0 0 0 0 0] false 815867187.5"
model_targets = katpoint.Target(','.join(model_string.split(',')[:-1]))
print(model_targets)
fake, tags=radec, 19:39:25.03 -63:25:39.6, no flux info

If I use this string to form the complex model and compare the SP predicted model and the corrected_data column in the MS for baseline m000-m005 at channel 1024 for the first scan I get the following plot:

Wrong_dec_model

If I plot a subsample of baselines at timestamp 10 I get:
Wrong_dec_baselines

If I modify the string to be:
"fake, radec, 19:39:25.02671, -63:42:45.6255, 544 1088 wsclean 5.0 [0 0 0 0 0] false 815867187.5"
then the plots look as below:

Correct_dec_model
Correct_dec_baselines

As I'm not sure exactly which samples you plotted before I can't compare directly, but this looks like its the most plausible reason for the discrepancy in the latest test. The comparison between SDP and wsclean looks good to be here to me in the narrow field. Let me know what kind of tests you want to proceed with from here. We obviously still need to deal with the phase offset from your earlier test.

@tony2heads
Copy link

so this is just a question of ":" in stead of "." in the declination?

@KimMcAlpine
Copy link
Contributor

KimMcAlpine commented Sep 10, 2020 via email

@bennahugo
Copy link
Contributor Author

bennahugo commented Sep 14, 2020

Okay thanks @KimMcAlpine. You are correct I forgot to apply the selection mentioned. When I changed the coordinates as mentioned and increased the offset to about a degree from phase centre I notice larger offsets between the two implementations on the long spacings:
image
compared to the short spacings which should be ok
image

Now I think the errors on the long spacings are too big to safely ignore - the error in the absolute anglular difference is nearly 3 arcsecs at the 75th percentile, while the power is almost spot on. I think this is responsible for the subtraction differences. So there is a systemic error in the phase which could stem from the uv coordinates (which it should not - both are ultimately generated by katpoint) or an error in the wavelengths applied. Are we sure that the cosines are computed in double precision?

@KimMcAlpine
Copy link
Contributor

KimMcAlpine commented Sep 14, 2020 via email

@bennahugo
Copy link
Contributor Author

Same MS, with an offset position to the south of the phase centre

  1 # References:
  2 # Flux model use wsclean format
  3 # Fluxes and positions from B.Hugo model in Modelling of PKS B1934-638 (J193925.0-634245) as UHF and L-Band calibrator for MeerKAT
  4 #
  5 fake, radec, 19:39:25.02671, -64:42:45.6255, 544 1088 wsclean 5.0 [0 0 0 0 0] false 815867187.5

@bennahugo
Copy link
Contributor Author

Could you copy the MS? I'm trying to further check the dtypes being used and the coordinates against that generated by astropy

@bennahugo
Copy link
Contributor Author

I've checked that the spherical coordinates are spot on between the way we generate them and the way they are generated in katpoint:
image
So I believe the issue is in the uv coordinates

@bennahugo
Copy link
Contributor Author

@KimMcAlpine I believe I've tracked down the issue to some extent. There is a difference between the uvw coordinates you get out of katpoint and the ones that are in the MS via katdal.
image
When I change the precision to 64 bit (which it really should be if you intend to plug these into the complex exponential) the difference is 0
image

Now the question is what are you doing in the sp pipeline @ludwigschwardt ?

@bennahugo
Copy link
Contributor Author

Yea that was the bastard @KimMcAlpine. Can you please verify that this is done in full precision in the actual pipeline?
image

@ludwigschwardt
Copy link
Contributor

When I change the precision to 64 bit (which it really should be if you intend to plug these into the complex exponential) the difference is 0

Interesting... Because the precision is 64-bit at the start.

In [57]: f = katdal.open('.....rdb')                                              
In [58]: target = f.catalogue.targets[0]                                                                       
In [59]: uvw = target.uvw(f.ants, f.timestamps, f.ants[0].array_reference_antenna())                           
In [60]: uvw[0].dtype                                                                                          
Out[60]: dtype('float64')

I'll dig out the relevant dataset (1569150359) and check katdal vs katpoint UVWs, first leaving MS out of the loop to reduce the number of factors to consider.

@KimMcAlpine
Copy link
Contributor

In the pipeline I'm using 32-bit precision. I probably did this to speed things up a bit, the code in the pipeline is given below:

uvw = self.target.uvw(self.antennas, self.timestamps, self.array_position)
self.uvw = np.array(uvw, np.float32)

@bennahugo
Copy link
Contributor Author

bennahugo commented Sep 14, 2020

Aside from memory allocation and setting, speed should not be an issue here (at least not on CPUs). As mentioned this causes arcsecond level errors and should be changed.

@ludwigschwardt
Copy link
Contributor

ludwigschwardt commented Sep 14, 2020

@bennahugo, in the Python example you posted above, how did you generate the uvw variable used in cell [70]? I had assumed it was similar to target.uvw in cell [71].

@bennahugo
Copy link
Contributor Author

That was cast to np.float32 in the array @ludwigschwardt as @KimMcAlpine has above.

@ludwigschwardt
Copy link
Contributor

Right, that makes more sense now. The difference is 0.0002 metres, or 0.2 mm, though. On an 8000 m baseline this amounts to 5 milliarcseconds, a few orders below the 3 arcsecond position errors you mention.

@bennahugo
Copy link
Contributor Author

@ludwigschwardt this is a degree away at phase centre (at delta_ra=0). According to my calculation the phase differences, at the 75th percentile is around the 2" mark so it is not far off the level of errors that we are seeing if you look at the first plot
image

@ludwigschwardt
Copy link
Contributor

You mention a 2-3 arcsecond phase difference. Is this a difference in the phase angle of the calibrated visibilities of the calibrator as produced by two different methods, or a position offset on the sky?

If the former, it would be informative to express this error as a noise RMS, especially since your plots suggest that the mean phase residual is zero and the phase difference is in fact the maximum deviation of the residual noise. If mostly due to float32 usage we can think of this as quantisation noise. We can then compare this to the expected noise level to see if it is excessive.

If the latter, I'd be more surprised. There is a known issue with katpoint astrometric accuracy at the 2 arcsecond level, but this would manifest itself quite far away from the calibrator (30 degrees+).

@bennahugo
Copy link
Contributor Author

it is the phase angle difference of your predicted model visibilities vs our predicted model visibilities on a source placed at an offset position @ludwigschwardt. The error is solely due to quantization - without casting down the error on short and long spacings is far more comparable than when the decomposed uvw coordinates are quantized before combining into a baseline uvw pair.

I think the problem with exposing this as noise RMS is that it becomes sky dependent: dependent on how bright the source is and how far off axis it is. Surely changing the dtype here will have no effect aside from avoiding running back into this issue as we extend the baselines even further. I would understand the argument from a performance point of view if we were talking about a cuda / opencl based dft, but this is not the case here?

@ludwigschwardt
Copy link
Contributor

I agree that there is no real need for 32-bit precision and we should go back to full precision.

I'm just curious to understand the size of this effect. As I see it this will cause a slight increase in the noise floor, but how slight is the question :-)

@marisageyer
Copy link

@KimMcAlpine are you planning to implement this change in precision? Thereafter @bennahugo can check that the phase offsets between the two sets of tools look good on both long and short baselines.

@KimMcAlpine
Copy link
Contributor

KimMcAlpine commented Sep 17, 2020 via email

KimMcAlpine pushed a commit that referenced this issue Sep 17, 2020
During the model prediction the uvw co-ordinates were being cast to
np.float32 precision which causes phase errors on sources situation some
distance from the phase centre as discussed in
#34.

This PR restores the uvw co-ordinates to their native np.float64
precision.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants