multi-station spectrogram #209

Open
kkappler opened this issue Jun 8, 2024 · 2 comments
Labels
enhancement New feature or request

kkappler commented Jun 8, 2024

Multi-station time series of Fourier coefficients (FCs) are central to several processing methodologies, such as those described in Egbert 1997 or Kappler et al. 2010. MTH5 provides the data container, but does not yet provide a mechanism for extracting these time series into an xarray for covariance calculations when more than one station is involved. Robust estimation of the spectral covariance matrix would be easiest if we could request chunks of contiguous data across multiple stations.

I propose we create a method that returns time series of spectrograms from mixed stations as an xarray.

There are several complications associated with creating multi-station time series of FCs. Here I try to summarize them, describe a workaround for most of them, and suggest an approach that can be reused in a reasonably general way.

The primary challenges when merging several station time series are:

 1. The data may not be synchronized.
 2. The runs may not overlap, or may overlap in a complex way.

Note that for "True Multi-Station" applications (3 or more stations) both of these problems compound.

The first problem can probably be pushed back to the time series level, so that the data are resampled to a common time base. It is also possible that we could interpolate the FCs, but I have not tried.
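As a rough illustration of resampling to a common time base at the time series level, here is a minimal sketch. It assumes time is expressed in seconds as floats and uses plain linear interpolation (`numpy.interp`) as a stand-in for whatever resampling scheme is eventually chosen; the function name `resample_to_common_base` is hypothetical and is not part of MTH5.

```python
import numpy as np


def resample_to_common_base(t_src, values, t_common):
    """Linearly interpolate samples taken at t_src onto t_common.

    Hypothetical helper: linear interpolation is only a placeholder for a
    proper resampling filter.
    """
    return np.interp(t_common, t_src, values)


# Station 1 samples on whole seconds; station 2's clock is offset by 0.25 s.
t1 = np.arange(0.0, 10.0, 1.0)
t2 = t1 + 0.25

# Common time base chosen inside the span covered by both stations.
t_common = np.arange(1.0, 9.0, 1.0)

# Synthetic station 2 data (a straight line, so linear interpolation is exact).
x2 = 2.0 * t2 + 1.0
x2_on_common = resample_to_common_base(t2, x2, t_common)
```

After this step both stations share `t_common`, so their FCs would be computed on aligned windows.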

The second problem can be handled by requesting only overlapping times. For two stations, the problem can always be broken into simple cases of overlap. Confusion can set in when a third station is involved. Setting resampling aside for a moment, we may be confronted with a data coverage situation such as:

|--------------------> Time Increasing ----------------> 

|----------Station 1 ------------|
        |----------Station 2 ------------|
|--------------------Station 3 ----------------------|

In this case, the overlap between all stations is something like:
        |-------ALL OVERLAP----|

We can return the "ALL OVERLAP" array, but we are leaving data on the table. FCs from stations 1 and 3 are available at the beginning, which could potentially improve estimates of <STN1,STN3> cross-powers in the Spectral Density Matrix (SDM). Also, FCs are available from stations 2 and 3 after station 1 has stopped recording. Perhaps there is a clean way to leverage this additional information, and this should be revisited later, but for now I propose we use an "ALL OVERLAP" approach.
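The "ALL OVERLAP" window is just the latest start and the earliest end across stations. A minimal sketch, assuming each station's coverage is a single (start, end) pair; the helper name `all_overlap` and the example times are made up to match the diagram above:

```python
from datetime import datetime


def all_overlap(intervals):
    """Return the (start, end) shared by every interval, or None if empty."""
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return (start, end) if start < end else None


# Hypothetical coverage mirroring the three-station diagram.
coverage = {
    "station_1": (datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 8)),
    "station_2": (datetime(2024, 1, 1, 2), datetime(2024, 1, 1, 10)),
    "station_3": (datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 13)),
}

# Window runs from station 2's start to station 1's end.
window = all_overlap(list(coverage.values()))
```

The same helper applied to a pair of stations gives the pairwise overlap, which is where the data currently left on the table would come from.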

A final complication would occur if we allowed one or more stations in the mix to specify multiple runs. This can be ignored here if we handle additional runs in a separate call to this function and then concatenate time series afterwards.

Thus, the input must specify some number N of (station-run-start-end-channel_list) tuples. If channel_list is not provided, get all channels. If start and end are not provided, read the whole run. Once the start and end times are sorted out, warn if the runs are not all synchronous, and truncate them all to max(starts), min(ends).
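One possible shape for this input, as a sketch only: the `ChunkRequest` dataclass, the `common_window` function, and the `run_bounds` lookup are all hypothetical names used for illustration, not existing MTH5 or mt_metadata objects. Missing start or end times fall back to the bounds of the run, and a warning is issued before truncating.

```python
import warnings
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass
class ChunkRequest:
    """Hypothetical descriptor of one (station, run, start, end, channels) tuple."""
    station_id: str
    run_id: str
    start: Optional[datetime] = None
    end: Optional[datetime] = None
    channels: Optional[Sequence[str]] = None  # None means "all channels"


def common_window(requests, run_bounds):
    """Resolve start/end per request, then truncate to max(starts), min(ends).

    run_bounds maps (station_id, run_id) -> (run_start, run_end); it stands in
    for whatever the data container reports about each run.
    """
    starts, ends = [], []
    for req in requests:
        run_start, run_end = run_bounds[(req.station_id, req.run_id)]
        starts.append(req.start if req.start is not None else run_start)
        ends.append(req.end if req.end is not None else run_end)
    if len(set(starts)) > 1 or len(set(ends)) > 1:
        warnings.warn("Runs are not synchronous; truncating to the common window.")
    start, end = max(starts), min(ends)
    if start >= end:
        raise ValueError("Requested runs have no common time window.")
    return start, end


# Made-up run bounds and requests for two stations.
run_bounds = {
    ("stn1", "001"): (datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 8)),
    ("stn2", "001"): (datetime(2024, 1, 1, 2), datetime(2024, 1, 1, 10)),
}
requests = [ChunkRequest("stn1", "001"), ChunkRequest("stn2", "001", channels=["hx", "hy"])]
start, end = common_window(requests, run_bounds)
```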

As each station's spectrogram is retrieved, a method can be called to rename the data variables in the xarray from the existing channel name to the concatenation f"{station_id}_{channel_id}", so that there is no namespace clash in the xarray.
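The renaming step could look roughly like the following. This is a sketch on synthetic data, assuming each station's spectrogram is an `xarray.Dataset` with one data variable per channel on shared (time, frequency) coordinates; `label_with_station` and `make_spectrogram` are hypothetical helpers, not existing MTH5 methods.

```python
import numpy as np
import xarray as xr


def label_with_station(ds, station_id):
    """Prefix every data variable with the station id, e.g. 'ex' -> 'stn1_ex'."""
    return ds.rename_vars({name: f"{station_id}_{name}" for name in ds.data_vars})


def make_spectrogram(channels, time, frequency, seed):
    """Build a synthetic complex-valued spectrogram (stand-in for real FCs)."""
    rng = np.random.default_rng(seed)
    shape = (len(time), len(frequency))
    data = {
        ch: (("time", "frequency"),
             rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
        for ch in channels
    }
    return xr.Dataset(data, coords={"time": time, "frequency": frequency})


time = np.arange(4)
frequency = np.array([0.1, 0.2, 0.4])
stn1 = make_spectrogram(["ex", "hx"], time, frequency, seed=0)
stn2 = make_spectrogram(["ex", "hx"], time, frequency, seed=1)

# Without renaming, 'ex' and 'hx' would clash between the two stations.
merged = xr.merge(
    [label_with_station(stn1, "stn1"), label_with_station(stn2, "stn2")],
    join="exact",  # time axes are assumed to be already truncated and aligned
)
```

Using `join="exact"` makes the merge fail loudly if the stations were not truncated to the same window first.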

kkappler added a commit that referenced this issue Jun 8, 2024
@kkappler kkappler mentioned this issue Jun 8, 2024
2 tasks
@kkappler kkappler self-assigned this Jun 8, 2024
@kkappler kkappler added the enhancement New feature or request label Jun 8, 2024
kkappler added a commit that referenced this issue Jul 5, 2024
- these tests actually relate to #209
- added some doc as well
@kkappler
Collaborator Author

Prototype for this is working. Could benefit from #212

kkappler added a commit that referenced this issue Jul 24, 2024
- a sort of continuation of issue #209, but a new fork from patches as that branch is already merged
- add MultivariateLabelScheme() class to manage how we label multivariate channels
- add MultivariateDataset() class to wrap the MV xarray
- add some tests
- tests could be better organized if address issue #227
kkappler added a commit that referenced this issue Aug 14, 2024
Add some more multivariate functionality


There may be some more methods needed for issue #209, but these tools seem to do most of what is needed, and tests are passing so merging into `patches`.
@kkappler
Collaborator Author

This functionality is coming along, but more work is needed.

  • The FCRunChunk object should be moved to mt_metadata, as it is a descriptor of a chunk of FC time series.

@kkappler kkappler reopened this Aug 16, 2024