Flat table format for VPTS #25
@adokter recently we noticed in Amsterdam that vol2bird changes the name of the reflectivity column depending on the input you select, e.g. |
Hi @bart1 good catch, hadn't realized that. This only applies to the original I agree the easiest fix would be to change the name of that column to something constant, that way we can get rid of the final capitalized column name as well. We can make it |
Added issue adokter/vol2bird#188 |
Thanks @adokter! I will then also try to use dbz_all for the naming in other code |
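Until the column name is constant at the source, the renaming could be sketched on the reading side in base R. The input names in `candidates` are illustrative assumptions (the thread does not list them here), and `standardize_dbz_column()` is a hypothetical helper, not a bioRad or vol2bird function:

```r
# Hypothetical helper: rename whichever reflectivity column is present
# to the constant name `dbz_all`. The candidate names are assumptions.
standardize_dbz_column <- function(df,
                                   candidates = c("DBZH", "DBZV", "TH", "TV"),
                                   target = "dbz_all") {
  hit <- intersect(candidates, names(df))
  if (length(hit) > 1) {
    stop("More than one candidate reflectivity column found: ",
         paste(hit, collapse = ", "))
  }
  if (length(hit) == 1) {
    names(df)[names(df) == hit] <- target
  }
  df
}

# Made-up profile data with one of the assumed input names
profile <- data.frame(height = c(0, 200), DBZH = c(-3.2, 1.5))
names(standardize_dbz_column(profile))
```

A data frame that already uses `dbz_all`, or has none of the candidate columns, passes through unchanged.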
Hi all, I'm communicating this with Johannes De Groeve to ensure we are using identical naming in the VP DB. For now, we had chosen to rename the column to |
@BerendWijers the idea to call it |
@peterdesmet Thank you! |
Any more feedback on this format or can we start describing this format and implementing it in the ENRAM data repository? |
@peterdesmet I think most data is contained. Personally I try to avoid too much duplication (last few columns) but I see it is also the elegance of using csv's. There are maybe three things to think about that I see directly
I'm not sure if it is worth changing this, but that is what I could see directly. |
I think we should rename |
Some duplicate info is ok from my perspective, especially with direction, because it helps to enforce one convention for defining the angle. Also, for height-integrated vpi quantities it's no longer the case that ff and dd can be derived directly from u and v, so there we have to keep both. So for consistency we might keep them in vpts as well |
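For a single altitude layer, the relation between the two pairs of columns can be sketched as follows. The conventions are assumptions stated in the comments (u eastward, v northward, dd in degrees clockwise from north); the values are made up:

```r
# Assumed conventions: u = eastward, v = northward speed component (m/s);
# dd = direction of movement in degrees clockwise from north.
u <- c(0, 5, 0, -5)
v <- c(5, 0, -5, 0)

ff <- sqrt(u^2 + v^2)                  # ground speed
dd <- (atan2(u, v) * 180 / pi) %% 360  # direction, wrapped to [0, 360)
data.frame(u, v, ff, dd)

# After averaging over layers, the speed of the averaged components
# need not equal the average of the speeds.
mean_of_speeds <- mean(ff)
speed_of_means <- sqrt(mean(u)^2 + mean(v)^2)
```

The last two lines use a plain unweighted mean only to illustrate why ff and dd may stop being derivable from u and v once quantities are averaged; how the height integration is actually weighted is not covered here.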
height is currently defined by the bottom of the altitude bin, and that's not very intuitive. I could change vol2bird to output the center of the altitude bin instead, if we have a mechanism to track the versions of old/new data - what do you think @peterdesmet? It's a bit of a hassle to change in bioRad, as there we add half of the height bin size everywhere in height calculations |
I have renamed @adokter regarding middle or bottom of altitude bins:
Personally, I found the bottom (start) of the altitude bin rather intuitive, so I would not change it if we can avoid it. @bart1 your suggestion to indicate the bottom and top of the height bin would indeed make it explicit that we're talking about a bottom and top and allow people to get the height of a bin from a single row. Same for timestamp. I'm just curious if seeing the increasing height and timestamp over multiple rows was ever a source of confusion? Does it warrant adding one or both columns? |
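A minimal sketch of what the bottom/top/center options would look like in a table, assuming `height` is the bottom of each bin and all bins share one width. The 200 m width and the names `height_top` and `height_center` are illustrative assumptions, not agreed field names:

```r
# Assumption: `height` is the bottom of each altitude bin and all bins
# share one width (200 m is an illustrative value).
bin_width <- 200
bins <- data.frame(height = seq(0, 800, by = bin_width))

bins$height_top <- bins$height + bin_width         # hypothetical column name
bins$height_center <- bins$height + bin_width / 2  # hypothetical column name
bins
```

With only the bottom height stored, the bin width has to be inferred from neighbouring rows; an explicit top column makes each row self-describing.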
I have not really been confused, although I do not know whether height integration accounts for half bins down to the surface of the earth (I don't think so). I brought it up more because I feel there is room for improvement that does not necessarily need to happen but would make things work better in the future. I think more confusion can occur about the timestamps.
require(bioRad)
#> Loading required package: bioRad
#> Welcome to bioRad version 0.5.2.9499
#> Docker daemon running, Docker functionality enabled (vol2bird version 0.5.0)
ts <- example_vpts[300:302]
# plot density of individuals for three consecutive time steps, in the altitude
# layer 0-3000 m.
plot(ts, ylim = c(0, 3000))
#> Warning in plot.vpts(ts, ylim = c(0, 3000)): Irregular time-series: missing
#> profiles will not be visible. Use 'regularize_vpts' to make time series regular.
ts$datetime
#> [1] "2016-09-02 02:40:00 UTC" "2016-09-02 02:50:00 UTC"
#> [3] "2016-09-02 02:59:00 UTC"
This 5 minute shift is generally not a big deal, but it would be good to get it right. Note that if not all pvols are analyzed, the original duration of a scan/pvol also can't be reconstructed from vp data. Retaining only every second pvol is not uncommon, for example when different scanning patterns occur or when people want to reduce data volumes (e.g. at UvA we have 2 years of German data with only every third pvol retained, so the duration of a pvol is about 5 minutes but we only have a measurement every 15 minutes). |
PS for this example data, I would not be able to work out backward (from the vpts) whether the full pvol is scanned every 5 minutes, so that a 2.5 minute shift would place the timestamp in the center of the pvol, or whether a full pvol takes 10 minutes and a 5 minute shift would be correct. |
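The ambiguity can be sketched in base R using the three timestamps printed above. The two pvol durations are the assumptions named in the comment, not values known from the data:

```r
datetime <- as.POSIXct(
  c("2016-09-02 02:40:00", "2016-09-02 02:50:00", "2016-09-02 02:59:00"),
  tz = "UTC"
)

# Interval between consecutive profiles, in minutes. This is all the
# time series itself can tell us.
interval_min <- as.numeric(diff(datetime), units = "mins")
interval_min

# Candidate scan centers under two assumed pvol durations
center_if_5min  <- datetime + 2.5 * 60  # pvol takes 5 min: shift 2.5 min
center_if_10min <- datetime + 5 * 60    # pvol takes 10 min: shift 5 min
```

Both sets of centers are consistent with the same profile timestamps, so the scan duration has to come from somewhere other than the vpts.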
@bart1 Thanks! So, would your use case be solved if start and end timestamp were included in the tabular data? |
I think so; then at least the information for finding the median timestamp is available. @adokter what do you think: if you are treating the vp's as point measurements in time, is the median timestamp the best? |
You would calculate the median timestamp, but we are suggesting adding both columns (start and end), right, rather than providing a single column with the median timestamp? |
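If both columns were added, deriving the midpoint would be a one-liner for the user. A minimal sketch, where the column names `start_datetime` and `end_datetime` and the values are hypothetical:

```r
# Hypothetical columns and made-up values, for illustration only
scan <- data.frame(
  start_datetime = as.POSIXct(
    c("2016-09-02 02:40:00", "2016-09-02 02:50:00"), tz = "UTC"
  ),
  end_datetime = as.POSIXct(
    c("2016-09-02 02:45:00", "2016-09-02 02:55:00"), tz = "UTC"
  )
)

# Midpoint = start + half of the scan duration (in seconds)
duration_s <- as.numeric(
  difftime(scan$end_datetime, scan$start_datetime, units = "secs")
)
scan$mid_datetime <- scan$start_datetime + duration_s / 2
scan
```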
Met offices typically assign a nominal time to radar polar volumes, and I think we should stick to that convention for simplicity, even though different countries might have different conventions. That nominal time is typically also in the filename, and recalculating a time ourselves gets rather complicated, I feel. |
The same, it's defined by vol2bird, not by the data
see above, I think it might vary from met office to met office
ok - I have no problem with that |
@adokter, ok, so if I understand:
@adokter @bart1 Is there consensus on adding the upper bin height? I have no preference. If we add one, how do we name the fields? |
@peterdesmet vol2bird isn't aware of the sampling interval, it can only be determined after the fact when you have a time series of profiles. Here is how the ODIM format defines nominal time in https://www.eumetnet.eu/wp-content/uploads/2019/01/ODIM_H5_v23.pdf:
We could stick to that with a ref to ODIM, although admittedly it isn't very clear, I suspect because of subtle differences in how countries define it. So @bart1 is right that we could improve: we might be able to extract the start and end acquisition time from sweep-specific metadata, but the problem is that these metadata aren't mandatory like the nominal time.
In bioRad function
To keep it simple, I would stick to nominal time and to 'irregular time series of vertical profiles'; the regularization onto a time grid, and the interpretation of what nominal time is, can then be left to the user. I would vote for |
Question: should any VPTS data contain all the columns presented above, or is it ok if some columns are omitted? |
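Whichever way this is decided, a reader of the data can check which columns are present. A minimal sketch in base R, where `expected_cols` is a hypothetical list built only from column names mentioned in this thread (not the full format) and `missing_columns()` is a hypothetical helper:

```r
# Hypothetical list of expected columns (names mentioned in this thread)
expected_cols <- c("datetime", "height", "u", "v", "ff", "dd", "dbz_all")

# Hypothetical helper: which expected columns does a table lack?
missing_columns <- function(df, expected = expected_cols) {
  setdiff(expected, names(df))
}

# Made-up table that omits some columns
vpts_df <- data.frame(
  datetime = as.POSIXct("2016-09-02 02:40:00", tz = "UTC"),
  height = 0,
  u = 1.2,
  v = -0.4
)
missing_columns(vpts_df)
```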
The format is now described at https://enram.github.io/vpts/format/ Please review and create issues for items that are unclear or need change. Input is also welcome on the issues labelled help wanted. |
@adokter and I discussed the VPTS format today and we suggest a flat table format that contains all necessary data for analysis. I'll describe the format in more detail later, but here is how you could reproduce it. The written file is this one: example_vpts.csv
Created on 2022-03-09 by the reprex package (v2.0.1)