Data fetcher "Coriolis" #47
First thoughts and tests on metadata fetching: from Python, I access the API with a simple POST request:
import json, requests, pandas as pd
StringJson = r'{"criteriaList":[{"field":"platformCode" [...] }'
DataJson = json.loads(StringJson)
url = 'http://blp_test.ifremer.fr/ea-data-selection/api/find-by-search-filtred'
x = requests.post(url, json = DataJson)
response = pd.json_normalize(x.json())
Note that to query only one float, we can also use [...]
And we can of course mix multiple fields in one request.
With that request we can retrieve: [...]
So the profile timestamp is obviously missing for us. One solution (not a good one) is to loop over another request by station_id ([...])
Another note, about pagination. For now, I manage to retrieve data with [...]
Just as an example, here's a JSON request posted to the API:
{
  "criteriaList": [
    {
      "field": "startDate",
      "values": [
        {
          "name": "2020-01-01T00:00:00.000+0100",
          "code": "2020-01-01T00:00:00.000+0100",
          "n": 0
        },
        {
          "name": "2020-08-25T00:00:00.000+0200",
          "code": "2020-08-25T00:00:00.000+0200",
          "n": 0
        }
      ],
      "types": [
        "DATE"
      ]
    },
    {
      "field": "globalGeoShapeField",
      "values": [
        {
          "code": "{\"type\":\"POLYGON\",\"coordinates\":[[[-180,-90.0],[-180,90.0],[180,90.0],[180,-90.0],[-180,-90.0]]]}\"",
          "name": "",
          "n": 0
        }
      ],
      "types": [
        "GEOGRAPHIC"
      ],
      "options": []
    },
    {
      "field": "deploymentYear",
      "values": [],
      "types": [
        "AUTOCOMPLETE",
        "FACET"
      ],
      "options": [
        "SORTED_VALTXT_DESC"
      ],
      "sortPriority": 0,
      "order": "DESC"
    }
  ],
  "pagination": {
    "page": 1,
    "size": 10000,
    "isPaginated": false
  },
  "bboxParams": {
    "latTopLeft": 90.0,
    "lonTopLeft": -180,
    "latBottomRight": -90.0,
    "lonBottomRight": 180,
    "zoom": 5
  },
  "languageEnum": "en"
}
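To avoid hand-editing such a JSON string, the request body could be assembled from small Python helpers. This is only a sketch built from the fields visible in the example above: the helper names (`iso_ms`, `date_criterion`, `box_criterion`, `build_payload`) are hypothetical, and the polygon string is emitted as plain compact JSON, without the trailing escaped quote seen in the example, so whether the API accepts it unchanged is an assumption.

```python
import json
from datetime import datetime, timezone


def iso_ms(dt):
    """Format an aware datetime like the example values (milliseconds + UTC offset)."""
    millis = "%03d" % (dt.microsecond // 1000)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + millis + dt.strftime("%z")


def date_criterion(start, end):
    """'startDate' criterion carrying two bounds, as in the example request."""
    values = [{"name": iso_ms(d), "code": iso_ms(d), "n": 0} for d in (start, end)]
    return {"field": "startDate", "values": values, "types": ["DATE"]}


def box_criterion(lon_min, lat_min, lon_max, lat_max):
    """'globalGeoShapeField' criterion: a closed rectangular ring, serialized as a string."""
    ring = [[lon_min, lat_min], [lon_min, lat_max], [lon_max, lat_max],
            [lon_max, lat_min], [lon_min, lat_min]]
    shape = json.dumps({"type": "POLYGON", "coordinates": [ring]}, separators=(",", ":"))
    return {"field": "globalGeoShapeField",
            "values": [{"code": shape, "name": "", "n": 0}],
            "types": ["GEOGRAPHIC"], "options": []}


def build_payload(criteria, size=10000):
    """Wrap criteria with the pagination/bbox/language blocks copied from the example."""
    return {
        "criteriaList": criteria,
        "pagination": {"page": 1, "size": size, "isPaginated": False},
        "bboxParams": {"latTopLeft": 90.0, "lonTopLeft": -180,
                       "latBottomRight": -90.0, "lonBottomRight": 180, "zoom": 5},
        "languageEnum": "en",
    }


payload = build_payload([
    date_criterion(datetime(2020, 1, 1, tzinfo=timezone.utc),
                   datetime(2020, 8, 25, tzinfo=timezone.utc)),
    box_criterion(-180, -90.0, 180, 90.0),
])
body = json.dumps(payload)  # string that could be sent as the POST body
```

The resulting `payload` dictionary could be passed directly as the `json=` argument of `requests.post`, as in the first snippet of this thread.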
I see here a first difficulty for argopy, since our internal file system based on fsspec does not support POST requests for the [...]. This means that we would need to build something on top of it to support POST requests. When working on #28 I encountered https://github.com/ross/requests-futures; this may be a solution.
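To illustrate what "something on top" might look like, here is a minimal sketch of concurrent POST requests using only the Python standard library. It is neither argopy's `httpstore` nor requests-futures: the function names `post_json` and `post_many` are hypothetical, and the `sender` argument exists only so the transport can be swapped (or faked offline).

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def post_json(url, payload, timeout=60):
    """POST a JSON body with the standard library and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def post_many(url, payloads, max_workers=8, sender=post_json):
    """Send one POST per payload from a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: sender(url, p), payloads))
```

A thread pool is used here because the work is I/O bound; any exception raised by one request propagates when its result is collected, so a real fetcher would probably want per-request error handling.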
1st bench on coriolis-datacharts.ifremer.fr
Using the new argopy parallel fetching possibilities from #28, I tried to bench the fetching of P/T/S data for one float.
First we need to load the list of all URLs to fetch (910 in total, one URL per parameter per station), see this file urls_6902749.txt:
# Read the URL list, stripping the brackets and quotes around each entry:
with open('urls_6902749.txt') as of:
    d = of.readlines()
urls = []
for l in d:
    url = l.replace("[","").replace("]","").split("\n")[0].strip().replace("'","")
    urls.append(url)
print("Eg:", urls[0])
print("N=", len(urls))
Eg: https://coriolis-datacharts.ifremer.fr/api/profiles?platform=6902746&start=1499352540&end=1499352540&parameter=66&measuretype=1
N= 910
Then fetch them:
# Create argopy http store:
import time
from argopy.stores import httpstore
fs = httpstore(cache=False)
# Perform several fetches to bench performances:
t = []
for r in range(0, 5):
    try:
        print("run %i" % r)
        start_time = time.time()
        d = fs.open_mfjson(urls, max_workers=100, progress=1)
        t.append(time.time() - start_time)
    except Exception:
        # Ignore failed runs so that they do not count in the timings
        pass
this leads to fetching times of:
i.e. about 6 seconds for the core data of 1 float. Let's compare to existing data sources:
import time
from argopy import DataFetcher as ArgoDataFetcher
# Time one full fetch of the same float from each existing data source:
reg = {}
for src in ['erddap', 'argovis']:
    start_time = time.time()
    ArgoDataFetcher(src=src, cache=False).float(6902749).to_xarray()
    reg[src] = time.time() - start_time
print(reg)
{'erddap': 1.236616849899292, 'argovis': 2.6754958629608154}
So the Coriolis API is much slower than the existing APIs, and that is without even accounting for post-processing on the client side.
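For a fairer comparison, both timings could go through the same small helper, repeated a few times per source. The sketch below is an assumption about how such a bench could be organised, not what was actually run above; `bench` and `summarize` are hypothetical names.

```python
import statistics
import time


def bench(func, repeats=5):
    """Call func() `repeats` times; return the wall-clock duration of each successful run."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        try:
            func()
        except Exception as err:
            # A failed run is reported and left out of the timings
            print("run failed:", err)
            continue
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations (seconds) to a few robust statistics."""
    if not durations:
        return {"n": 0}
    return {
        "n": len(durations),
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }
```

For instance, `summarize(bench(lambda: fs.open_mfjson(urls, max_workers=100)))` would time the Coriolis fetch, and the same call wrapping `ArgoDataFetcher(src=src, cache=False).float(6902749).to_xarray()` would time the other sources.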
From Euro-Argo:
The API we're talking about here (coriolis-datacharts.ifremer.fr) is the one powering the new web interface for data visualization after selection with dataselection.euro-argo.eu. We can provide feedback here.
The documentation of this new data selection API can be found here: https://dataselection.euro-argo.eu/swagger-ui.html
Example of usage of the new data selection API:
Retrieve the list of profile coordinates: https://dataselection.euro-argo.eu/api/find-by-search-filtred
curl -X POST \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/json' \
  -d '{"criteriaList":[{"field":"startDate","values":[{"name":"2020-10-30T16:58:42.487+0100","code":"2020-10-30T16:58:42.487+0100","n":0}],"types":["DATE"]},{"field":"globalGeoShapeField","values":[{"code":"{\"type\":\"POLYGON\",\"coordinates\":[[[-66.26953125000001,31.353636941500987],[-66.26953125000001,37.30027528134433],[-60.46875000000001,37.30027528134433],[-60.46875000000001,31.353636941500987],[-66.26953125000001,31.353636941500987]]]}\"","name":"","n":0}],"types":["GEOGRAPHIC"],"options":[]},{"field":"cycleQcState","values":[{"name":"Good","code":"Good","n":0}],"types":["FACET"],"options":["SORTED_VALTXT_ASC"]}],"pagination":{"page":1,"size":9000,"isPaginated":false},"bboxParams":{"latBottomRight":-90,"latTopLeft":90,"lonBottomRight":180,"lonTopLeft":-180,"zoom":2},"languageEnum":"en"}' \
  'https://dataselection.euro-argo.eu/api/find-by-search-filtred'
Output a list of points like:
{
  "id": 4890265,
  "cvNumber": 94,
  "coordinate": {
    "lat": 33.825,
    "lon": -64.758,
    "geohash": "dw1bqmssvjp2",
    "fragment": true
  },
  "platformCode": "3901654",
  "cycleQcState": "Good",
  "level": 0
},
In the response above, the WMO is in "platformCode" and the cycle number is in "cvNumber". Using the profile "id", we can then visualise profile data at: https://dataselection.euro-argo.eu/cycle/4890265
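The mapping described above (WMO in "platformCode", cycle number in "cvNumber", visualisation URL from "id") could be applied to a decoded response as follows. This is a minimal sketch assuming the response is a list of points shaped like the sample; `summarize_points` and the output key names are hypothetical.

```python
CYCLE_URL = "https://dataselection.euro-argo.eu/cycle/{id}"


def summarize_points(points):
    """Turn API points into flat records: WMO, cycle number, position, QC and viewer URL."""
    rows = []
    for p in points:
        rows.append({
            "wmo": p["platformCode"],        # float WMO number, kept as a string
            "cycle_number": p["cvNumber"],   # cycle number of the profile
            "lat": p["coordinate"]["lat"],
            "lon": p["coordinate"]["lon"],
            "qc": p["cycleQcState"],
            "url": CYCLE_URL.format(id=p["id"]),
        })
    return rows


# Sample point copied from the response shown above
sample = [{
    "id": 4890265,
    "cvNumber": 94,
    "coordinate": {"lat": 33.825, "lon": -64.758, "geohash": "dw1bqmssvjp2", "fragment": True},
    "platformCode": "3901654",
    "cycleQcState": "Good",
    "level": 0,
}]
rows = summarize_points(sample)
```

The list of flat records can then be handed to `pandas.DataFrame(rows)` if a table is more convenient.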
The data selection tool (https://dataselection.euro-argo.eu/) could be used by argopy to create a dashboard for a given fetcher.
Note that [...] is the local dev version of this one: [...]
Retrieve all data from a single float
Set the time stamps to very old and very far in the future values: [...]
This will retrieve Temperature (code=35) data from 1900/01/01 to 2100/12/31 of float WMO 3901654.
PS: to get parameter code information: https://co-discovery-demo.ifremer.fr/coriolis/api/params/parametre?code=35
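As an illustration, the two bounding time stamps can be computed as Unix epoch seconds. The sketch below assumes that this request uses the same endpoint and query parameters (`platform`, `start`, `end`, `parameter`, `measuretype`) as the benchmark URL quoted earlier in this thread, and that `start`/`end` are seconds since 1970-01-01 UTC; both points, as well as `measuretype=1`, are assumptions, and the helper names are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_seconds(year, month, day):
    """Seconds between 1970-01-01 UTC and the given date at 00:00 UTC (may be negative)."""
    return int((datetime(year, month, day, tzinfo=timezone.utc) - EPOCH).total_seconds())


def profiles_url(wmo, parameter, start, end, measuretype=1):
    """Build a profiles request URL following the pattern of the benchmark example (assumed)."""
    query = urlencode({
        "platform": wmo,
        "start": start,
        "end": end,
        "parameter": parameter,
        "measuretype": measuretype,
    })
    return "https://coriolis-datacharts.ifremer.fr/api/profiles?" + query


# Temperature (code=35) for float 3901654, from 1900/01/01 to 2100/12/31
url = profiles_url(3901654, 35, to_epoch_seconds(1900, 1, 1), to_epoch_seconds(2100, 12, 31))
```

Computing the offset by datetime subtraction, rather than with `datetime.timestamp()`, keeps the pre-1970 date working on platforms where negative timestamps are not supported.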
The data selection API is being developed here: [...]
This is an Ifremer private repo, just mentioned here for the record.
The above API is no longer valid; it is now located at: https://api-coriolis.ifremer.fr/legacy
So requests now look like this: [...]
or, knowing the profile ID: [...]
and it's documented here: [...]
This issue was marked as stale automatically because it has not seen any activity in 90 days.
This issue was closed automatically because it has not seen any activity in 365 days.
This fetcher will be based on the Cassandra/Elastic Search API(s) developed for the Coriolis database at the Ifremer IT department.
Those API(s) are mostly meant for web portals, but we plan to integrate a new fetcher based on them.
To start, I created this issue to list our feedback and ideas on various aspects: