Automatic SDK code generation #40
Thanks for bringing this up. In the past weeks I have spent a lot of time implementing this.
Using JSON for time-series data is rather inefficient. I implemented an optimised JSON serialiser on the server side to quickly encode data to JSON, but most client-side JSON implementations do not work particularly well with large floating-point arrays. Even for a moderate amount of weather data, parsing can take 20-100 milliseconds or more. Binary serialisation formats like FlatBuffers or Protobuf can solve this issue. With FlatBuffers especially, floating-point arrays can be transferred directly. Because FlatBuffers uses fixed types, this also makes working with strict typing easier on the client.
Right now, you can find the FlatBuffers definitions here: https://github.com/open-meteo/swift-sdk/tree/main/FlatBuffers. I am still actively developing them, so they are likely to change.
The basic idea is to provide client libraries that offer a simple interface to decode data. For example, in Python it may look like this:
om = HttpxClient()
params = {
"latitude": [52.54, 48.1, 48.4],
"longitude": [13.41, 9.31, 8.5],
"hourly": ["temperature_2m", "precipitation"],
"start_date": "2023-08-01",
"end_date": "2023-08-02",
# 'timezone': 'auto',
# 'current': ['temperature_2m','precipitation'],
"format": "flatbuffers",
}
results = om.weather_api("https://archive-api.open-meteo.com/v1/archive", params=params)
assert len(results) == 3
res = results[0]
assert res.Latitude() == pytest.approx(52.5)
assert res.Longitude() == pytest.approx(13.4)
res = results[1]
assert res.Latitude() == pytest.approx(48.1)
assert res.Longitude() == pytest.approx(9.3)
print("Coordinates ", res.Latitude(), res.Longitude(), res.Elevation())
print(res.Timezone(), res.TimezoneAbbreviation())
print("Generation time", res.GenerationtimeMs())
print(res.Hourly().Temperature2m().ValuesAsNumpy())
All attributes are accessed like Temperature2m() above. The principles for each client in individual programming languages are:
My rough plan for the FlatBuffers format is
Wdyt? Would you like to have a look at the FlatBuffers format and try to compile it for Kotlin? |
In addition to the long historical arrays, now that more and more parameters are being added to the APIs, the data transfer can get quite heavy.
I've looked a bit at the FlatBuffers docs and it seems quite interesting. I'm already using ProtoBuf in the Kotlin SDK, and it works flawlessly (apart from some issues I had with the element order initially).
I think having IDE completion is a must for the SDKs. So far I had implemented a bash script that could create the Hourly/Daily/Models "options" (that's how I called them in the SDK code at least, more on this at the end).
In the Kotlin SDK I have an
But in the
As you can see, the user always has a way to set the endpoint context for every call, no matter what. That said, maybe some sort of standardization across the SDKs could be helpful.
IDE completion could suffer when using just strings... So far I just made a simple object with key-value pairs:
The
It's like "forcing" to use arrays for the response data, even if one city is returned, right? Okay 👍
What do you mean by this? Like, giving the dev the option to pick which client they want to use? Right now I'm using the built-in HTTP client and it just works (+ it doesn't increase the bundle size, since any library would use the built-in client anyway).
Should this be done by default? Like 5 retries? Still, it should be configurable, in case someone doesn't want to hog the server...
I think this should be part of the actual client, rather than the SDK package. Let's say, hypothetically, someone is using the Kotlin SDK in a weather app. It's the app (client) that should cache the data for offline usage, not the SDK.
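For what it's worth, in Python an application could layer caching on top of plain HTTP calls without any SDK involvement, for example with the requests-cache package. A minimal sketch of the idea (the endpoint, parameters, cache name and TTL are just examples):
# App-level caching around plain HTTP calls, independent of the SDK.
# Requires `pip install requests-cache`.
import requests_cache

session = requests_cache.CachedSession("weather_cache", expire_after=86400)  # 1-day TTL
response = session.get(
    "https://archive-api.open-meteo.com/v1/archive",
    params={"latitude": 52.54, "longitude": 13.41, "hourly": "temperature_2m",
            "start_date": "2023-08-01", "end_date": "2023-08-02"},
)
data = response.json()  # repeated calls within the TTL are served from the cache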
👍 to both points
I'm pretty sure Kotlin can transpile to JavaScript/TypeScript, so don't rush that SDK.
Should we just rename this issue?
This looks a bit tough to do automatically... I have no idea how to do it, but it could probably be an SDK-related script. Otherwise the coding structure would have to be super strict.
Ok, I'll start experimenting with FlatBuffers and report back as soon as I get some stuff working and have an actual idea of what it's like.
|
I just had an idea that could ease the SDK coordination: why not create a (template) repo with some "common" docs/scripts that can just be forked for each SDK and customized per language? Having a common |
I tried "compiling" just the
The byte size/SLOC is not a perfect code size measure, I know, but it gives a reference point. The big advantage of course is that we could drop any JSON/ProtoBuf library on the client side and reduce the server/client load, so the final package/bundle/library size could probably still be smaller: when I find a proper way to bundle the Kotlin SDK into a single package with the deserialization library, I'll post some more scientific results.
|
I am worried that a fixed list of weather variables will be long and limiting, especially with data on pressure levels. Using fixed types for the result set clearly improves code and makes it safer to use.
In programming languages like Python, you need to use an HTTP client library to fetch data. There are different clients for different use-cases. If there is a built-in client, sure, that's already sufficient.
Correct. The user will also get multiple responses if multiple weather models are used.
Yes, this should be done by default, but configurable by the user. In many cases, a simple network error could interrupt an application. Many developers are also unaware that HTTP transport can be unreliable.
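As an illustration of retry-by-default, a minimal Python sketch using the standard requests/urllib3 stack; the retry count, back-off factor and status codes are placeholder defaults that a client would expose as configuration:
# Automatic retries with exponential back-off; the values are illustrative defaults.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retries = Retry(total=5, backoff_factor=0.2, status_forcelist=[429, 500, 502, 503, 504])
session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retries))
response = session.get("https://api.open-meteo.com/v1/forecast",
                       params={"latitude": 52.54, "longitude": 13.41, "hourly": "temperature_2m"})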
Yes, this is generated per SDK. There could be a simple switch to select the programming language that then shows simple instructions on how to use it, given the currently selected weather variables. For Python it would just be:
# Run `pip install xxxxxxx`
cache = HttpxCache(path=".cache", ttl=86400)
om = HttpxClient(cache=cache)
params = {
"latitude": [52.54],
"longitude": [13.41],
"hourly": ["temperature_2m", "precipitation"],
"start_date": "2023-08-01",
"end_date": "2023-08-02"
}
results = om.weather_api("https://archive-api.open-meteo.com/v1/archive", params=params)
result = results[0]
print("Coordinates ", result.Latitude(), result.Longitude(), resulr.Elevation())
print(result.Timezone(), result.TimezoneAbbreviation())
hourly = result.Hourly()
time = hourly.something_that_generates_a_time_iterator()
temperature2m = hourly.Temperature2m().ValuesAsNumpy()
precipitation = hourly.Precipitation().ValuesAsNumpy()
This is simply a jump-start for users to quickly get data into their program. Depending on the programming language, it supports different use-cases. For Python, it is geared towards data science, and therefore I want to encourage the use of a cache. For a web application in TypeScript, a cache like this does not make sense.
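As a side note on the something_that_generates_a_time_iterator() placeholder above, here is one way such an iterator could be built with pandas, assuming the hourly block exposes unix timestamps via Time() and TimeEnd() plus an Interval() in seconds; these accessor names are assumptions, not the final API:
# Hypothetical sketch: build a DatetimeIndex covering the hourly values of one response.
# Time()/TimeEnd()/Interval() are assumed accessors (unix seconds and an interval in seconds).
import pandas as pd

def hourly_time_index(hourly):
    return pd.date_range(
        start=pd.to_datetime(hourly.Time(), unit="s"),
        end=pd.to_datetime(hourly.TimeEnd(), unit="s"),
        freq=pd.Timedelta(seconds=hourly.Interval()),
        inclusive="left",  # the end timestamp is exclusive
    )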
I am considering a mono repo which contains all compiled schemas for all programming languages. The big advantage is that any addition to the schema files will generate the code for all programming languages in one go. The drawback: it will be a pain to set up all the package manager integrations....
Yes, this is a trade-off, but the code size is still reasonable IMHO. The compiled size is of course significantly smaller and many code paths could even be removed entirely by the compiler/linker. Right now, I do not see any other way to reduce the code size significantly |
Update: There is now a mono repository for the compiled FlatBuffers schema with package manager integration for Python, TypeScript and Swift. Using a mono repository keeps it simple to update schema files and consistently distribute all files. I mostly documented the structure. I still have some remaining doubts and I do not like that certain weather variables have a lot of duplicates (e.g. temperature or soil properties on different levels), but it is consistent and it works. Other approaches (like using enums for all variables) have drawbacks on client and/or server side. The first Python API client is also relatively far along. It is based on the Python requests library. Other programming languages or package managers will follow later. Code generation in the API documentation is mostly done (#42). All selected parameters are applied automatically and the generated dummy code should work as a good starting point for any data scientist. The code also includes cache and retry. @DadiBit Wdyt? |
TL;DR: why not let the dev provide "temperature_2m" both in the request query and the response access? See FlexBuffers for unstructured data.
I shall re-consider my take on this: if all parameters are automatically generated, a bunch of useless code is pushed, making everything huge. So yeah, it just makes little sense, when the dev has to go to the docs website to check the available params anyway.
I think we could simply implement a __getitem__ accessor. The way I would implement it in Python:
class Hourly(object):
    def __init__(self, values):
        self._values = values  # I have zero idea of how data is stored...

    def __getitem__(self, key):
        # p[A, B] arrives here as a tuple; [A, B] is not doable in JavaScript, only [A][B]...
        # Maybe a middle type could be a feasible standard across all languages.
        variable, altitude = key if isinstance(key, tuple) else (key, None)
        # return self._values[f"{variable}_{altitude}"]  # concatenate the two keys? Idk, what if no
        # altitude is provided... Yep, needs an if-else fork, but it can access the String-Values
        # dictionary directly, maybe.
        # Or filter the entries directly (are you sure it's .temperature and not "temperature"/key?):
        return [v for v in self._values if v.variable == variable and v.altitude == altitude]

p = Hourly(...)
print(p["temperature", 2])
FlexBuffers should work perfectly for this job, but they are slower. It'd be cool to use a FlexBuffer just to store unknown data/keys. The problem? How the hell does the server know which keys the client doesn't know? Doable, but a bit hard to implement. An easy way to kill two birds with one stone is to arbitrarily pick the more popular parameters and provide them in the typed schema.
|
Yep, it's plain and simple: no webhooks, no manual/scheduled gh action
I still wonder if something like this could work:
res.hourly("temperature_1000hPa")
res.hourly("temperature", hPa = 1000)  # internally calls `.hourly("temperature_1000hPa")`
res.hourly("temperature_2m")
res.hourly("temperature", m = 2)  # internally calls `.hourly("temperature_2m")`
res.hourly("is_day")
So, if I can just "bake" a mini HTTP client in the library, is it compliant anyway? It was quite simple to implement in Kotlin, since I only used it for GET requests. I still need to implement the "retry" logic.
Problem: JitPack (a Kotlin/Java package publishing platform) uses the root directory to get the build configuration (maybe a subdirectory can be used, I'm not 100% sure).
Neat! I believe that near "Python" the other languages will appear when ready, right? Love it! |
FlexBuffers does not work well. It is not supported in all programming languages, and large floating-point arrays are encoded differently. Ideally I want to be able to serve data 1:1 from my backend code. As FlexBuffers needs to be parsed as well and has no fixed data types, there is not much benefit compared to other formats like BSON. I was considering a FlatBuffers schema like you mentioned, with
The schema is shorter, but it requires more logic in each programming language. E.g. helper functions like
Note: I do not want to use strings, but enumerations. This works better with code completion and is slightly more efficient. Automatic code generation in the API documentation will be more complicated, as I need to map parameter names to enumerations. I might spend a couple of hours and test this schema. It looks feasible at first sight, but I am still undecided...
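Purely as an illustration of that mapping step, the documentation generator could keep a small lookup from API parameter strings to enum-based accessors. The enum and field names below are assumptions, not the published schema:
# Hypothetical mapping used by the documentation code generator (names are assumptions).
PARAMETER_TO_ENUM = {
    "temperature_2m":      {"variable": "Variable.temperature", "altitude": 2},
    "temperature_1000hPa": {"variable": "Variable.temperature", "pressure_level": 1000},
    "precipitation":       {"variable": "Variable.precipitation"},
}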
Yes, I want to integrate it into Maven. No clue how it works, but there is a Maven plugin for |
Update: I have now merged all the required changes into the API code and updated the SDK. The schema uses the proposed array format instead of hard-coded attributes. Python, Swift and TypeScript releases are fully automated and publish packages to the corresponding registries. Currently, I am working on the setup for Java. The process to get access to Maven Central and the Gradle portal is quite painful. @DadiBit do you know if it is sufficient to only publish Java packages and use those in Kotlin? I did not find an elegant way to publish a single distribution with Java and Kotlin. Alternatively, I can split them into |
To my understanding, Kotlin is compiled for the JVM, a bit like TypeScript is compiled to JavaScript. In other words, if you have a Kotlin library, you can use it in Java, and if you have a Java library, you can use it in Kotlin. |
I know it's a bit late in the implementation of FlatBuffers, but according to this benchmark in Go and Rust, Protobuf seems to be (or at least was 9 months ago) faster than FlatBuffers... If you want, I can do a benchmark in Kotlin with some sample data from the historical API (which is the "fattest" one) |
The benchmark shows that decoding is significantly faster, because FlatBuffers does not need to parse data :). The advantage gets even bigger for large floating-point arrays. This works great on the client side. On the server side, the encoding speed of Protobuf and FlatBuffers is similar. However, because the wire format for floating-point arrays is just binary floating-point data, I will also be able to implement a customised writer to send data without encoding it again. Right now, I am using the integrated FlatBuffers writer, but once the format is well established, I will develop a customised, faster version. The SDK is now on Maven Central. Instructions are available here: https://github.com/open-meteo/sdk/tree/main/java. I do not have any Java examples yet.
Edit: All API servers support the new FlatBuffers structure as of today! |
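The decoding advantage described above essentially comes down to viewing the float payload in place instead of parsing it value by value. Roughly, and purely as an illustration (buf, offset and count stand in for what the generated FlatBuffers accessor computes):
# Illustrative only: expose a FlatBuffers float vector as a NumPy view over the received
# bytes, without copying or parsing individual values.
import numpy as np

def values_as_numpy(buf: bytes, offset: int, count: int) -> np.ndarray:
    return np.frombuffer(buf, dtype=np.float32, count=count, offset=offset)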
Forgot to mention: the Python code generation is now integrated into the API documentation and can be tested here: http://staging.open-meteo.com/en/docs (API Response -> switch to Preview Python) |
I've been working a bit on a test fork for the GitHub Action YAML file; here you can see a successful run with an implementation of the commands you wrote in
I would like to implement a step to push the changes, but I have no idea how I should do it: do I create a branch and then a PR? Should/Could I push directly to main?
|
Sidenote: I've split the FlatBuffers tables/enums, and the sed command works fine with multiple .fbs files as well. My idea was to just run flatc on the updated files to reduce the action runtime (possibly once with all language flags set), but since it requires all included files as well, this would only work for the enums, which are not many; the idea could just be dropped, to be fair. |
Hi, what kind of changes do you want to make exactly? I want to keep the schema as it is, and I would also prefer to keep it in a single file. There will be additional FlatBuffers schema files for the geocoding and elevation APIs. Keeping each "kind" of API in its own file keeps things better separated. |
Ok, thank you.
Oops, my issue originated when I wanted to integrate the code generation automatically through the GH action; that's why I was thinking of using them for this job (no pun intended).
👍 |
A colleague provided some Java example code using the Maven Central package: https://github.com/open-meteo/sdk/blob/main/java/README.md
I also tested the TypeScript integration with Svelte here: https://github.com/open-meteo/open-meteo-website/blob/main/src/routes/en/weather/%2Bpage.svelte https://github.com/open-meteo/typescript
The Python instructions also got updated yesterday evening with some structure changes: https://github.com/open-meteo/python-requests
I also added an example of how an API response can be decoded using |
Thank you for all the resources. I got the generated code implementation working in Kotlin, hurray! Plus, there's even a basic streaming feature: it decodes one location entry at a time. 🙇‍♂️ If you're interested, here's the snippet of code:
val inputStream = get(url) // `get` is the internal built-in HTTPS client
// TODO: here there should be a loop until the end of the response array
val lengthBytes = inputStream.readNBytes(4)
lengthBytes.reverse() // still need to figure out how endianness is handled by `.getInt` down there
val buffer: ByteBuffer = ByteBuffer.allocate(Integer.BYTES)
buffer.put(lengthBytes)
buffer.rewind()
val length = buffer.getInt()
val bytes = inputStream.readNBytes(length)
val apiResponse = ApiResponse.asRoot(ArrayReadWriteBuffer(bytes))
// enjoy apiResponse.location & apiResponse.longitude
Porting these few lines of code to Java should be easy, but I'm pretty sure it's better to stick to either Java or Kotlin, not both. I think Kotlin is easier to read and maintain |
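For comparison, a rough Python sketch of the same read loop, assuming each message is preceded by a little-endian uint32 size prefix as in the Kotlin snippet above:
# Iterate over length-prefixed messages from a binary stream (sketch).
import struct

def read_messages(stream):
    while True:
        prefix = stream.read(4)
        if len(prefix) < 4:
            break  # end of the response
        (length,) = struct.unpack("<I", prefix)  # little-endian uint32 message size
        yield stream.read(length)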
As requested in open-meteo/open-meteo-website#40
Well, that was a lie ;D If you're interested in the current status of the Kotlin (not yet multiplatform) SDK, there's issue #12 on the Kotlin SDK repo |
Let me know if it would help to publish a Kotlin SDK version on Maven Central, or if you need any help! |
Moved to open-meteo/open-meteo#580 |
It would be nice to create a GitHub action that can automatically export the hourly/daily/minutely_15 options to the Python SDK (already in the TODO) and the Kotlin SDK. Of course, this could be extended to all parameters.
Options
Previously in the Kotlin SDK I used to fetch the website and grep the data out. Unluckily this can no longer work, since some options won't be loaded until the tab is selected (namely the UV index, pressure & co. ones in the forecast API).
The main idea would be to either work directly with the API source code (which may lead to the generation of undocumented stuff, like the seasonal forecast API) or with the docs page (much better IMO) and trigger a workflow_dispatch action on the SDK projects.
Any help on the parsing of the website docs source code would be appreciated.
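A rough sketch of the grep-style approach against the docs page, purely illustrative: the regex assumes the option checkboxes carry the API parameter name in a value attribute, which may not match the real markup, and JS-rendered tabs would still be missed:
# Hypothetical scraper: pull candidate parameter names out of the public docs page.
# The value="..." assumption may not hold for the real markup.
import re
import urllib.request

html = urllib.request.urlopen("https://open-meteo.com/en/docs").read().decode("utf-8")
options = sorted(set(re.findall(r'value="([a-z][a-z0-9_]+)"', html)))
print(options)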