Reduce all json files size #366

Closed
AugustinMortier opened this issue Mar 29, 2021 · 7 comments · Fixed by #499

@AugustinMortier
Member

The json files currently store numbers with 17 decimal places. By reducing this to 4-5 (?), we could save a lot of space. This can be applied to all of the json files: time series, glob_stats, map, and scat. Only the contour files already have a limited number of decimals.

e.g.

"mnmb": 0.10131313755341606

to be changed to

"mnmb": 0.1013
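
For illustration, a minimal sketch of the effect on a single entry, using the standard-library json module. The `record` dict is a made-up stand-in for one entry of a real file; only the "mnmb" value comes from the example above.

```python
import json

# Hypothetical record: only the "mnmb" value comes from the example above.
record = {"mnmb": 0.10131313755341606}

full = json.dumps(record)
# Round every value to 4 decimals before serializing.
short = json.dumps({key: round(value, 4) for key, value in record.items()})

print(short)                  # {"mnmb": 0.1013}
print(len(full), len(short))  # character counts before and after rounding
```

The saving per number is small, but it applies to every float in every file, so it should add up for long time series.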
@AugustinMortier AugustinMortier added enhancement ✨ New feature or request api Planned changes in API (not backwards compatible) labels Mar 29, 2021
@jgliss
Contributor

jgliss commented Mar 30, 2021

Good point @AugustinMortier. We should do that together with the upcoming AeroVal updates for v0.12.0.

@jgliss jgliss added this to the v0.12.0 milestone Mar 30, 2021
@jgliss
Contributor

jgliss commented Oct 11, 2021

Would be good to look into this soon, perhaps @dulte, if you have time. Will not go into v0.12.0 though.

@jgliss jgliss modified the milestones: v0.12.0, v0.13.0 Oct 11, 2021
@jgriesfeller
Member

I stumbled over this and looked around a bit:

It seems that all json writing is based on the method

def write_json(data_dict, file_path, **kwargs):

which just calls simplejson.dump

According to my findings one has to round the floats to a given precision to get these into the json file.

like (from above):

import json

def round_floats(o):
    # Recursively round every float in a nested dict/list structure.
    if isinstance(o, float):
        return round(o, 5)
    if isinstance(o, dict):
        return {k: round_floats(v) for k, v in o.items()}
    if isinstance(o, (list, tuple)):
        return [round_floats(x) for x in o]
    return o

data = json.dumps(round_floats(data))

@avaldebe
Collaborator

avaldebe commented Oct 19, 2021

I would lean into np.round and resolve the dict recurrence on the list clause

import numpy as np
import simplejson as json

def round_floats(o, *, decimals: int = 5):
    if isinstance(o, float):
        return np.round(o, decimals)
    if isinstance(o, (list, tuple)):
        return np.array(o).round(decimals).tolist()
    if isinstance(o, dict):
        return dict(zip(o, round_floats(list(o.values()), decimals=decimals)))
    return o

data = json.dumps(round_floats(data))

And maybe subclass the simplejson.JSONEncoder to handle the float encoding there instead of calling json.dumps(round_floats(data))
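
A rough sketch of what such an encoder subclass could look like. It uses the standard-library json module rather than simplejson, so whether the same override works unchanged with simplejson.JSONEncoder is an assumption that would need checking. The class name `RoundingEncoder` and the `decimals` keyword are hypothetical, not part of either library. Floats never reach the `default()` hook because they are serializable natively, so the rounding is done on the whole object before encoding starts.

```python
import json


class RoundingEncoder(json.JSONEncoder):
    """Hypothetical encoder that rounds all floats before serializing."""

    def __init__(self, *args, decimals=5, **kwargs):
        super().__init__(*args, **kwargs)
        self.decimals = decimals

    def _round(self, o):
        # Walk nested dicts, lists and tuples; leave other types untouched.
        if isinstance(o, float):
            return round(o, self.decimals)
        if isinstance(o, dict):
            return {k: self._round(v) for k, v in o.items()}
        if isinstance(o, (list, tuple)):
            return [self._round(v) for v in o]
        return o

    def iterencode(self, o, _one_shot=False):
        # json.dump and json.dumps both end up here, so one override is enough.
        return super().iterencode(self._round(o), _one_shot)


data = {"a": [0.123456789, 1], "b": {"c": 2.0}}
encoded = json.dumps(data, cls=RoundingEncoder, decimals=3)
print(encoded)  # {"a": [0.123, 1], "b": {"c": 2.0}}
```

With this, callers keep passing the raw data and only have to add `cls=RoundingEncoder`; the extra `decimals` keyword is forwarded by `json.dumps` to the encoder constructor.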

@avaldebe
Collaborator

    if isinstance(o, (list, tuple)):
        return np.array(o).round(decimals).tolist()

I guess that this would only work for a list of floats...

@jgriesfeller
Member

The code was just a snippet I found and was not meant to be used as is. My real code will be in the linked PR.

@jgriesfeller
Member

Just another comment on this, since I am testing the PR.
The precision in the geojson files is already limited to 5 digits after the decimal point, so this PR will not make the geojson files any smaller. But it does help with the normal json files.
Unfortunately, the Aeronet processing changed in version 0.12.1, so it's not straightforward to compare different results.
