Optimize step data transmission #141

Open
Tracked by #105
ezio-melotti opened this issue Sep 17, 2021 · 5 comments
Labels
refactoring Improve the code without altering behavior
Comments

@ezio-melotti
Collaborator

ezio-melotti commented Sep 17, 2021

Currently for each step that we send, we repeat a lot of data. We could remove this duplication by sending an object once at the beginning that contains repeated information, such as:

  • the units for each currency
  • the order of the currencies/values in a group
  • the "nice" names for the currencies/agents

For example, the "total_production" group in each step data currently looks like:

  "total_production": {
    "atmo_co2": {
      "value": 0.025916,
      "unit": "1.0 kg"
    },
    "atmo_o2": {
      "value": 0,
      "unit": ""
    },
    "h2o_potb": {
      "value": 4.75,
      "unit": "1.0 kg"
    },
    "enrg_kwh": {
      "value": 0.000474,
      "unit": "1.0 kWh"
    }
  },

For each currency there is a corresponding object with a "value" and a "unit". We could include in the initial schema an object that maps each currency to its unit:

"units": {
    "atmo_co2": "kg",
    "atmo_o2": "kg",
    "h2o_potb": "kg",
    "enrg_kwh": "kWh",
    "...": "..."
}

And each resulting simplified step data will look like:

  "total_production": {
    "atmo_co2": 0.025916,
    "atmo_o2": 0,
    "h2o_potb": 4.75,
    "enrg_kwh": 0.000474
  },

By doing this we lose some flexibility, since e.g. each CO2 amount will be expressed in kilograms, even if it's a fraction of a gram. However, the frontend can take care of converting to the most appropriate unit (e.g. from kg to g or mg).
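
As a rough sketch of the idea (written in Python only for illustration; `simplify_group` and `format_mass` are hypothetical names, not part of the existing code, and the thresholds for switching units are an assumption), stripping the units from a group and picking a readable unit for a value stored in kg could look like this:

```python
# Hypothetical helpers, not part of the existing codebase.

def simplify_group(group):
    """Keep only the numeric value for each currency, dropping "unit"."""
    return {currency: entry["value"] for currency, entry in group.items()}


def format_mass(value_kg):
    """Render a mass stored in kg as kg, g, or mg (assumed thresholds)."""
    magnitude = abs(value_kg)
    if magnitude == 0 or magnitude >= 1:
        return f"{value_kg:g} kg"
    if magnitude >= 1e-3:
        return f"{value_kg * 1e3:g} g"
    return f"{value_kg * 1e6:g} mg"


# Values copied from the "total_production" example above.
total_production = {
    "atmo_co2": {"value": 0.025916, "unit": "1.0 kg"},
    "atmo_o2": {"value": 0, "unit": ""},
    "h2o_potb": {"value": 4.75, "unit": "1.0 kg"},
}
simplified = simplify_group(total_production)
print(simplified)
print(format_mass(simplified["atmo_co2"]))
```

In practice the formatting step would live in the frontend; the point is only that a fixed base unit per currency does not prevent showing small amounts in g or mg.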


We could also include the "nice" names in the initial object, and use them e.g. in the panels:

"names": {
    "atmo_co2": "Carbon Dioxide",
    "atmo_o2": "Oxygen",
    "h2o_potb": "Water",
    "enrg_kwh": "Energy",
    "...": "..."
}

If we want to optimize further, we can factor out some of the other keys. For example, we could include a schema in the initial object that specifies the order of the values for each group, e.g.:

{"total_production": ["atmo_co2", "atmo_o2", "h2o_potb", "enrg_kwh"]}

And then each step data will only include the following, without repeating the name of the currency or the unit:

{"total_production": [0.025916, 0, 4.75, 0.000474]}

Doing this might increase the complexity of the frontend code though, and might introduce bugs since the role of each value needs to be determined by looking at the initial schema.
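
To make the trade-off concrete, here is a minimal sketch of packing and unpacking a step against such a schema. The function names `pack_step` and `unpack_step` are hypothetical; the length check is one possible guard against the kind of bug mentioned above, where a value is silently assigned to the wrong currency:

```python
# Hypothetical helpers, not part of the existing codebase.

SCHEMA = {"total_production": ["atmo_co2", "atmo_o2", "h2o_potb", "enrg_kwh"]}


def pack_step(step, schema):
    """Replace each group's {currency: value} dict with an ordered list."""
    return {
        group: [step[group][currency] for currency in order]
        for group, order in schema.items()
    }


def unpack_step(packed, schema):
    """Rebuild the {currency: value} dicts using the order in the schema."""
    step = {}
    for group, order in schema.items():
        values = packed[group]
        if len(values) != len(order):
            raise ValueError(
                f"{group}: expected {len(order)} values, got {len(values)}"
            )
        step[group] = dict(zip(order, values))
    return step


step = {
    "total_production": {
        "atmo_co2": 0.025916,
        "atmo_o2": 0,
        "h2o_potb": 4.75,
        "enrg_kwh": 0.000474,
    }
}
packed = pack_step(step, SCHEMA)
print(packed)
print(unpack_step(packed, SCHEMA) == step)
```

The check only catches a wrong number of values; a schema and a step that disagree on order but not on length would still decode incorrectly, which is the risk described above.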


Another possible optimization is combining the data of multiple steps. The backend already sends step data in batches, so a batch of 5 steps could be compressed into something like:

  "total_production": {
    "atmo_co2": [0.025916, ..., ..., ..., ...],
    "atmo_o2": [0, ..., ..., ..., ...],
    "h2o_potb": [4.75, ..., ..., ..., ...],
    "enrg_kwh": [0.000474, ..., ..., ..., ...]
  },

This will require some extra work on both the backend (which will have to combine the step data) and the frontend (which will have to extract them).
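
A minimal sketch of that extra work, assuming every step in a batch contains the same currencies for the group (the names `combine_steps` and `split_steps` are hypothetical). The sample batch simply repeats the first step's values, since the later values are elided above:

```python
# Hypothetical helpers, not part of the existing codebase.

def combine_steps(steps, group):
    """Backend side: merge one group from several steps into per-currency lists."""
    combined = {}
    for step in steps:
        for currency, value in step[group].items():
            combined.setdefault(currency, []).append(value)
    return combined


def split_steps(combined):
    """Frontend side: turn per-currency lists back into one dict per step."""
    lengths = {len(values) for values in combined.values()}
    if len(lengths) > 1:
        raise ValueError("currencies have different numbers of steps")
    count = lengths.pop() if lengths else 0
    return [
        {currency: values[i] for currency, values in combined.items()}
        for i in range(count)
    ]


first_step = {
    "total_production": {"atmo_co2": 0.025916, "atmo_o2": 0, "h2o_potb": 4.75}
}
# Placeholder batch: the same step repeated 5 times, not real simulation output.
batch = [first_step] * 5
combined = combine_steps(batch, "total_production")
print(combined)
print(split_steps(combined) == [s["total_production"] for s in batch])
```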


Regardless of the actual structure of the JSON, we could also look into adding compression at the network level, e.g. by gzipping the data.
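
To get a feel for what gzip does to this kind of payload, the following sketch uses only Python's standard `json` and `gzip` modules. The batch is a placeholder built by repeating one group 100 times, so it compresses better than real step data would; it is meant to show the mechanics, not to predict actual savings:

```python
import gzip
import json

# One group in the current verbose format, copied from the example above.
step = {
    "total_production": {
        "atmo_co2": {"value": 0.025916, "unit": "1.0 kg"},
        "atmo_o2": {"value": 0, "unit": ""},
        "h2o_potb": {"value": 4.75, "unit": "1.0 kg"},
        "enrg_kwh": {"value": 0.000474, "unit": "1.0 kWh"},
    }
}
# Placeholder batch: identical steps, so the ratio is overly optimistic.
batch = [step] * 100

raw = json.dumps(batch).encode("utf-8")
compressed = gzip.compress(raw)
restored = json.loads(gzip.decompress(compressed).decode("utf-8"))

print(f"raw: {len(raw)} bytes, gzipped: {len(compressed)} bytes")
print(restored == batch)
```

Because the repeated keys and units are exactly what gzip removes well, network-level compression and the restructuring proposed above overlap to some degree.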


For reference, this is what a single step data looks like:

```json
{
  "id": 8396705445451241000,
  "step_num": 1,
  "user_id": 5,
  "game_id": 2026786606093101000,
  "start_time": 1631907999,
  "time": 3600,
  "hours_per_step": 1,
  "is_terminated": "False",
  "termination_reason": null,
  "agent_growth": { "radish": 0 },
  "total_agent_count": { "human_agent": 1 },
  "total_production": {
    "atmo_co2": { "value": 0.025916, "unit": "1.0 kg" },
    "atmo_o2": { "value": 0, "unit": "" },
    "h2o_potb": { "value": 4.75, "unit": "1.0 kg" },
    "enrg_kwh": { "value": 0.000474, "unit": "1.0 kWh" }
  },
  "total_consumption": {
    "atmo_co2": { "value": 0, "unit": "" },
    "atmo_o2": { "value": 0.021583, "unit": "1.0 kg" },
    "h2o_potb": { "value": 0.165833, "unit": "1.0 kg" },
    "enrg_kwh": { "value": 3.723, "unit": "1.0 kWh" }
  },
  "details_per_agent": {
    "in": {
      "enrg_kwh": {
        "solid_waste_aerobic_bioreactor": { "value": 0, "unit": "" },
        "multifiltration_purifier_post_treatment": { "value": 0.012, "unit": "1.0 kWh" },
        "oxygen_generation_SFWE": { "value": 0, "unit": "" },
        "urine_recycling_processor_VCD": { "value": 0, "unit": "" },
        "co2_removal_SAWD": { "value": 0, "unit": "" },
        "co2_reduction_sabatier": { "value": 0, "unit": "" },
        "ch4_removal_agent": { "value": 0, "unit": "" },
        "dehumidifier": { "value": 0, "unit": "" },
        "crew_habitat_small": { "value": 2.711, "unit": "1.0 kWh" },
        "greenhouse_small": { "value": 1, "unit": "1.0 kWh" },
        "radish": { "value": 0, "unit": "" }
      },
      "atmo_co2": {
        "co2_removal_SAWD": { "value": 0, "unit": "" },
        "co2_reduction_sabatier": { "value": 0, "unit": "" },
        "radish": { "value": 0, "unit": "" }
      }
    }
  },
  "storage_capacities": {
    "air_storage": {
      "1": {
        "atmo_o2": { "value": 390.097667, "unit": "kg" },
        "atmo_co2": { "value": 0.795725, "unit": "kg" },
        "atmo_n2": { "value": 1454.3145, "unit": "kg" },
        "atmo_ch4": { "value": 0.003483, "unit": "kg" },
        "atmo_h2": { "value": 0.001024, "unit": "kg" },
        "atmo_h2o": { "value": 18.704167, "unit": "kg" }
      }
    },
    "water_storage": {
      "1": {
        "h2o_potb": { "value": 1345.584167, "unit": "kg" },
        "h2o_urin": { "value": 0.0625, "unit": "kg" },
        "h2o_wste": { "value": 0.087083, "unit": "kg" },
        "h2o_tret": { "value": 144.25, "unit": "kg" }
      }
    },
    "nutrient_storage": {
      "1": {
        "biomass_totl": { "value": 0, "unit": "kg" },
        "sold_n": { "value": 100, "unit": "kg" },
        "sold_p": { "value": 100, "unit": "kg" },
        "sold_k": { "value": 100, "unit": "kg" },
        "sold_wste": { "value": 0, "unit": "kg" }
      }
    },
    "power_storage": {
      "1": {
        "enrg_kwh": { "value": 996.277, "unit": "kWh" }
      }
    },
    "food_storage": {
      "1": {
        "food_edbl": { "value": 99.937083, "unit": "kg" }
      }
    }
  }
}
```

@ezio-melotti ezio-melotti added the refactoring Improve the code without altering behavior label Sep 17, 2021
@ezio-melotti ezio-melotti added this to the Phase V milestone Sep 17, 2021
@ezio-melotti ezio-melotti mentioned this issue Sep 17, 2021
31 tasks
@granawkins
Collaborator

Update

In the course of the work for ABM-Redesign, Grant added the AgentDataCollector class, which scrapes all potentially relevant data from an agent each step. It was initially developed for testing, and then became useful for the Jupyter workflow.

Now, as part of adding the ABM-Redesign functionality to the frontend, we will do a thorough update of the collection, storage and transmission of simdata.

The items in this issue (above) are directly relevant and the name works, so I'm co-opting this issue instead of creating a new one.

Plan

  1. Define a new schema for sharing data between frontend/backend
    • Optimize size via reorganizing and/or compression
    • Fetch specific steps/fields as-needed
  2. Get baseline speed/size figures for comparison
  3. Update AgentDataCollector to support new schema
  4. Update storage/transmission system (GameRunner/Redis)
  5. Update API (Flask/frontend) to new storage/transmission and schema

@ezio-melotti
Collaborator Author

The currency_desc.json file could be used to solve the problem of the currency names/units. If these values are added in the file, it could be sent as-is to the frontend.

  1. Get baseline speed/size figures for comparison

At this stage I don't think we need benchmarks. There is clearly a lot of duplicate data being sent, and it will certainly go faster once we remove it. Sending gzipped data at the network level (i.e. just by specifying it in the HTTP headers) might be useful and simple enough to implement, but I wouldn't spend too much time working on custom solutions.

@granawkins
Collaborator

granawkins commented May 6, 2022

A simple solution would be to have the backend send the output of AgentModel.get_data(debug=True) to the front-end directly.

It includes all fields for all agents/currencies at all steps.

The 4-human-garden full-simulation object is about 1.1 MB, compared to ~8 MB for the current format, which includes only a small subset of the data.

I think we can also really simplify the front-end by storing this and indexing it directly from the panels.

@granawkins granawkins reopened this May 7, 2022
@ezio-melotti
Collaborator Author

Regardless of the actual structure of the json, we could also look into adding compression at the network level, e.g. by gzipping the data.

Good news everyone! Looks like we already have this:
image
The highlighted request contains 9 days (216 steps) of data sent through websocket for the 1 human preset, and it was compressed down from 422k to 17k during the transfer. The arrows show that the frontend was already accepting gzip and the backend also encoded the data as gzip.

Exporting the data and gzipping them yields similar values (the format is a bit different):

 379614 simoc-simulation-data-1h-9d.json
  19833 simoc-simulation-data-1h-9d.tar.gz

@granawkins
Copy link
Collaborator

Great! Ya looks like socketio uses compression by default.
