Updates to g32influx script #107

Closed
wants to merge 23 commits into from

@ahincks commented Feb 17, 2020

As I work on assessing the scalability of influxdb as an "official" storage format, I've made some modifications to this script.

Speed is greatly improved (on my laptop, ~5 seconds per file instead of ~1 minute):

  • By using line protocol rather than JSON for writing to the InfluxDB database.
  • By writing whole timelines at once rather than field-by-field.
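As a rough illustration of the two points above, the sketch below builds one line-protocol string per timestamp, carrying every field that shares the timeline. It is not the code from this PR: the function name `format_timeline_lines`, the measurement name, and the tag values are all hypothetical, and it assumes numeric field values and names that need no line-protocol escaping.

```python
def format_timeline_lines(measurement, tags, times, fields):
    """Build InfluxDB line-protocol strings for one timeline.

    measurement: measurement name (hypothetical example value below)
    tags:        dict of tag key -> tag value
    times:       list of UNIX timestamps in seconds
    fields:      dict of field name -> list of values, one per timestamp

    Assumes names and values contain no characters that need escaping.
    """
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    lines = []
    for i, t in enumerate(times):
        # All fields sharing this timestamp go on a single line.
        field_str = ",".join(f"{name}={vals[i]}" for name, vals in fields.items())
        # Line protocol timestamps default to nanosecond precision.
        lines.append(f"{measurement},{tag_str} {field_str} {round(t * 1e9)}")
    return lines


lines = format_timeline_lines(
    "thermometry",                      # hypothetical measurement name
    {"feed": "temperatures"},           # hypothetical tag value
    [1581900000.0, 1581900001.0],
    {"ch1": [1.5, 1.6], "ch2": [2.5, 2.6]},
)
for line in lines:
    print(line)
```

With the `influxdb` Python client, a list like this can be sent in one call via `write_points(lines, protocol='line')`, which avoids building a JSON point per field.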

Rather than using a standalone sqlite DB to record which files have been written, there is now a measurement in the InfluxDB database, .g32influx.log, that records when files have been completed (and also records whether they have been started and whether an error has occurred). This can be queried to determine whether a file needs to be written or not (which can be overridden with -f).
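The decision this log enables can be sketched as follows. The record layout (a `filename` and a `status` of `started`, `completed`, or `error`) is an assumption made for illustration; the actual schema of `.g32influx.log` is not shown in this description, and the records here are plain dicts rather than query results.

```python
def needs_write(filename, log_records, force=False):
    """Decide whether a .g3 file should be (re)written.

    log_records: iterable of dicts with hypothetical keys
                 "filename" and "status".
    force:       mirrors the -f override described above.
    """
    if force:
        return True
    statuses = {r["status"] for r in log_records if r["filename"] == filename}
    # A file that was only started, or that errored, is written again.
    return "completed" not in statuses


log = [
    {"filename": "a.g3", "status": "started"},
    {"filename": "a.g3", "status": "completed"},
    {"filename": "b.g3", "status": "started"},
    {"filename": "b.g3", "status": "error"},
]
print(needs_write("a.g3", log))
print(needs_write("b.g3", log))
```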

In addition to the feed tag, an md5sum tag is added to each HK point so that the provenance of data can be traced.

  • One thing to test for is whether this increases the series cardinality too much. Based on this guideline I think it's going to be OK, but will want to assess this.
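To make the cardinality concern concrete, here is a minimal sketch that hashes a file with `hashlib` and counts distinct series, taking a series to be a unique combination of measurement and tag set. The helper names, the measurement name, and the example tag values are hypothetical; this is not the script's implementation.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def series_count(points):
    """Count unique (measurement, tag set) combinations.

    points: iterable of (measurement, tags_dict) pairs.
    """
    return len({(m, frozenset(tags.items())) for m, tags in points})


# Two files from the same feed: one series without the md5sum tag...
without_md5 = [
    ("hk", {"feed": "temperatures"}),
    ("hk", {"feed": "temperatures"}),
]
# ...but one series per file once each point carries that file's md5sum.
with_md5 = [
    ("hk", {"feed": "temperatures", "md5sum": "aaa"}),  # placeholder hashes
    ("hk", {"feed": "temperatures", "md5sum": "bbb"}),
]
print(series_count(without_md5), series_count(with_md5))
```

Under this assumption the series count grows with the number of files ingested, which is why the effect on cardinality is worth measuring.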

Although I'm creating this pull request here so that you can see the changes, it could reflect a simple branch of this code away from ocs as I continue my own investigation. So, if you have concerns about anything I've done here and don't think this belongs in ocs, no problem.

@ahincks changed the title from "G32influx script adh" to "Updates to g32influx script" on Feb 17, 2020
@BrianJKoopman
Member

Thanks for the PR, Adam! Getting that kind of speed improvement would be great. I'm looking forward to reviewing this, but probably won't be able to look much at it until early next week, after our SMuRF/DAQ meeting at Princeton.

BrianJKoopman added a commit that referenced this pull request Jul 17, 2020
This incorporates several changes from #107, but with some modifications. Most
notably it includes format_timeline(), which comes with a dramatic performance
increase, both by publishing all fields that share a timeline simultaneously and
by utilizing InfluxDB's line protocol, which is faster than the previously used
json based protocol.

We maintain the use of a local sqlite DB instead of switching to using InfluxDB
to track published .g3 files. We also remove the use of an additional md5sum
tag on data uploaded with this script.

Many thanks to @ahincks for the work this was based on.
@BrianJKoopman
Member

After talking to @ahincks and incorporating some of the changes here into #153 we agreed to close this PR.

BrianJKoopman added a commit that referenced this pull request Jul 17, 2020
* Start work on g32influx script

* Handle new filename format

* Incorporate select changes from #107

This incorporates several changes from #107, but with some modifications. Most
notably it includes format_timeline(), which comes with a dramatic performance
increase, both by publishing all fields that share a timeline simultaneously and
by utilizing InfluxDB's line protocol, which is faster than the previously used
json based protocol.

We maintain the use of a local sqlite DB instead of switching to using InfluxDB
to track published .g3 files. We also remove the use of an additional md5sum
tag on data uploaded with this script.

Many thanks to @ahincks for the work this was based on.

* gitignore: Ignore any local databases

* docs: Add documentation for g32influx