
The DataFrame serialisation is slower than in v1 #92

Closed
@benb92

Description


Using Python pandas. With version 1 of the client I used this:

import math

import numpy as np
from influxdb import DataFrameClient

def dbpop_influx(data, dbname, measurement, columns):
    # Open a v1 client and write the DataFrame in chunks of 10,000 rows
    dbclient = DataFrameClient(host='localhost', port=8086, username='root', password='root', database=dbname)
    n_import_chunks = math.ceil(len(data) / 10000)
    data_chunks = np.array_split(data, n_import_chunks)
    for d in data_chunks:
        dbclient.write_points(d, measurement, tag_columns=columns, protocol='line')

This takes 29 seconds (I was looking to improve that speed with multiprocessing).
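Roughly the multiprocessing variant I had in mind looks like the sketch below. It is only a sketch: the pool size of 4 and the names dbpop_influx_parallel and _write_chunk are mine, and each worker creates its own DataFrameClient.

import math
from functools import partial
from multiprocessing import Pool

import numpy as np
from influxdb import DataFrameClient

def _write_chunk(chunk, dbname, measurement, columns):
    # Each worker opens its own v1 client and writes a single chunk
    dbclient = DataFrameClient(host='localhost', port=8086, username='root', password='root', database=dbname)
    dbclient.write_points(chunk, measurement, tag_columns=columns, protocol='line')

def dbpop_influx_parallel(data, dbname, measurement, columns, processes=4):
    # Split the DataFrame into ~10,000-row chunks and write them in parallel
    n_import_chunks = math.ceil(len(data) / 10000)
    data_chunks = np.array_split(data, n_import_chunks)
    with Pool(processes) as pool:
        pool.map(partial(_write_chunk, dbname=dbname, measurement=measurement, columns=columns), data_chunks)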

With version 2 I used this:

import time

from influxdb_client import InfluxDBClient, WriteOptions

_client = InfluxDBClient(url="http://localhost:9999", token=token, org="org")
_write_client = _client.write_api(write_options=WriteOptions(batch_size=10000,
                                                             flush_interval=10_000,
                                                             jitter_interval=0,
                                                             retry_interval=5_000))

# Time how long the write call takes to serialise and hand off the DataFrame
start = time.time()
_write_client.write('data', record=imp_dat[0], data_frame_measurement_name='coinmarketcap_ohlcv',
                    data_frame_tag_columns=['quote_asset', 'base_asset'])
print(time.time() - start)

This takes 118 seconds...

The data looks like this: [screenshot of the DataFrame]
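One caveat on the 118-second number: as far as I understand, the batching write_api flushes in the background, so for an apples-to-apples comparison with the blocking v1 write_points it may be cleaner to time the synchronous write mode instead. A minimal sketch, reusing the same bucket, token and DataFrame as above:

import time

from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS

_client = InfluxDBClient(url="http://localhost:9999", token=token, org="org")

# SYNCHRONOUS mode blocks until the data is written, like the v1 client does
_write_client = _client.write_api(write_options=SYNCHRONOUS)

start = time.time()
_write_client.write('data', record=imp_dat[0], data_frame_measurement_name='coinmarketcap_ohlcv',
                    data_frame_tag_columns=['quote_asset', 'base_asset'])
print(time.time() - start)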


Labels

enhancement (New feature or request)
