This repository was archived by the owner on Mar 20, 2023. It is now read-only.

Bulk Helpers #5

Closed
owaaa opened this issue Sep 26, 2016 · 13 comments

Comments

@owaaa

owaaa commented Sep 26, 2016

Is there a way to call an async version of the bulk helpers? I couldn't find a way to do this when looking around.

@honzakral
Contributor

Unfortunately there is currently no way to do this. To make this work we'd have to reimplement the _process_bulk_chunk helper and all the others that use it. It shouldn't be too expensive time-wise.

@stickperson
Contributor

@honzakral I took a look at implementing helpers.bulk the other day. _process_bulk_chunk and streaming_bulk are generators, which might be tricky to implement. Asynchronous iterators were introduced in Python 3.5, but seeing as this library uses Python 3.4, that might not be possible.

Thoughts?

@Archelyst

Having async helpers would be really helpful. It's really not that hard. However, the final piece for this only came with Python 3.6: async generators. Those allow for code almost identical to the synchronous version. And I'd argue that people who use async are probably willing to use Python 3.6.
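
To illustrate the point about async generators, here is a minimal sketch of a streaming bulk helper written as a Python 3.6+ async generator. It is not the library's implementation: `sketch_streaming_bulk` and `fake_bulk` are hypothetical names, and `fake_bulk` merely stands in for an awaitable bulk call on a client.

```python
import asyncio


async def fake_bulk(chunk):
    """Hypothetical stand-in for an awaitable bulk request; echoes one result per action."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return [{"_id": action["_id"], "status": 201} for action in chunk]


async def sketch_streaming_bulk(send, actions, chunk_size=2):
    """Async generator: batch actions, await each batch, yield per-action results."""
    chunk = []
    for action in actions:
        chunk.append(action)
        if len(chunk) == chunk_size:
            for item in await send(chunk):
                yield item
            chunk = []
    if chunk:  # flush the final, possibly smaller, batch
        for item in await send(chunk):
            yield item


async def collect(n):
    actions = [{"_id": i} for i in range(n)]
    return [item async for item in sketch_streaming_bulk(fake_bulk, actions)]


# asyncio.run needs Python 3.7+; on 3.6 use loop.run_until_complete instead.
results = asyncio.run(collect(5))
```

Apart from `async def`, `await`, and `async for`, the body reads like a synchronous generator, which is the similarity described above.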

@stickperson
Contributor

Yup. Here's a link to the PEP: https://www.python.org/dev/peps/pep-0525/

Still waiting to hear what @honzakral thinks.

@honzakral
Contributor

I would love to have async bulk helpers and it makes a lot of sense. If at all possible I would prefer the solution to be at least 3.5 compatible, not relying on 3.6.

If anybody wants to take a stab at it, I would be happy to help with reviews and feedback (and if any changes in elasticsearch-py would make this easier, we can make them). If no one is interested I will definitely try it, but I am not sure of the time frame.
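
For the 3.5-compatible route, one option is an asynchronous iterator class, since `__aiter__` and `__anext__` exist in Python 3.5 while async generators do not. The sketch below is only an illustration under that assumption: `BulkResultIterator` and `fake_bulk` are hypothetical names, not part of elasticsearch-py or elasticsearch-py-async.

```python
import asyncio
from collections import deque


async def fake_bulk(chunk):
    """Hypothetical stand-in for an awaitable bulk request."""
    await asyncio.sleep(0)
    return [{"_id": action["_id"]} for action in chunk]


class BulkResultIterator:
    """Async iterator that sends actions in chunks and yields results one by one."""

    def __init__(self, send, actions, chunk_size=2):
        self._send = send
        self._actions = iter(actions)
        self._chunk_size = chunk_size
        self._pending = deque()  # results received but not yet handed out

    def __aiter__(self):
        return self

    async def __anext__(self):
        # Refill the buffer until there is a result to return or input runs out.
        while not self._pending:
            chunk = []
            for action in self._actions:
                chunk.append(action)
                if len(chunk) == self._chunk_size:
                    break
            if not chunk:
                raise StopAsyncIteration
            self._pending.extend(await self._send(chunk))
        return self._pending.popleft()


async def main():
    out = []
    actions = [{"_id": i} for i in range(5)]
    async for item in BulkResultIterator(fake_bulk, actions):
        out.append(item)
    return out


# The driver uses asyncio.run (3.7+); on 3.5 use loop.run_until_complete.
ids = [r["_id"] for r in asyncio.run(main())]
```

The class carries the state that an async generator would keep implicitly, which is why the 3.6 version is shorter.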

@eranhirs

I would like to add that an async version of the Scan helper is also necessary, if someone is already taking a stab at the bulk helpers.

@eranhirs

eranhirs commented Dec 7, 2017

Regarding this issue, what do you think about this answer?

Elasticsearch's bulk inserts are asynchronous. You can use the Elasticsearch.bulk python API or the slightly more convenient elasticsearch.helpers.bulk API for this.

@honzakral
Contributor

@eranhirs nothing about elasticsearch's bulk API is asynchronous unfortunately.

@mazzma12

Any update on this one? It has been a year, and Python 3.6 adoption has grown.

0bsearch added a commit to 0bsearch/elasticsearch-py-async that referenced this issue Aug 14, 2018
@amitripshtos

For a quick solution, I created my own bulk helper:
https://gist.github.com/amitripshtos/efd280e88376623b491c8682f417d597

However, if you guys think we can rely on Python 3.6 and the async generators PEP, I can create a proper PR with that helper.

I think it's a critical helper that we need in this package, and I'm more than happy to help :)

0bsearch added a commit to 0bsearch/elasticsearch-py-async that referenced this issue Nov 8, 2018
0bsearch added a commit to 0bsearch/elasticsearch-py-async that referenced this issue Nov 14, 2018
0bsearch added a commit to 0bsearch/elasticsearch-py-async that referenced this issue Nov 20, 2018
@bisoldi

bisoldi commented Mar 21, 2019

I see some commits against a fork for this issue, has there been any progress in getting this into the codebase?

@mjzarrin

Hi,
Any update?
In my tests, using elasticsearch-py-async to index bulk data such as a Pandas DataFrame did not make sense, because it put more pressure on the Elasticsearch server and caused some timeout errors.
Although every part of my code is optimized for concurrency, I have to use the blocking ES bulk helper.
For me, an asynchronous bulk helper is the most valuable feature. I would be very thankful if you paid more attention to developing it.
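
If the timeouts described above come from too many bulk requests being in flight at once (an assumption, since the comment does not say), one common mitigation is to cap concurrency with `asyncio.Semaphore`. The sketch below uses hypothetical names (`bounded_bulk`, `fake_bulk`) and a fake sender that records peak concurrency instead of contacting a server.

```python
import asyncio

peak = 0       # highest number of simultaneous fake requests observed
in_flight = 0  # fake requests currently running


async def fake_bulk(chunk):
    """Hypothetical stand-in for an awaitable bulk request; tracks concurrency."""
    global peak, in_flight
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulate network latency
    in_flight -= 1
    return len(chunk)


async def bounded_bulk(send, chunks, max_in_flight=2):
    """Send all chunks concurrently, but never more than max_in_flight at a time."""
    sem = asyncio.Semaphore(max_in_flight)

    async def guarded(chunk):
        async with sem:
            return await send(chunk)

    # gather returns results in the same order as the input chunks
    return await asyncio.gather(*(guarded(c) for c in chunks))


docs = list(range(9))
chunks = [docs[i:i + 2] for i in range(0, len(docs), 2)]
counts = asyncio.run(bounded_bulk(fake_bulk, chunks))
```

Tuning `max_in_flight` trades throughput against load on the server; the right value depends on the cluster and is not something this thread specifies.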

@sethmlarson
Contributor

There are asynchronous bulk helpers available in elasticsearch[async] 7.8.0+, see #81 for more info.
