Publish messages > 'batch_setting.max_size' in a separate batch. #4608
Comments
Thanks for filing. I'm currently fixing #4575, which seems related. |
@kir-titievsky The issue is that The very first iteration runs until it has created every thread and then fails:
>>> import google.auth
>>> from google.cloud import pubsub_v1
>>> batch_settings = pubsub_v1.types.BatchSettings(
... max_bytes=0, # no batching
... max_latency=1.0, # One second
... )
>>> publisher = pubsub_v1.PublisherClient(batch_settings)
E1219 10:27:12.768400187 29067 ev_epollex_linux.cc:1482] Skipping epollex becuase GRPC_LINUX_EPOLL is not defined.
E1219 10:27:12.768413837 29067 ev_epoll1_linux.cc:1261] Skipping epoll1 becuase GRPC_LINUX_EPOLL is not defined.
E1219 10:27:12.768441213 29067 ev_epollsig_linux.cc:1761] Skipping epollsig becuase GRPC_LINUX_EPOLL is not defined.
>>> _, project = google.auth.default()
>>> topic_path = publisher.topic_path(project, 'saxby')
>>> data = b'Message number 0'
>>> publisher.publish(topic_path, data=data)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "${SITE_PACKAGES}/google/cloud/pubsub_v1/publisher/client.py", line 201, in publish
batch = self.batch(topic, create=True)
File "${SITE_PACKAGES}/google/cloud/pubsub_v1/publisher/client.py", line 133, in batch
topic=topic,
File "${SITE_PACKAGES}/google/cloud/pubsub_v1/publisher/batch/thread.py", line 91, in __init__
self._thread.start()
File "${HOME}/.pyenv/versions/3.6.3/lib/python3.6/threading.py", line 846, in start
_start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
If you use a custom
>>> from google.cloud.pubsub_v1.publisher.batch import thread
>>>
>>> class CustomBatch(thread.Batch):
... COUNT = 0
... def __init__(self, *args, **kwargs):
... CustomBatch.COUNT += 1
... super(CustomBatch, self).__init__(*args, **kwargs)
...
>>>
>>> publisher = pubsub_v1.PublisherClient(batch_settings, batch_class=CustomBatch)
>>> publisher.publish(topic_path, data=data)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "${SITE_PACKAGES}/google/cloud/pubsub_v1/publisher/client.py", line 201, in publish
batch = self.batch(topic, create=True)
File "${SITE_PACKAGES}/google/cloud/pubsub_v1/publisher/client.py", line 133, in batch
topic=topic,
File "<stdin>", line 5, in __init__
File "${SITE_PACKAGES}/google/cloud/pubsub_v1/publisher/batch/thread.py", line 91, in __init__
self._thread.start()
File "${HOME}/.pyenv/versions/3.6.3/lib/python3.6/threading.py", line 846, in start
_start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
>>> CustomBatch.COUNT
11207
The code is stuck in an infinite loop. It seems we could add some validation to the |
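A toy model of the loop that the traceback and the `CustomBatch.COUNT` value suggest. This is a sketch, not the library's code: `ToyBatch` and `toy_publish` are hypothetical names, and the retry is capped at `max_attempts` so the example terminates. With `max_bytes=0` every fresh batch rejects the message, so a new batch is created on every attempt.

```python
class ToyBatch:
    """Hypothetical stand-in for a batch; counts how many instances are created."""

    created = 0

    def __init__(self, max_bytes):
        ToyBatch.created += 1
        self.max_bytes = max_bytes
        self.size = 0

    def publish(self, data):
        # Reject the message if it would push the batch past max_bytes.
        if self.size + len(data) > self.max_bytes:
            return None
        self.size += len(data)
        return "future"


def toy_publish(data, max_bytes, max_attempts=1000):
    """Retry with a fresh batch until one accepts the message (capped here)."""
    for _ in range(max_attempts):
        future = ToyBatch(max_bytes).publish(data)
        if future is not None:
            return future
    # An uncapped version of this loop would never reach this point.
    return None
```

In the real client each batch also starts a thread, which would explain why the failure surfaces as `RuntimeError: can't start new thread` rather than as a hang.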
I agree -- this should be an explicit error, since we know what the rules are. My expectation here, as a naive user, is that 0 means 'no batching' rather than 'impossible batching.'
|
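One way the explicit error discussed above could look, as a minimal sketch. `validate_batch_settings` is a hypothetical helper, not part of the library, and the rule that every limit must be positive is an assumption made for illustration.

```python
def validate_batch_settings(max_bytes, max_latency, max_messages):
    """Hypothetical check: fail fast on limits that no batch could satisfy."""
    # Assumption: every limit must be strictly positive to be usable.
    for name, value in (
        ("max_bytes", max_bytes),
        ("max_latency", max_latency),
        ("max_messages", max_messages),
    ):
        if value <= 0:
            raise ValueError(
                "{} must be positive, got {!r}".format(name, value)
            )
```

With this check, `validate_batch_settings(0, 1.0, 1000)` raises a `ValueError` naming `max_bytes` instead of letting the publish call spin.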
@dhermes please update when you get a chance. Thanks! |
@chemelnucfin please add this one to the priority list. Any single message where message byte size |
on it. |
@danoscarmike Is there a reason you want to raise a ValueError? As the code stands currently it ignores the message already. Should we use warnings instead? |
It ignores but causes an infinite loop, so we should raise.
|
@jonparrott But is that the desired behavior or should we just fix the infinite loop situation when ignoring the message? |
We should never just drop messages on the floor. Based on the behavior of the client in other languages, we should treat it as an error. |
Since #4872 is merged, this is no longer release blocking. |
@jonparrott ISTM that merging #4872 should have closed this issue? |
No, we need to actually match Java and Go's strategy here and submit the oversized message in its own batch. |
@chemelnucfin I'm not sure why this is marked as |
Note to self: use work from PR #4870 to address this issue. |
@jonparrott ISTM that there are a couple of questions here:
|
@jonparrott 's comment on Feb 27 says that oversize messages should be published in their own batch. PM perspective here: we do care about order of publishing. Client library should not re-order messages if possible. |
@kir-titievsky But from the Java client it looked like the batch gets sent off immediately?
Ah, I see what you mean now. I, as a user, would expect that the large message would cause the existing buffer of small messages to get flushed first. |
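The expectation described above (flush the buffered small messages first, then send the oversized message in its own batch) can be sketched as follows. `OrderPreservingBatcher` and its `send` callback are hypothetical names for illustration; this does not show how the Python, Java, or Go clients implement it.

```python
class OrderPreservingBatcher:
    """Hypothetical batcher that never reorders messages."""

    def __init__(self, max_bytes, send):
        self.max_bytes = max_bytes
        self.send = send  # callable that takes a list of messages
        self.pending = []
        self.pending_bytes = 0

    def flush(self):
        # Send whatever is buffered, then start a fresh buffer.
        if self.pending:
            self.send(self.pending)
            self.pending = []
            self.pending_bytes = 0

    def publish(self, data):
        if len(data) > self.max_bytes:
            # Oversized: flush earlier messages first, then send it alone.
            self.flush()
            self.send([data])
            return
        if self.pending_bytes + len(data) > self.max_bytes:
            # The buffer is full: flush before appending.
            self.flush()
        self.pending.append(data)
        self.pending_bytes += len(data)
```

Usage: with `sent = []` and `batcher = OrderPreservingBatcher(10, sent.append)`, publishing `b"aa"`, `b"bb"`, a 20-byte message, and `b"cc"`, followed by `batcher.flush()`, produces three batches in publish order, with the 20-byte message alone in the second.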
@kir-titievsky, @theacodes We currently test for an overflow of the message count after appending the message in
As an alternative to step 3, we could just remove the existing post-append overflow test (currently only for count), and let the monitor thread or the next call to |
Running a simple no-batching publisher with Pub/Sub fails when attempting to publish 100 messages. The code works with 10 messages. Code sample:
Errors:
OS: macOS Sierra (10.12.6)
There is probably some way to make this work with non-default file limits, but it seems like publishing 100 messages should work on any sane machine.