Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timestamp() to librdkafka messages #50

Merged
merged 3 commits into from
Nov 9, 2016

Conversation

qix
Copy link
Contributor

@qix qix commented Sep 30, 2016

tstype still needs to be exposed somehow, not sure of the best way to do that.

The whitespace in confluent_kafka.c is a mix of tabs/spaces as well as spaces in blank lines which my editor automatically removed.

@ghost
Copy link

ghost commented Sep 30, 2016

It looks like @qix hasn't signed our Contributor License Agreement, yet.

Appreciation of efforts,

clabot

@qix
Copy link
Contributor Author

qix commented Sep 30, 2016

CLA signed.

Copy link
Contributor

@ewencp ewencp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable. One nit, and of course the timestamp type is still missing. @edenhill do you want to take a look?

@@ -326,6 +326,14 @@ static PyObject *Message_offset (Message *self, PyObject *ignore) {
}


static PyObject *Message_timestamp (Message *self, PyObject *ignore) {
if (self->timestamp > 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be >=?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should depend on the tstype, not the value of the timestamp.
E.g., even a seemingly invalid timestamp with a proper tstype is valid (because it might mean something in the future)

Copy link
Contributor

@edenhill edenhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also needs tests:

  • add to tests/test_Consumer.py (these are broker-less unit tests so just check that the API works and returns tstype none)
  • add to integration tests in examples/integration_test.py (this should check that the returned timestamp actually makes sense (which requires a recent enough broker..), e.g. tstype is append and timestamp is reasonably within range to time.time()

@@ -362,6 +370,11 @@ static PyMethodDef Message_methods[] = {
" :rtype: int or None\n"
"\n"
},
{ "timestamp", (PyCFunction)Message_timestamp, METH_NOARGS,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to return the timestamp type as well (none, append, create).
Im not sure whether it should be some class level constants or strings, probably the former. @ewencp?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the timestamp type as a class level constant, e.g.:
TIMESTAMP_NOT_AVAILABLE, TIMESTAMP_CREATE_TIME, TIMESTAMP_APPEND_TIME
Map it directly to the librdkafka counterparts.

@@ -495,6 +508,12 @@ PyObject *Message_new0 (const rd_kafka_message_t *rkm) {
self->partition = rkm->partition;
self->offset = rkm->offset;

rd_kafka_timestamp_type_t tstype;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put variable declarations at the beginning of the scope to be C89 safe

@@ -326,6 +326,14 @@ static PyObject *Message_offset (Message *self, PyObject *ignore) {
}


static PyObject *Message_timestamp (Message *self, PyObject *ignore) {
if (self->timestamp > 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should depend on the tstype, not the value of the timestamp.
E.g., even a seemingly invalid timestamp with a proper tstype is valid (because it might mean something in the future)

@qix
Copy link
Contributor Author

qix commented Oct 26, 2016

Sorry about the delay, this was blocked on a librdkafka bug: confluentinc/librdkafka#858

  • I added a tstype() function but didn't add the class level constants. I feel like that is the correct approach, but didn't know what to name them and that can be done in a separate commit.
  • Message object wasn't available in test_Consumer so couldn't add a test there without large changes.
  • The integration "test" doesn't have any assertions and isn't written as a test. I added tstype / timestamp to the print statement and manually verified.

Copy link
Contributor

@edenhill edenhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, almost there.

Some remaining actions:

  • Change timestamp() to return a tstype, timestamp tuple.
  • Define tstype constants
  • Add unit-tests
  • Improve documentation

@@ -327,13 +327,18 @@ static PyObject *Message_offset (Message *self, PyObject *ignore) {


static PyObject *Message_timestamp (Message *self, PyObject *ignore) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to have a single timestamp() method that returns a tuple: tstype, timestamp

@@ -362,6 +370,11 @@ static PyMethodDef Message_methods[] = {
" :rtype: int or None\n"
"\n"
},
{ "timestamp", (PyCFunction)Message_timestamp, METH_NOARGS,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the timestamp type as a class level constant, e.g.:
TIMESTAMP_NOT_AVAILABLE, TIMESTAMP_CREATE_TIME, TIMESTAMP_APPEND_TIME
Map it directly to the librdkafka counterparts.

@@ -362,6 +370,11 @@ static PyMethodDef Message_methods[] = {
" :rtype: int or None\n"
"\n"
},
{ "timestamp", (PyCFunction)Message_timestamp, METH_NOARGS,
" :returns: message timestamp or None if not available.\n"
" :rtype: int or None\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should probably return a local timestamp in Python format (e.g float), and the doc string should be updated to reflect what is being returned.
The timestamp() method should also return the tstype (tstype,timestamp) tuple

@qix
Copy link
Contributor Author

qix commented Nov 8, 2016

Hi @edenhill, I've made most of the changes you requested, but don't have too much time to work on the very specifics especially if there's going to be a lot of back and forth. Feel free to make the remaining changes yourself if you want.

Some thoughts/issues:

  • You do not have any unit tests that actually get a Message object, so adding that is a much larger change. I got _Consumer working but that meant commenting out the .unsubscribe line and tweaking it a lot.
  • This package has minimal documentation already. I had to browse the source to figure out how to use it, I don't know where you're expecting me to add documentation.
  • I don't believe converting it to a float is future safe. If you're expecting other uses of the timestamp field based on tstype they might not all be milliseconds so / 1000 isn't a safe assumption. I decided to leave as is with the tuple.
  • I made the api return the tuple even if tstype=0. Might want to consider returning None in that case, but that will cause (tstype, timestamp) = msg.timestamp() to fail.

@qix
Copy link
Contributor Author

qix commented Nov 8, 2016

p.s. We released https://github.com/smyte/kafka_store today which uses a fork based on my original commit. As soon as we get this merged in and a new release is out I'll remove the fork and move it back over.

@edenhill edenhill changed the base branch from master to PR50_msg_timestamps November 9, 2016 12:34
@edenhill edenhill merged commit 2252d24 into confluentinc:PR50_msg_timestamps Nov 9, 2016
@edenhill
Copy link
Contributor

edenhill commented Nov 9, 2016

Thanks a lot for your effort!

edenhill added a commit that referenced this pull request Nov 9, 2016
PR #50 message timestamps - additional docs, etc
dtheodor pushed a commit to dtheodor/confluent-kafka-python that referenced this pull request Sep 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants