
Piping consumer -> producer kafkacat for binary data #142

Open
mjuric opened this issue Jun 6, 2018 · 9 comments

Comments

@mjuric

mjuric commented Jun 6, 2018

Hi,
Sometimes (usually for test purposes) I want to quickly copy a certain number of messages from one topic to another. If the messages are text, I can use kafkacat to do something like

kafkacat -C -b localhost -t source -c 10 -e | kafkacat -P -b localhost -t dest

This doesn't seem to work when the messages are binary, though, as any appearance of \n in the message byte stream will be interpreted as a delimiter by the producer.
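The failure mode can be shown in miniature without a broker: any newline-delimited reader splits one binary message containing `\n` into two records, which is what the producer does with its default delimiter (the `printf` payload below is illustrative).

```shell
# One logical message with an embedded '\n'; a newline-delimited reader
# (like kafkacat -P with the default delimiter) sees two records.
records=$(printf 'one\nmessage' | awk 'END { print NR }')
echo "$records"   # 2: the embedded newline created a second record
```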

I tried changing the delimiter to something unlikely to be found in the bytestream (e.g., an equivalent of -D "__ENDMESSAGE__"). The consumer understands this nicely, but it looks like the producer just takes the first character passed with the -D option, not the full string.

My current workaround is to set a delimiter on the consumer end with -D, then pipe the output to awk to write it out into a set of temporary files (one per-message), and then run the producer with the list of those files (see here for the resulting shell script). This requires the files to hit the disk, though, and isn't quite as elegant as it could be.
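That workaround can be sketched roughly as follows. Here `printf` stands in for the kafkacat consumer, and the `__ENDMESSAGE__` delimiter, output directory, and file names are all illustrative; a multi-character record separator (`RS`) requires GNU awk or another awk with regex `RS` support.

```shell
# Split delimiter-separated consumer output into one temp file per message.
# In the real pipeline the input would come from something like:
#   kafkacat -C -b localhost -t source -c 10 -e -D "__ENDMESSAGE__"
outdir=$(mktemp -d)
printf 'msg1__ENDMESSAGE__msg2__ENDMESSAGE__msg3__ENDMESSAGE__' |
  awk -v dir="$outdir" 'BEGIN { RS = "__ENDMESSAGE__" }
       NF { printf "%s", $0 > (dir "/msg." NR) }'
ls "$outdir"
# The producer side then sends each file as one message:
#   kafkacat -P -b localhost -t dest "$outdir"/msg.*
```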

My question -- is there a better/easier way to do this? If not, are there obstacles to extending -D in producer mode to take a string as a delimiter, rather than a single character?

@ohemelaar

I have this issue too, and I've found a PR that is supposed to allow multi-byte delimiters on the consumer side (#150). However, it hasn't been merged yet, and for me the fork segfaults.

@thomasklein

thomasklein commented Aug 21, 2019

Hi! Thanks for sharing the shell script, @mjuric. I might have missed something, but it isn't working for me.

kafkacat -C -e -b localhost -t "$SRC" -D "$DELIM"

I just wonder: if -D does not support multi-byte delimiters, how would that invocation work?

@mjuric
Author

mjuric commented Aug 21, 2019

It supports multi-byte delimiters in consumer mode, but not when producing. At least that was the state as of a year ago when I wrote the hack; I'm not sure whether something has changed in the meantime.

@igorcalabria

Maybe a more general solution to this problem is to allow loading/dumping data in base64 format. It is inefficient, but robust and easy to implement.
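A broker-free sketch of that idea (the kafkacat calls are indicated in comments; file names are illustrative): base64 output contains no raw newlines, so the default delimiter becomes safe again.

```shell
# Encode a binary message to base64 before the pipe and decode it after,
# so an embedded '\n' can no longer be mistaken for a message delimiter.
tmp=$(mktemp -d)
printf 'bin\001ary\npayload' > "$tmp/msg.bin"          # message with embedded newline
base64 < "$tmp/msg.bin" | tr -d '\n' > "$tmp/msg.b64"  # one newline-free line
base64 -d < "$tmp/msg.b64" > "$tmp/msg.out"            # producer side would decode
cmp -s "$tmp/msg.bin" "$tmp/msg.out" && echo "round-trip OK"
```

With per-message encoding like this, the consumer would pipe base64 lines and the producer would decode each line before (or after) producing; the cost is the ~33% size overhead of base64.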

@AusIV

AusIV commented Oct 28, 2020

The other approach I was hoping would work would be to use the JSON wrapper, but the Producer doesn't seem to unwrap from it. If I do:

kafkacat -b mybroker -C -t mytopic -o $START -c $COUNT -J

The binary gets encoded into newline delimited JSON messages. But if I then pipe that into a producer with:

kafkacat -b mybroker -C -t mytopic -o $START -c $COUNT -J | kafkacat -b mybroker -P -t mytopic2 -J

The messages on mytopic2 are still JSON encoded. If the producer honored the -J flag, ignoring the topic, partition, offset, and timestamp while producing based on the decoded key and payload, you'd be able to handle binary data accordingly.

@AusIV

AusIV commented Oct 28, 2020

Also, I'm not sure a multicharacter delimiter would be a huge help in my case, as I'm not sure I can identify a sequence of characters that could not appear in my messages.

@JakkuSakura

It has been implemented on both the consumer and the producer side as
kafkacat ..... -K "KEY_DELIM" -D "ITEM_DELIM"
check https://github.com/edenhill/kafkacat/blob/master/tests/0002-delim.sh
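Assuming a kafkacat build that includes that change, the round trip from the original post would look something like this (broker address, topic names, and the `;;msg;;` delimiter are illustrative; this sketch is untested against a live broker):

```shell
# Copy 10 messages from one topic to another, using the same multi-byte
# delimiter on the consumer and the producer side.
kafkacat -C -b localhost -t source -c 10 -e -D ';;msg;;' |
  kafkacat -P -b localhost -t dest -D ';;msg;;'
```

Note the earlier caveat in this thread still applies: this only works if the chosen delimiter can never occur inside a message body.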

@JakkuSakura

I have created a pull request for this: #295. Please vote for it.

@MartinSoto

Bump. It's now 2023 and this is still not possible? It should be a no-brainer.
