Add TCP source and TCP sink to simplify steps to test simple topology #1230
I propose we define a source and sink API in SAM itself and build the TCP source and sink on top of that. Right now there are multiple steps required to add even a simple source, such as adding the corresponding spout, defining the Flux translation, defining the UI component, etc.
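For illustration only, such an API might look like the sketch below (the names `SimpleSource`/`SimpleSink` are hypothetical and not SAM's actual SDK): SAM would own the spout/bolt wrapping, the Flux translation, and the UI registration, while a new source or sink would only need to implement a small interface.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a minimal SAM-level source/sink API; names are illustrative only.
public interface SimpleSource extends AutoCloseable {
    // Called once with the configuration the user enters for the component in the UI.
    void open(Map<String, Object> config);

    // Return the next batch of raw events; SAM would hand these to a parser and emit them downstream.
    List<byte[]> poll();
}

interface SimpleSink extends AutoCloseable {
    void open(Map<String, Object> config);

    // Receive one serialized event from the upstream processor.
    void write(byte[] event);
}
```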
This would be more useful for testing the real flow in the cluster than test mode. For test mode we could inject the data as JSON and test the flow. Once we decouple the environment from test mode, it would become even simpler.
I think we should decouple the parsing step from the source itself. Otherwise the scope of the source becomes narrow (e.g. it could process only Avro, CSV, JSON, etc.). We could even have some generic parsers (Avro, CSV, JSON, etc.) that could be attached to any source instead of adding the parsing logic to each source.
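As a rough sketch of that separation (the `EventParser` name and the use of Jackson here are my assumptions, not anything in SAM today), the parser could be a small strategy interface that any source, TCP included, delegates to:

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.Map;

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical parser abstraction decoupled from any particular source.
public interface EventParser {
    // Turn one raw payload from a source into field-name -> value maps matching the declared schema.
    List<Map<String, Object>> parse(byte[] rawPayload) throws IOException;
}

// A JSON flavor that could be attached to a TCP source, a Kafka source, etc.
class JsonEventParser implements EventParser {
    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public List<Map<String, Object>> parse(byte[] rawPayload) throws IOException {
        Map<String, Object> event =
                mapper.readValue(rawPayload, new TypeReference<Map<String, Object>>() {});
        return Collections.singletonList(event);
    }
}
```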
Yes, we should provide the flexibility to either manage the schema via the registry or have users define the output fields in the component (e.g. like what we allow in the custom processor).
Intentionally adjusting the sequence of the points:

About TCP source / sink:
Another intention of the TCP source/sink is that they don't require any external service, so an end-to-end topology can be composed with a test environment. Currently a source and sink can't be defined without coupling to an external service, so even with a test environment we are forced to import or clone an existing app.

About supporting a source and sink API:
Yes, strongly agreed. Ideally we should provide a set of public APIs in the SDK for defining sources and sinks. Btw, please note the rationale of this issue: it intends to address the current lack of a testing environment, so its requirements should be kept simple enough to land sooner rather than later. A source and sink API is more like a new feature and will need much more effort; I also described the hard part of defining a public API for sources and sinks above. I agree the steps to add them are annoying: if we see the benefit of providing the TCP source and sink to end users, those steps should not be forced on them. If we don't want to expose them by default, we could just create a script that registers only the TCP source and sink, and execute it whenever we need them.
Any testing case (not meaning test mode) requires setting up external storage (for the source/sink) and also Schema Registry. Whenever I spin up a cluster for testing, I need to…
Which test I would like to do doesn't matter: any small test/verification requires the above steps. We can't test/verify an issue in test mode either, since I can't even compose a topology app properly before setting up the environment. For test mode there's a workaround (import an existing topology to skip filling in the source/sink information), but it requires the target topology to have been composed and exported previously.
Even when doing a manual test with a local environment via a manually set up cluster, my local machine has to run at least SR and Kafka as well as Storm, and go through similar setup steps.
We could add a TCP source as well as a TCP sink to make these steps fairly simple (a rough sketch follows after the two items below).
TCP Source:
TCP Sink:
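A minimal sketch of the source side, assuming these would be implemented as Storm components underneath (the class name, port handling, and the single "event" output field are my assumptions): a spout that listens on a TCP port and emits each received line, e.g. one JSON event per line. A matching TCP sink bolt would do the reverse: connect to a configured host:port in prepare() and write each incoming tuple as one line, so output can be observed with any plain TCP client.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

// Sketch: listens on a TCP port and emits each received line (e.g. a JSON event) as a tuple.
public class TcpSourceSpout extends BaseRichSpout {
    private final int port;
    private final LinkedBlockingQueue<String> lines = new LinkedBlockingQueue<>();
    private SpoutOutputCollector collector;
    private volatile boolean running = true;

    public TcpSourceSpout(int port) {
        this.port = port;
    }

    @Override
    @SuppressWarnings("rawtypes")
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        // Accept connections on a background thread; every received line becomes one queued event.
        Thread acceptor = new Thread(() -> {
            try (ServerSocket server = new ServerSocket(port)) {
                while (running) {
                    try (Socket client = server.accept();
                         BufferedReader reader = new BufferedReader(
                                 new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8))) {
                        String line;
                        while ((line = reader.readLine()) != null) {
                            lines.put(line);
                        }
                    } catch (Exception e) {
                        // Ignore a broken client connection and keep accepting new ones.
                    }
                }
            } catch (Exception e) {
                throw new RuntimeException("TCP source failed to listen on port " + port, e);
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
    }

    @Override
    public void nextTuple() {
        // Emit at most one queued line per call; Storm calls nextTuple() in a loop.
        String line = lines.poll();
        if (line != null) {
            collector.emit(new Values(line));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("event"));
    }

    @Override
    public void close() {
        running = false;
    }
}
```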
After adding the source and sink, we just need to save pre-defined events in JSON format (as well as the schema JSON for SR if necessary) to a file, and setup is done. There's even a combination which completely eliminates the need to push events to the source and to register the schema with SR: if we can define the events and schema in the source itself, and exporting/importing the topology can retain that information.
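For example, replaying a saved file of newline-delimited JSON events into the TCP source could then be as small as this (the file name, host, and port are placeholders):

```java
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Sends each line of events.json (one JSON event per line) to the TCP source.
public class ReplayEvents {
    public static void main(String[] args) throws Exception {
        List<String> events = Files.readAllLines(Paths.get("events.json"), StandardCharsets.UTF_8);
        try (Socket socket = new Socket("localhost", 9999);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
            for (String event : events) {
                out.println(event);
            }
        }
    }
}
```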