[WIP] Fix the error "Field “xxx” does not exist." when data is imported from Kafka to Nebula Graph via Nebula Exchange #8
Comments
We are working on this issue; the fix plan is:
Thanks very much for your solution. There are two small questions we want to discuss with you:
That's reasonable, looking forward to your PR ~
I am sorry that I cannot reply in this PR vesoft-inc/nebula-spark-utils#157. Sorry about my poor English. I am not quite sure what "can we just modify the StreamingReader to parse the kafka's value to DataFrame" means. Does it mean we only modify the input logic and try to transform the Kafka data into a DataFrame while keeping the processing logic as before, i.e. get the DataFrame from the Kafka source and parse it in the vertices/edgeProcessor separately? If we want to implement it that way, the first question is how to switch between the vertices/edgeProcessor.
Yes, my point is
That's OK. However, since a Kafka producer typically produces data all the time, the Kafka consumer is expected to consume data all the time as well. Hence, when the data source of one of the tags/edges defined in the configuration is Kafka, the nebula-exchange application will process that tag/edge forever and will never switch to any other tag/edge defined in the same configuration. Therefore, in the new PR, we will add the following restrictions:
Implementation summary: Expected effect:
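The restriction above concerns configurations that mix a Kafka source with other sources. As a rough illustration of what a Kafka-backed tag looks like (a sketch only: the tag name, broker address, topic, and field names are hypothetical, and the layout follows Nebula Exchange's HOCON-style configuration), consider:

```hocon
# Hypothetical tag config with a Kafka source; all names and values are illustrative.
tags: [
  {
    name: person
    type: {
      source: kafka        # streaming source: the consumer runs continuously
      sink: client
    }
    service: "127.0.0.1:9092"
    topic: "person-topic"
    fields: [name, age]          # fields expected inside Kafka's JSON value
    nebula.fields: [name, age]
    vertex: id
    batch: 256
    partition: 32
  }
]
```

Because the Kafka consumer never terminates, a configuration like this would process only this tag forever, which is why the discussion proposes restricting a Kafka-sourced config to a single tag/edge.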
So sorry for the late reply.
In the Reader process, we can use the config. Assume the fields config is: [id, name, age] (all fields must exist in Kafka's value data).
Then we get the needed DataFrame from Kafka, and the other processing stays the same as the current logic.
I get your point. I will try it like `df.select(value).map(v => parseJson)`.
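For readers following along, the parsing rule being discussed ("all fields must exist in Kafka's value data") can be sketched independently of Spark. The project itself is Scala/Spark and parses the value inside a `map` over the DataFrame; this Python sketch only illustrates the rule, and `parse_kafka_value` is a hypothetical helper, not Exchange's actual code:

```python
import json

def parse_kafka_value(value: str, fields: list) -> dict:
    """Parse a Kafka record's JSON value, keeping only the configured fields.

    A configured field missing from the value raises KeyError, which mirrors
    the "Field 'xxx' does not exist." error this issue is about.
    """
    record = json.loads(value)
    return {f: record[f] for f in fields}

# Fields config from the discussion: [id, name, age]
row = parse_kafka_value('{"id": "1", "name": "Tom", "age": 18}', ["id", "name", "age"])
print(row)  # {'id': '1', 'name': 'Tom', 'age': 18}
```

In the real fix, the same idea is applied per record: cast Kafka's `value` column to a string, parse it as JSON, and select only the configured fields before handing the rows to the processor.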
#29 resolved it.
The bug was reported by a user on the Chinese forum: https://discuss.nebula-graph.com.cn/t/topic/2623/
For those who cannot understand Chinese well, please refer to the title of the issue for a basic background. The reason why the error occurs is that Nebula Exchange is not able to parse the "value" field of the data in Kafka.
Committer @guojun85 is working on this now. Thanks in advance for his contribution!