Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix influx response format #5932

Closed

Conversation

zhangjianweibj
Copy link
Contributor

@zhangjianweibj zhangjianweibj commented Dec 1, 2020

Fix #5931

@@ -138,6 +138,7 @@ core|default|role|Option values, `Mixed/Receiver/Aggregator`. **Receiver** mode
| - | - | database | Database of InfluxDB. | SW_STORAGE_INFLUXDB_DATABASE | skywalking |
| - | - | actions | The number of actions to collect. | SW_STORAGE_INFLUXDB_ACTIONS | 1000 |
| - | - | duration | The time to wait at most (milliseconds). | SW_STORAGE_INFLUXDB_DURATION | 1000|
| - | - | responseFormat | Influxdb response format option. | SW_STORAGE_INFLUXDB_RESPONSE_FORMAT | MSGPACK|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What options do they have? Could you add a reference link?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find some issues may affect the SkyWalking, such as

JSON does not support integers above 2^53

SkyWalking has many long fields, which in Java could be 2^63-1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmsolr told me, there would be issues when using JSON. Same reason as the above number format issue.

@wu-sheng wu-sheng requested a review from dmsolr December 1, 2020 13:25
@wu-sheng wu-sheng added backend OAP backend related. enhancement Enhancement on performance or codes labels Dec 1, 2020
@wu-sheng
Copy link
Member

wu-sheng commented Dec 1, 2020

One question for the context about the repo, https://github.com/toni-moreno/influxdb-srelay. Is this under maintaining fork version?
It makes me feel strange, why you are using a fork version of a fork version. What is the status of https://github.com/influxdata/influxdb-relay?

The reason I am asking is, as an Apache project, we usually only support widely used thing, the 2 PRs you submitted are acceptable, so we could merge. But if this kind of thing keeps happening, I have concerns we will face a conflict.

Could you explain more to make us following the context,?

@zhangjianweibj
Copy link
Contributor Author

https://github.com/influxdata/influxdb-relay is not a cluster solution,we used for a year and encountered many serious problems (eg two influx nodes frequent data loss,if one node breakdown then start,data will be permanently missing ).

@wu-sheng
Copy link
Member

wu-sheng commented Dec 1, 2020

https://github.com/influxdata/influxdb-relay is not a cluster solution, we used for a year and encountered many serious problems

My point, and concerns from ASF TLP, are both whether the project you choose is really widely used? And who and which community could make the project usable and maintainable for a long time?

@codecov
Copy link

codecov bot commented Dec 1, 2020

Codecov Report

Merging #5932 (afcf52a) into master (8e2e9df) will decrease coverage by 5.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #5932      +/-   ##
============================================
- Coverage     51.19%   46.17%   -5.02%     
+ Complexity     3477     3081     -396     
============================================
  Files          1661      881     -780     
  Lines         35497    21903   -13594     
  Branches       3901     2124    -1777     
============================================
- Hits          18171    10114    -8057     
+ Misses        16417    10933    -5484     
+ Partials        909      856      -53     
Impacted Files Coverage Δ Complexity Δ
...p/server/storage/plugin/influxdb/InfluxClient.java 51.94% <100.00%> (-3.32%) 13.00 <1.00> (-2.00)
...r/storage/plugin/influxdb/InfluxStorageConfig.java 100.00% <100.00%> (ø) 1.00 <0.00> (ø)
...kywalking/oap/server/core/storage/AbstractDAO.java 0.00% <0.00%> (-100.00%) 0.00% <0.00%> (-2.00%)
...g/oap/server/core/cluster/ClusterHealthStatus.java 0.00% <0.00%> (-100.00%) 0.00% <0.00%> (ø%)
...p/server/core/analysis/metrics/MinLongMetrics.java 0.00% <0.00%> (-100.00%) 0.00% <0.00%> (ø%)
...er/exporter/provider/grpc/GRPCExporterSetting.java 0.00% <0.00%> (-100.00%) 0.00% <0.00%> (ø%)
...ement/ui/template/UITemplateManagementService.java 0.00% <0.00%> (-100.00%) 0.00% <0.00%> (-6.00%)
...king/oap/server/configuration/api/ConfigTable.java 0.00% <0.00%> (-100.00%) 0.00% <0.00%> (ø%)
...skywalking/oap/server/receiver/envoy/als/Role.java 0.00% <0.00%> (-100.00%) 0.00% <0.00%> (ø%)
...erver/storage/plugin/elasticsearch/base/EsDAO.java 0.00% <0.00%> (-100.00%) 0.00% <0.00%> (-4.00%)
... and 1199 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8e2e9df...bb7ab78. Read the comment docs.

@zhangjianweibj
Copy link
Contributor Author

yeah,thanks for your suggestion.before skywalking project,we used prometheus with influxdb many years.during this period,influxdb-relay encountered many serious problems.we hava to look for alternatives solution(uber/m3,victoriaMetrics,srelay).as your point,srelay maintenance is a important issue.there are not many maintainers of the project so far.but influxdb with srelay is a reasonable solution util now.this solution also presented to the community.prometheus has many reasonable solution like m3,victoriaMetrics,thanos.we tried various skywalking storage plugins.tsdb is the most reasonable option so far.

@wu-sheng
Copy link
Member

wu-sheng commented Dec 2, 2020

prometheus has many reasonable solution like m3,victoriaMetrics,thanos.we tried various skywalking storage plugins.tsdb is the most reasonable option so far.

For Prometheus, yes, TSDB makes sense. But honestly, I am not sure whether TSDB is the best option for SkyWalking.

@wu-sheng
Copy link
Member

wu-sheng commented Dec 2, 2020

srelay maintenance is a important issue

This would be a big issue, we can't accept to make all things configurable but just for a lack of maintenance project.

@zhangjianweibj
Copy link
Contributor Author

not only srelay use JSON format as response format.native influxdb also support JSON format.

@zhangjianweibj
Copy link
Contributor Author

elasticsearch is not a reasonable solution for skywalking.agent may push many metrics,each metric has a indics with timestamp 'yyyy-mm-dd'.so many indics and shards that elasticsearch can not manage.on the other hand,the reasonable size of shards in es is 20G-50G. and skywalking storage size is difficult to estimate.how to set configuration of elasticsearch index?
skywalking used many time series data.obviously this data is not suitable for storage in a realtional database like pgsql,h2.

@wu-sheng
Copy link
Member

wu-sheng commented Dec 2, 2020

elasticsearch is not a reasonable solution for skywalking.agent may push many metrics,each metric has a indics with timestamp 'yyyy-mm-dd'.so many indics and shards that elasticsearch can not manage

I am not sure why you say ES can't manage them. 100+TB even PB+ level storages are common the user environments, as far as I know. They may try to find the way to make the balance better in the cluster, but don't have the management issue.

not only srelay use JSON format as response format.native influxdb also support JSON format.

I was not blaming using JSON. I know InfluxDB is supporting it. My feedback is from the original author of this feature, @dmsolr , he told me there would be issues using this format.

@zhangjianweibj
Copy link
Contributor Author

image

@dmsolr hello,we deploy skywalking with influxdb response json format,but no any error.

@dmsolr
Copy link
Member

dmsolr commented Dec 2, 2020

Hi @zhangjianweibj , thanks!
I am meeting many strange issues, such as typecast error, during developed it.
In my idea, I don't really recommend that. Because none will configure this normally.
Finally, you have to make sure it works well except for errors in logs.

@wu-sheng
Copy link
Member

wu-sheng commented Dec 2, 2020

@zhangjianweibj Could you provide a recorded video to verify all things work? We have several e2e, but they don't cover 100% cases.

@zhangjianweibj
Copy link
Contributor Author

unit tests and integration test are designed to check for such problems.as a apache project,it is hard to accept checking this feature by a recorded video.
very thanks,i will close this pr.

@wu-sheng
Copy link
Member

wu-sheng commented Dec 3, 2020

Sorry, we have to check. Manual tests are always the last option. We have thousands of tests, which definitely helps but we can't say it is perfect. Honestly, I don't understand why a recorded video is an issue, as it is an easy way to check all UI components work well. If we asked you to fill all missing tests in the InfluxDB cases, you would have to suffer more workloads.
But anyway, we respect your choice.

@wu-sheng wu-sheng added this to the 8.4.0 milestone Dec 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend OAP backend related. enhancement Enhancement on performance or codes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

influxdb response format error
3 participants