Performance Monitor Plugin #188

Closed
wants to merge 48 commits into from

@meevee98 meevee98 commented Mar 3, 2020

This is a plugin created to monitor some aspects of Neo.
The metrics that the plugin measures are the ones discussed in neo#920

@cloud8little
Contributor

cloud8little commented Mar 9, 2020

@meevee98 Great work! But there are some issues that need to be fixed.
Test commit: 45b10a0, with
neo-cli: master e5400583a41bd19d4f2db5e05ca59a6c04160853
neo: master ff850c5e170a8a132132de9aea6168b0668125aa
neo-vm: master 754bca3568c50695c571cb796c500fd66c07d0fe

  • block time <index/hash>
    The current block height is 310; the results of block time 301 and block time 303 are incorrect.
neo> block time 301
Block Hash: 0x1642a8e70d21a1651cba7fd9f22e4d6d05b69a93fe09119bc753a082db5194f7
      Index: 301
      Time: 127 seconds
neo> block time 302
Block Hash: 0x35716ca313de935d230d6e5959028a98697374918b56743a95c0eeadd7f1f3ad
      Index: 302
      Time: 15 seconds
neo> block time 303
Block Hash: 0x5367caa58473c2d82fdf340d715e9381d612f180fd8cd95f254ee4467072e276
      Index: 303
      Time: 303 seconds
  • When the height increased to 550, the result of block avgtime 550 is incorrect.
neo> block avgtime 7
Average time/block: 15.06 seconds
neo> block avgtime 8
Average time/block: 15.06 seconds
neo> block avgtime 9
Average time/block: 15.06 seconds
neo> block avgtime 30
Average time/block: 18.77 seconds
neo> block avgtime 100
Average time/block: 16.76 seconds
neo> block avgtime 101
Average time/block: 16.75 seconds
neo> block avgtime 102
Average time/block: 16.75 seconds
neo> block avgtime 103
Average time/block: 16.75 seconds
neo> block avgtime 104
Average time/block: 16.73 seconds
neo> block avgtime 550
Average time/block: 1112.98 seconds
neo> block avgtime 551
Average time/block: 1112.98 seconds
  • check disk
    On both Win10 and Ubuntu, check disk returns the error: Performance counters are not supported on this platform.

  • check memory
    On Ubuntu 18.04, check memory returns 0.00 MB.

  • tx avgsize [1 - 10000]
    The test private chain has only 2 transactions in total, yet the result of tx avgsize 3 is not equal to the result of tx avgsize 2.

  • confirmation time & payload time
    Both always return Timeout.

neo> confirmation time
Waiting for the next commit...
Timeout
neo> confirmation time
Waiting for the next commit...
Timeout
neo> payload time
Waiting for the next payload...
Timeout
  • Network Commands: connected. When the current height updates, the height in the result of "connected" does not update. For example, the height is always 578, while getconnectioncount is 580.
neo> connected
Connected nodes: 1
  ip: 127.0.0.1         height: 578

@meevee98
Author

meevee98 commented Mar 9, 2020

Thank you for your tests @cloud8little

  • block time <index/hash>
    The current block height is 310; the results of block time 301 and block time 303 are incorrect.
  • When the height increased to 550, the result of block avgtime 550 is incorrect.

I couldn't reproduce this. Can you give me more details?

  • check disk
    On both Win10 and Ubuntu, check disk returns the error: Performance counters are not supported on this platform.

For this to work, you need to add System.Diagnostics.PerformanceCounter as a dependency in neo-cli; otherwise it won't work, even if you include the dll in the folder.
I've added it in neo-node#549

  • confirmation time & payload time
    Both always return Timeout.

Confirmation time and payload time only work after consensus has started; that's why they always return a timeout. I've changed the printed message to make this condition clearer.

@bettybao1209
Contributor

@meevee98 Hi, I wrote a readme describing the commands of the plugin, along with some notable points to make it easier to use. I'd like to hear your opinion.

@cloud8little
Contributor

cloud8little commented Mar 11, 2020

@meevee98

  1. You can try "block time 147" in this private chain.
    block-time-147-incorrect.zip

  2. I already added this reference (Update dependencies for Performance Plugin, neo-node#549), but the result is the same: error: Performance counters are not supported on this platform.

  3. Use the same chain as in 1; you can wait for the block height to reach 170+.

@meevee98
Author

@meevee98 Hi, I wrote a readme describing the commands of the plugin, along with some notable points to make it easier to use. I'd like to hear your opinion.

Thank you for your help @bettybao1209. I think the readme is very good.

  1. You can try "block time 147" in this private chain.
    block-time-147-incorrect.zip

I investigated the problem and I think this is not wrong at all.
Using the show block command from this pull request to show the information of the block, we can see that there is a gap of more than 4 minutes between blocks 147 and 148.

image

That's why the result of block time was 270 seconds (4 minutes and 30 seconds). It is also why I don't think the result is wrong: it means that block 148 was delayed, so block 147 remained active longer than it should have.

  1. Use the same chain as in 1; you can wait for the block height to reach 170+.

For this, the problem was that some blocks had a much longer time in seconds, increasing the average. I found that these three blocks had a time greater than 4 hours, with the longest between blocks 100 and 101, with a gap of almost 2 days:

image
image

Probably that happened because the blockchain was paused and continued later.
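
As an illustration of the reasoning above, the following is a minimal sketch, not the plugin's actual code, of how a per-block time and an average can be derived from block timestamps. It assumes that the time of a block is the difference between the next block's timestamp and its own, as the explanation of block 147 suggests. The `BlockTimeSketch` class, its method names, and the sample timestamps are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the plugin: it works on plain timestamps
// (in seconds) instead of real Neo blocks.
public static class BlockTimeSketch
{
    // Time each block stayed active: the next block's timestamp minus its own.
    // N timestamps yield N - 1 durations.
    public static List<long> BlockTimes(IReadOnlyList<long> timestamps)
    {
        var times = new List<long>();
        for (int i = 0; i + 1 < timestamps.Count; i++)
        {
            times.Add(timestamps[i + 1] - timestamps[i]);
        }
        return times;
    }

    // Arithmetic mean of the durations; a single long pause raises it sharply.
    public static double AverageBlockTime(IReadOnlyList<long> timestamps)
    {
        var times = BlockTimes(timestamps);
        if (times.Count == 0)
        {
            throw new ArgumentException("At least two timestamps are required.");
        }
        return times.Average();
    }
}
```

With the timestamps 0, 15, 30, 300 and 315, the per-block times are 15, 15, 270 and 15 seconds, so the average is 78.75 seconds even though three of the four blocks took 15 seconds. A chain that was paused for hours or days skews the mean in the same way, only far more strongly.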

I'm still working on the problem with Performance Counter.

Member

@vncoelho vncoelho left a comment

Looks good, I see that new performance indicators were added.

@shargon
Member

shargon commented Jun 16, 2020

@meevee98 is this solved? #188 (comment)

@meevee98 meevee98 requested a review from shargon July 3, 2020 19:15
[ConsoleCommand("block avgtime", Category = "Block Commands", Description = "Show the average time in seconds the latest blocks are active.")]
private void OnBlockAverageTimeCommand(uint blockCount = 1000)
{
uint desiredCount = blockCount;
Member

Reuse blockCount

throw new RpcException(-100, "Minimum 1 block");
}

if (desiredCount > 10000)
Member

How much time does it take with this limit?

Author

Using RPC, I got 889 ms in total, of which this method took 168 ms in the RPC server.
image
image
Do you think we should decrease this maximum as well?

[ConsoleCommand("block timestamp", Category = "Block Commands", Description = "Show the block timestamp for each of the n latest blocks.")]
private void OnBlockTimestampCommand(uint blockCount)
{
uint desiredCount = blockCount;
Member

Reuse blockCount

uint maxNBlocksPerDay = 24 * 60 * 60 / (Blockchain.MillisecondsPerBlock / 1000);
if (desiredCount > maxNBlocksPerDay)
{
throw new RpcException(-100, maxNBlocksPerDay.ToString("Maximum 0 blocks"));
Member

0?

Author

It was the format syntax. I changed it to make clearer what the message should be.
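
For context, the "0" in the original message is the zero placeholder of a .NET custom numeric format string, so the number replaces it at run time. The sketch below, which is not the PR's final code, contrasts that form with string interpolation. The `MaxBlocksMessageSketch` class is hypothetical, and the 15000 ms block time is an assumed value standing in for `Blockchain.MillisecondsPerBlock`.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only.
public static class MaxBlocksMessageSketch
{
    // Assumed value; the reviewed code reads Blockchain.MillisecondsPerBlock instead.
    public const uint MillisecondsPerBlock = 15000;

    // Same arithmetic as in the reviewed snippet: blocks produced in one day.
    public static uint MaxBlocksPerDay()
    {
        return 24 * 60 * 60 / (MillisecondsPerBlock / 1000);
    }

    // Original style: "0" is a digit placeholder in a custom numeric format
    // string, and the other characters are copied through as literals.
    public static string WithFormatString(uint max)
    {
        return max.ToString("Maximum 0 blocks", CultureInfo.InvariantCulture);
    }

    // Clearer alternative: the value is visible where it is inserted.
    public static string WithInterpolation(uint max)
    {
        return $"Maximum {max} blocks";
    }
}
```

Both methods produce the same text for the assumed block time. The format-string form is fragile, though: it changes meaning as soon as the literal text contains another format character such as `#`, `.`, `,`, `%` or a second `0`.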

using (var snapshot = Blockchain.Singleton.GetSnapshot())
{
var block = GetBlock(height.ToString());
var blocksTime = snapshot.GetBlocksTimestamp(desiredCount, block);
Member

Did you test how much time it takes with the maximum values?

Author

I've tested the CLI method that returns the same information. The CLI method is a bit slower than the RPC one because of the printed messages. On my PC it took around 0.5 seconds to get and print the information.
image
image

Member

I think that it should be lower than 1000 in RPC; in the console it's OK.

Author

Sure, I have no problem decreasing it.

/// <param name="height">
/// The block height to be passed in the ping message
/// </param>
private void SendBlockchainPingMessage(uint height)
Member

We should always send the ping with our height; why is it an argument?

Author

This method is used in the block sync command. In the case that the local node is not updated, the ping message sends the max height of the remote nodes.
It was needed before the implementation of the ping command, but now it can just send the ping with the current height. I will change it.


try
{
client.GetBlockCount();
Member

MaxTimeout? What happens if the server never answers and doesn't close the connection?

Author

It's handled by SendAsync method of RpcClient. PostAsync throws HttpRequestException on timeout. ref

Member

By default it's 100 seconds (https://docs.microsoft.com/en-us/dotnet/api/system.net.http.httpclient.timeout?view=netcore-3.1). I think that it should be 30 or less.

Author

I see your point.
I'm going to set the timeout in the method then.
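
One way to bound the wait, sketched here under assumptions rather than taken from the PR, is to set `HttpClient.Timeout` on the client used for the request. The `RpcTimeoutSketch` class, its method names, and the `url` parameter are hypothetical, and how the plugin's `RpcClient` exposes its underlying `HttpClient` is not shown in this conversation.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper for illustration only.
public static class RpcTimeoutSketch
{
    // Creates a client whose requests give up after the given time
    // instead of the 100-second default.
    public static HttpClient CreateClient(TimeSpan timeout)
    {
        return new HttpClient { Timeout = timeout };
    }

    // Returns true if the endpoint answered with a success status in time,
    // false on timeout or connection failure. The url is a placeholder for
    // the node's RPC address.
    public static async Task<bool> RespondsInTimeAsync(HttpClient client, string url)
    {
        try
        {
            using (var response = await client.GetAsync(url))
            {
                return response.IsSuccessStatusCode;
            }
        }
        catch (TaskCanceledException)
        {
            // HttpClient.Timeout elapsed before a response arrived.
            return false;
        }
        catch (HttpRequestException)
        {
            // Connection-level failure (refused, DNS error, and so on).
            return false;
        }
    }
}
```

A caller could create the client with `RpcTimeoutSketch.CreateClient(TimeSpan.FromSeconds(30))`, matching the upper bound suggested in the review. Note that when `HttpClient.Timeout` elapses, the pending call is cancelled and surfaces as a `TaskCanceledException`, so code that only catches `HttpRequestException` may miss it.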

@meevee98 meevee98 requested a review from shargon July 9, 2020 20:33
@shargon
Member

shargon commented Jul 10, 2020

@neo-project/ngd-shanghai could someone test it?

@superboyiii
Member

@neo-project/ngd-shanghai could someone test it?

@cloud8little will help test this.

@cloud8little
Contributor

  1. help message: PASS

  2. block avgtime: PASS

  3. block sync: PASS

  4. block time: PASS

  5. block timesincelast: PASS

  6. block timestamp: PASS

  7. connected: PASS

  8. ping: PASS

  9. rpc time: PASS

  10. check state: PASS

  11. tx avgsize xx: PASS

  12. tx size: PASS

  13. without RpcServer: PASS

  14. commit time/confirmation time/payload time: @meevee98 it seems that when starting a consensus node, payload time and confirmation time always time out. When starting consensus as watch-only, they return a value.

@cloud8little
Contributor

@meevee98 could you help answer #14, please? #188 (comment)

@lock9
Contributor

lock9 commented Dec 15, 2023

Hi @Liaojinghui,
@meevee98 is my colleague. Do you guys still have interest in this PR? I don't know why it was forgotten 😅

@meevee98 could you help answer #14, please? #188 (comment)

Is this still relevant?

@cschuchardt88
Member

cschuchardt88 commented Dec 15, 2023

It doesn't support other OSes, like Linux. And nothing was ever fixed, from what I can see.

@Jim8y Jim8y added the Need Active Pr will be closed after one week if no new activity. label Feb 12, 2024
@Jim8y
Contributor

Jim8y commented Feb 17, 2024

close as inactive

@Jim8y Jim8y closed this Feb 17, 2024