Instrument/metric improvements #4249

Merged
NelsonVides merged 9 commits into feature/instrument from instrument/metric-improvements
Mar 26, 2024


chrzaszcz (Member) commented Mar 22, 2024

This PR groups several improvements to metric handling in mongoose_instrument.

Changes:

  • Added support for the all_metrics_are_global option in instrumentation.exometer.
  • Event handler failures are now caught and logged to isolate them from the instrumented code.
  • Already existing metrics are reset in set_up.
  • Prometheus counters are initialised with zero to avoid a delay in the initial rate calculation in metric graphs.
    • Histograms are not initialised like this, because if there is no data, quantiles cannot be calculated, and zeroes would be technically incorrect.
  • Added optional start/0 and stop/0 callbacks to the mongoose_instrument behaviour.
  • Prometheus metric names are now strings, which avoids calling list_to_atom.
  • Removed dependency on mongoose_metrics.
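A minimal sketch of how optional start/0 and stop/0 callbacks can be declared in an Erlang behaviour and called only when a handler module actually exports them. The module name instrument_behaviour_sketch and the helpers start_handler/1 and stop_handler/1 are hypothetical; this is not the actual mongoose_instrument code, and any required callbacks of the real behaviour are omitted.

```erlang
-module(instrument_behaviour_sketch).

%% Hypothetical behaviour: only the optional callbacks are shown here.
-callback start() -> ok.
-callback stop() -> ok.
-optional_callbacks([start/0, stop/0]).

-export([start_handler/1, stop_handler/1]).

%% Calls Module:start/0 if the handler module implements it.
-spec start_handler(module()) -> term().
start_handler(Module) ->
    call_optional(Module, start).

%% Calls Module:stop/0 if the handler module implements it.
-spec stop_handler(module()) -> term().
stop_handler(Module) ->
    call_optional(Module, stop).

-spec call_optional(module(), start | stop) -> term().
call_optional(Module, Function) ->
    %% erlang:function_exported/3 returns false for modules that are not
    %% loaded yet, so make sure the module is loaded first.
    {module, Module} = code:ensure_loaded(Module),
    case erlang:function_exported(Module, Function, 0) of
        true -> Module:Function();
        false -> not_exported
    end.
```

Returning not_exported instead of crashing lets handler modules without any setup or teardown logic simply leave these callbacks out.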

chrzaszcz changed the base branch from master to feature/instrument March 22, 2024 15:46
mongoose-im (Collaborator) commented Mar 22, 2024

elasticsearch_and_cassandra_26 / elasticsearch_and_cassandra_mnesia / a682886
Reports root/ big
OK: 435 / Failed: 0 / User-skipped: 41 / Auto-skipped: 0


small_tests_26 / small_tests / a682886
Reports root / small


small_tests_25 / small_tests / a682886
Reports root / small


small_tests_26_arm64 / small_tests / a682886
Reports root / small


ldap_mnesia_25 / ldap_mnesia / a682886
Reports root/ big
OK: 2275 / Failed: 0 / User-skipped: 895 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_25 / pgsql_mnesia / a682886
Reports root/ big
OK: 4522 / Failed: 0 / User-skipped: 105 / Auto-skipped: 0


dynamic_domains_mysql_redis_26 / mysql_redis / a682886
Reports root/ big
OK: 4489 / Failed: 0 / User-skipped: 138 / Auto-skipped: 0


ldap_mnesia_26 / ldap_mnesia / a682886
Reports root/ big
OK: 2275 / Failed: 0 / User-skipped: 895 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_26 / pgsql_mnesia / a682886
Reports root/ big
OK: 4522 / Failed: 0 / User-skipped: 105 / Auto-skipped: 0


internal_mnesia_26 / internal_mnesia / a682886
Reports root/ big
OK: 2415 / Failed: 0 / User-skipped: 755 / Auto-skipped: 0


pgsql_cets_26 / pgsql_cets / a682886
Reports root/ big
OK: 4439 / Failed: 0 / User-skipped: 174 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_26 / odbc_mssql_mnesia / a682886
Reports root/ big
OK: 4519 / Failed: 0 / User-skipped: 108 / Auto-skipped: 0


mysql_redis_26 / mysql_redis / a682886
Reports root/ big
OK: 4889 / Failed: 1 / User-skipped: 133 / Auto-skipped: 0

carboncopy_SUITE:one2one:dropped_client_doesnt_create_duplicate_carbons
{error,
  {{badmatch,
     [{xmlel,<<"message">>,
        [{<<"from">>,
        <<"alice_dropped_client_doesnt_create_duplicate_carbons_586@localhost">>},
         {<<"to">>,
        <<"alice_dropped_client_doesnt_create_duplicate_carbons_586@localhost/res2">>},
         {<<"xmlns">>,<<"jabber:client">>},
         {<<"type">>,<<"chat">>}],
        [{xmlel,<<"sent">>,
           [{<<"xmlns">>,<<"urn:xmpp:carbons:2">>}],
           [{xmlel,<<"forwarded">>,
            [{<<"xmlns">>,<<"urn:xmpp:forward:0">>}],
            [{xmlel,<<"message">>,
               [{<<"from">>,
                 <<"alice_dropped_client_doesnt_create_duplicate_carbons_586@localhost/res1">>},
                {<<"type">>,<<"chat">>},
                {<<"to">>,
                 <<"bob_dropped_client_doesnt_create_duplicate_carbons_586@localhost/res1">>},
                {<<"xmlns">>,<<"jabber:client">>}],
               [{xmlel,<<"body">>,[],
                  [{xmlcdata,
                     <<"And pious action">>}]}]}]}]}]}]},
   [{carboncopy_SUITE,
      '-dropped_client_doesnt_create_duplicate_carbons/1-fun-0-',4,
      [{file,
         "/home/circleci/project/big_tests/tests/carboncopy_SUITE.erl"},
       {line,189}]},
    {escalus_story,story,4,
      [{file,
         "/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_story.erl"},
       {line,72}]},
    {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1793}]},
    {test_server,run_test_case_eval1,6,
      [{file,"test_serv...

Report log


pgsql_mnesia_25 / pgsql_mnesia / a682886
Reports root/ big
OK: 4911 / Failed: 0 / User-skipped: 112 / Auto-skipped: 0


pgsql_mnesia_26 / pgsql_mnesia / a682886
Reports root/ big
OK: 4911 / Failed: 0 / User-skipped: 112 / Auto-skipped: 0


mssql_mnesia_26 / odbc_mssql_mnesia / a682886
Reports root/ big
OK: 4908 / Failed: 0 / User-skipped: 115 / Auto-skipped: 0


codecov bot commented Mar 22, 2024

Codecov Report

Attention: Patch coverage is 97.95918% with 1 line in your changes missing coverage. Please review.

Project coverage is 84.51%. Comparing base (0912480) to head (b768315).

Files                                     Patch %   Lines
src/instrument/mongoose_instrument.erl    95.00%    1 Missing ⚠️
Additional details and impacted files
@@                  Coverage Diff                   @@
##           feature/instrument    #4249      +/-   ##
======================================================
+ Coverage               84.23%   84.51%   +0.27%     
======================================================
  Files                     556      556              
  Lines                   33660    33695      +35     
======================================================
+ Hits                    28354    28476     +122     
+ Misses                   5306     5219      -87     


- Failing handlers shouldn't cause the instrumented code to crash.
- Subsequent handlers should still be executed.
- Unlike in telemetry, broken handlers are not removed, because that would
  put the system in an inconsistent state if a handler fails intermittently.
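The handler isolation described in this commit can be sketched as follows, assuming each handler is a plain fun of arity 1 (a simplification; the PR does not show how handlers are represented inside mongoose_instrument). Each call is wrapped in try/catch, the failure is logged with logger:error/1, and the loop moves on to the next handler; the failing handler stays registered. The module name handler_isolation_sketch is hypothetical.

```erlang
-module(handler_isolation_sketch).
-export([execute/2]).

%% Runs every handler for the event; a crash in one handler is logged
%% and does not prevent the remaining handlers from running.
-spec execute([fun((map()) -> term())], map()) -> ok.
execute(Handlers, Event) ->
    lists:foreach(fun(Handler) -> safe_call(Handler, Event) end, Handlers).

safe_call(Handler, Event) ->
    try
        Handler(Event)
    catch
        Class:Reason:Stacktrace ->
            %% Log and continue. The handler is NOT removed, so it keeps
            %% receiving events if the failure was only intermittent.
            logger:error(#{what => event_handler_failed, event => Event,
                           class => Class, reason => Reason,
                           stacktrace => Stacktrace})
    end.
```

Because the catch happens inside execute/2, the instrumented caller never sees the exception.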
The goal is to allow automatic initialisation/cleanup.
chrzaszcz force-pushed the instrument/metric-improvements branch from a682886 to fb71715 March 22, 2024 17:56
mongoose-im (Collaborator) commented Mar 22, 2024

elasticsearch_and_cassandra_26 / elasticsearch_and_cassandra_mnesia / fb71715
Reports root/ big
OK: 435 / Failed: 0 / User-skipped: 41 / Auto-skipped: 0


small_tests_25 / small_tests / fb71715
Reports root / small


small_tests_26 / small_tests / fb71715
Reports root / small


small_tests_26_arm64 / small_tests / fb71715
Reports root / small


ldap_mnesia_25 / ldap_mnesia / fb71715
Reports root/ big
OK: 2275 / Failed: 0 / User-skipped: 895 / Auto-skipped: 0


ldap_mnesia_26 / ldap_mnesia / fb71715
Reports root/ big
OK: 2275 / Failed: 0 / User-skipped: 895 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_25 / pgsql_mnesia / fb71715
Reports root/ big
OK: 4522 / Failed: 0 / User-skipped: 105 / Auto-skipped: 0


dynamic_domains_mysql_redis_26 / mysql_redis / fb71715
Reports root/ big
OK: 4488 / Failed: 1 / User-skipped: 138 / Auto-skipped: 0

carboncopy_SUITE:one2one:dropped_client_doesnt_create_duplicate_carbons
{error,
  {{badmatch,
     [{xmlel,<<"message">>,
        [{<<"from">>,
        <<"alice_dropped_client_doesnt_create_duplicate_carbons_586@domain.example.com">>},
         {<<"to">>,
        <<"alice_dropped_client_doesnt_create_duplicate_carbons_586@domain.example.com/res2">>},
         {<<"xmlns">>,<<"jabber:client">>},
         {<<"type">>,<<"chat">>}],
        [{xmlel,<<"sent">>,
           [{<<"xmlns">>,<<"urn:xmpp:carbons:2">>}],
           [{xmlel,<<"forwarded">>,
            [{<<"xmlns">>,<<"urn:xmpp:forward:0">>}],
            [{xmlel,<<"message">>,
               [{<<"from">>,
                 <<"alice_dropped_client_doesnt_create_duplicate_carbons_586@domain.example.com/res1">>},
                {<<"type">>,<<"chat">>},
                {<<"to">>,
                 <<"bob_dropped_client_doesnt_create_duplicate_carbons_586@domain.example.com/res1">>},
                {<<"xmlns">>,<<"jabber:client">>}],
               [{xmlel,<<"body">>,[],
                  [{xmlcdata,
                     <<"And pious action">>}]}]}]}]}]}]},
   [{carboncopy_SUITE,
      '-dropped_client_doesnt_create_duplicate_carbons/1-fun-0-',4,
      [{file,
         "/home/circleci/project/big_tests/tests/carboncopy_SUITE.erl"},
       {line,189}]},
    {escalus_story,story,4,
      [{file,
         "/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_story.erl"},
       {line,72}]},
    {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1793}]},
    {test_server,run_test_c...

Report log


dynamic_domains_pgsql_mnesia_26 / pgsql_mnesia / fb71715
Reports root/ big
OK: 4522 / Failed: 0 / User-skipped: 105 / Auto-skipped: 0


internal_mnesia_26 / internal_mnesia / fb71715
Reports root/ big
OK: 2415 / Failed: 0 / User-skipped: 755 / Auto-skipped: 0


pgsql_mnesia_25 / pgsql_mnesia / fb71715
Reports root/ big
OK: 4911 / Failed: 0 / User-skipped: 112 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_26 / odbc_mssql_mnesia / fb71715
Reports root/ big
OK: 4519 / Failed: 0 / User-skipped: 108 / Auto-skipped: 0


pgsql_cets_26 / pgsql_cets / fb71715
Reports root/ big
OK: 4439 / Failed: 0 / User-skipped: 174 / Auto-skipped: 0


mysql_redis_26 / mysql_redis / fb71715
Reports root/ big
OK: 4890 / Failed: 0 / User-skipped: 133 / Auto-skipped: 0


pgsql_mnesia_26 / pgsql_mnesia / fb71715
Reports root/ big
OK: 4911 / Failed: 0 / User-skipped: 112 / Auto-skipped: 0


mssql_mnesia_26 / odbc_mssql_mnesia / fb71715
Reports root/ big
OK: 4908 / Failed: 0 / User-skipped: 115 / Auto-skipped: 0

This way start and stop operations are more organised,
and the functionality is tested.
- The goal is to use them in the handler modules.
- label_key and label_value are a bit repetitive, but this way the specs
  are as strict as possible. If the list grows longer, we could make
  the type more generic instead of enumerating all values.
1. Support the 'all_metrics_are_global' option
2. Move prefix management from mongoose_metrics to eliminate
   the dependency on legacy code.
3. Reset already existing metrics on setup.
   The goal is to have a fresh, repeatable metric state when
   mongoose_instrument is restarted in tests.
   Metrics are not removed, because that doesn't seem to be needed.
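A hypothetical sketch of what supporting all_metrics_are_global can mean for exometer metric names, assuming names are lists prefixed either with the host type or with the atom global. The module exometer_prefix_sketch, the function metric_name/4 and the exact name layout are illustrative assumptions, not the actual exometer handler code from this PR.

```erlang
-module(exometer_prefix_sketch).
-export([metric_name/4]).

%% Hypothetical layout: [Prefix, EventName, MetricName].
%% When AllMetricsAreGlobal is true, metrics of all host types
%% share the single 'global' prefix instead of a per-host-type one.
-spec metric_name(binary(), atom(), atom(), boolean()) -> [atom() | binary()].
metric_name(_HostType, EventName, MetricName, true) ->
    [global, EventName, MetricName];
metric_name(HostType, EventName, MetricName, false) ->
    [HostType, EventName, MetricName].
```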
More initialisation is needed because of host-type prefix handling.
1. Add type specs.
2. Use strings as metric names to avoid calling 'list_to_atom'.
   This doesn't seem to have a big performance penalty.
3. Reset already existing metrics on startup, just like for exometer.
4. Initialise counters with zero - previous initial value was 'undefined',
   resulting in a delay in initial rate metric calculation in Prometheus.
Metric name is now a string, not an atom.
chrzaszcz force-pushed the instrument/metric-improvements branch from fb71715 to b768315 March 25, 2024 08:00
mongoose-im (Collaborator) commented Mar 25, 2024

elasticsearch_and_cassandra_26 / elasticsearch_and_cassandra_mnesia / b768315
Reports root/ big
OK: 435 / Failed: 0 / User-skipped: 41 / Auto-skipped: 0


small_tests_25 / small_tests / b768315
Reports root / small


small_tests_26 / small_tests / b768315
Reports root / small


small_tests_26_arm64 / small_tests / b768315
Reports root / small


ldap_mnesia_25 / ldap_mnesia / b768315
Reports root/ big
OK: 2275 / Failed: 0 / User-skipped: 895 / Auto-skipped: 0


ldap_mnesia_26 / ldap_mnesia / b768315
Reports root/ big
OK: 2275 / Failed: 0 / User-skipped: 895 / Auto-skipped: 0


dynamic_domains_mysql_redis_26 / mysql_redis / b768315
Reports root/ big
OK: 4489 / Failed: 0 / User-skipped: 138 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_25 / pgsql_mnesia / b768315
Reports root/ big
OK: 4522 / Failed: 0 / User-skipped: 105 / Auto-skipped: 0


internal_mnesia_26 / internal_mnesia / b768315
Reports root/ big
OK: 2415 / Failed: 0 / User-skipped: 755 / Auto-skipped: 0


pgsql_mnesia_25 / pgsql_mnesia / b768315
Reports root/ big
OK: 4911 / Failed: 0 / User-skipped: 112 / Auto-skipped: 0


pgsql_cets_26 / pgsql_cets / b768315
Reports root/ big
OK: 4439 / Failed: 0 / User-skipped: 174 / Auto-skipped: 0


pgsql_mnesia_26 / pgsql_mnesia / b768315
Reports root/ big
OK: 4911 / Failed: 0 / User-skipped: 112 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_26 / odbc_mssql_mnesia / b768315
Reports root/ big
OK: 4519 / Failed: 0 / User-skipped: 108 / Auto-skipped: 0


mysql_redis_26 / mysql_redis / b768315
Reports root/ big
OK: 4890 / Failed: 0 / User-skipped: 133 / Auto-skipped: 0


mssql_mnesia_26 / odbc_mssql_mnesia / b768315
Reports root/ big
OK: 4908 / Failed: 0 / User-skipped: 115 / Auto-skipped: 0

chrzaszcz marked this pull request as ready for review March 25, 2024 09:20
NelsonVides (Collaborator) left a comment


Looks excellent! I only have one question regarding performance, but I could generally approve and merge already 👌🏽

 full_metric_name(EventName, MetricName) ->
-    list_to_atom(atom_to_list(EventName) ++ "_" ++ atom_to_list(MetricName)).
+    atom_to_list(EventName) ++ "_" ++ atom_to_list(MetricName).
Collaborator


Compared to the previous code this has pretty much the same performance, true, but on every metric update we're concatenating the same two lists again and again. Is this necessary? Maybe the metric name could be [EventName, MetricName], or some similar structure that involves no copying? Just thinking out loud; I don't know what the prometheus library API requires here.

chrzaszcz (Member, Author) commented Mar 26, 2024


The Prometheus library accepts only atoms or flat strings as metric names. Anything else would require modifying prometheus.erl.
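To illustrate the constraint, a small sketch (module, function and metric names are hypothetical) that builds the flat string name from two atoms without list_to_atom/1 and checks flatness with io_lib:char_list/1. A nested name such as ["sm_message", "_", "count"] is not a flat character list, so, assuming the library really requires flat strings, it would still have to go through lists:flatten/1, which copies the characters anyway.

```erlang
-module(metric_name_sketch).
-export([full_metric_name/2, is_flat_name/1]).

%% Builds a flat string metric name from two atoms. No list_to_atom/1
%% call is made, so no new atoms are created at runtime.
-spec full_metric_name(atom(), atom()) -> string().
full_metric_name(EventName, MetricName) ->
    atom_to_list(EventName) ++ "_" ++ atom_to_list(MetricName).

%% True only for a flat list of characters, i.e. the string form
%% that a flat-string-only API would accept.
-spec is_flat_name(term()) -> boolean().
is_flat_name(Name) ->
    io_lib:char_list(Name).
```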

Collaborator


Oh, so unfortunate 🤔

Member Author


That's just how Prometheus metric naming works, so to me it makes sense that the library doesn't give the illusion of supporting lists.

JanuszJakubiec (Contributor) left a comment


Looks good

NelsonVides (Collaborator) left a comment


👌🏽

NelsonVides merged commit 2bd8689 into feature/instrument Mar 26, 2024
4 checks passed
NelsonVides deleted the instrument/metric-improvements branch March 26, 2024 08:57
jacekwegr added this to the 6.3.0 milestone Oct 3, 2024
Labels
None yet
Projects
None yet

5 participants