
help request: memory growing up #10392

Closed · channer99 opened this issue Oct 25, 2023 · 24 comments · Fixed by #10614
Labels: bug (Something isn't working)

@channer99 (Author) commented Oct 25, 2023:

Description

I have confirmed that memory usage gradually increases over time.
I have checked many memory-related issues (prometheus, ext-plugin, traffic-split, etc.), but I cannot determine the exact cause.

[attachment: 1_memory_trand]

The plugins below are used as global plugins.

  • cors, opentelemetry, prometheus, request_id, response_rewrite

The following plugins were set for individual routes.

  • elasticsearch-logger, ext-plugin-pre-req, limit-count or limit-req, traffic-split

Also, the traffic-split plugin config is used like the one in #10349 (weight: 1, weight: 0); a rough sketch is shown below.
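
For illustration only, a minimal traffic-split rule following that weight 1 / weight 0 pattern might look roughly like the snippet below (the upstream_id value is hypothetical, not taken from this issue):

  plugins:
    traffic-split:
      rules:
        - weighted_upstreams:
            - upstream_id: 1    # hypothetical upstream meant to receive all traffic
              weight: 1
            - weight: 0         # entry without an upstream falls back to the route's upstream; weight 0 sends it no traffic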

My config.yaml is:

apisix:
  node_listen:
      - port: 80
  ssl:
    enable: false

deployment:
  role: traditional
  role_traditional:
    config_provider: etcd
  admin:
    allow_admin:
      - all
    admin_listen:
        port: 9180
    admin_key:
      - name: admin
        key: edd1c9f034335f136f87ad84b625c8f1  # using fixed API token has security risk, please update it when you deploy to production environment
        role: admin
  etcd:
    host:
      - "http://127.0.0.1:2379"

nginx_config:  
  enable_cpu_affinity: |
      true;
  http_configuration_snippet: |
      client_body_buffer_size 10m;
      proxy_max_temp_file_size 0;
  http_server_configuration_snippet: |
      merge_slashes off;
      proxy_ignore_client_abort on;
      set $parameter "dev";
  http:
    lua_shared_dict:
      prometheus-metrics: 100m

plugin_attr:
  opentelemetry:
    trace_id_source: x-request-id
    resource:
      service.name: test_GW
    collector:
      address: 192.168.8.211:4318
      request_timeout: 3
      request_headers:
        foo: bar
    batch_span_processor:
      drop_on_queue_full: false
      max_queue_size: 6
      batch_timeout: 2
      inactive_timeout: 1
      max_export_batch_size: 2
  prometheus:
    export_uri: /apisix/metrics
    export_addr:
      port: 8088

ext-plugin:
  cmd: ['java', '-jar', '-Xmx1g', '-Xms1g', '/usr/local/apisix/test.jar', '--paramr=dev']

So I obtained flame graphs through OpenResty XRay.
Can you check where the memory leak occurs?

Attached graphs:

  • 2_Lua-Land Memory Leak Flame Graphs
  • 3_LuaJIT GC Object Allocation Flame Graph
  • 4_LuaJIT GC Object Allocation Size Flame Graph
  • 5_LuaJIT String Objects Allocation Flame Graph
  • 6_LuaJIT Table Objects Allocation Flame Graph
  • 7_Lua Memory Realloc Size Flamegraphs
  • 8_GC Object Reference Flame Graph
  • 9_Application-Level Memory Usage Breakdown_10_25
  • 10_Application-Level Memory Usage Breakdown_10_23

It cannot be used in the production environment due to this memory growth.
Do we need another analyzer (other than OpenResty XRay) to accurately identify the issue? Please advise.

Environment

  • APISIX version (run apisix version): 3.4.1
  • Operating system (run uname -a): Linux 5.4.0-147-generic
  • OpenResty / Nginx version (run openresty -V or nginx -V): openresty/1.21.4.2
  • etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info): 3.5.0
  • APISIX Dashboard version, if relevant: 3.0.1
  • Plugin runner version, for issues related to plugin runners: java 0.4.0
  • LuaRocks version, for installation issues (run luarocks --version): 3.8.0
@shreemaan-abhishek (Contributor) commented:

cc: @Sn0rt

@Sn0rt (Contributor) commented Oct 26, 2023:

Can you continue to observe? It is normal for the memory to grow when NGX is processing requests.

  1. According to my understanding, OOM will occur at some point if the memory leak continues.
  2. NGX has its own memory management mechanism: it allocates large blocks of memory in user space and manages them by itself.

From the information in the picture, it is not possible to determine where the memory leak occurred.

@channer99 (Author) commented:

> Can you continue to observe? It is normal for the memory to grow when NGX is processing requests. […]

@Sn0rt
Depending on the volume of user calls, the memory increases and decreases but gradually trends upwards (I have already experienced OOM). After that, I used OpenResty XRay to check for memory leaks. I don't think there is any point in just continuing to watch it.

@Sn0rt (Contributor) commented Oct 26, 2023:

If OOM does occur, it needs to be investigated. Thank you for your report.

@Revolyssup added the bug label on Oct 26, 2023.
@t2krew commented Oct 31, 2023:

How's it going? I'm having the same problem.

@Sn0rt (Contributor) commented Oct 31, 2023:

I can't detect any abnormalities in your xray analysis.

Some users have also reported memory leaks before (#10349).

Combining your information, I strongly suspect the problem was introduced in 3.4; based on the changelog, I guess it was caused by the etcd-related changes. Can you help me test it? (I tried to reproduce it on CentOS but failed.)

Use the 3.3 version of config_etcd.lua to replace the 3.5 one:

  wget https://raw.githubusercontent.com/apache/apisix/release/3.3/apisix/core/config_etcd.lua

@monkeyDluffy6017 (Contributor) commented Nov 1, 2023:

@channer99 Could you send these flame graphs to my email (monkeydluffy6017@gmail.com)? They are not very clear on the website.

@channer99 (Author) commented:

> Could you send these flame graphs to my email monkeydluffy6017@gmail.com […]

They don't seem to render when you click and view them on the web.
I think you can check them faster if you download the images and open the .svg files in your local environment.
@monkeyDluffy6017 thx!

@monkeyDluffy6017 (Contributor) commented:

I couldn't find the problem from your graphs; the memory has only increased by about 100 MB, which I think is normal.
If you still think there is a problem: from your graph we can see that the memory allocated by Lua grew the most, so it may be a memory fragmentation issue.

@channer99 (Author) commented Nov 7, 2023:

What do you think about this issue? @Sn0rt @monkeyDluffy6017
#8461 (comment)

If requests with a req_body size of 500–2,000 are sent continuously, and then a request with a large req_body (6,000,000) is suddenly sent, the memory increases and then drops, but it does not seem to be fully reclaimed.
Is there any correlation with the ext-plugin?
The ext-plugin-pre-req (Java) plugin is set up as follows:
@Override
public Boolean requiredBody() {
    return true;
}

@monkeyDluffy6017 (Contributor) commented:

How about removing the ext-plugin and doing a test?

@monkeyDluffy6017 (Contributor) commented:

@channer99 Do you have any progress on this?

@king4sun commented Dec 7, 2023:

I have also met this recently. My APISIX app is slowly eating memory day by day; what I get from the metrics monitor is a steadily growing line. Is there a tool that can get more details? Thanks a lot.

@monkeyDluffy6017 (Contributor) commented:

@king4sun what's your APISIX version?

@wklken commented Dec 8, 2023:

@monkeyDluffy6017

I also met this issue. After being deployed online for 2 weeks, we rescheduled the pods and got the chart below: memory grew from 3.7 GB to 6 GB.

We have no ext-plugins.

The APISIX version is 3.2.1.


@monkeyDluffy6017 (Contributor) commented:

We have located the problem: #10614. It affects all versions between 3.4.0 and 3.7.0.

@wklken commented Dec 8, 2023:

It also affects 3.2.2, but we use 3.2.1.
I have some suspicion that it is caused by the prometheus plugin. Is there any tool to analyze this? We don't have XRay.

@monkeyDluffy6017 (Contributor) commented:

@wklken could you open another issue to discuss?

@monkeyDluffy6017 (Contributor) commented:

The problem doesn't exist on version 3.2.2

@wklken commented Dec 8, 2023:

> could you open another issue to discuss?

#10618 @monkeyDluffy6017

@wklken commented Dec 8, 2023:

> The problem doesn't exist on version 3.2.2

3.2.2 also merged that PR (#9456).

Was it caused not by the PR, but by the bug-fixing PR that followed?

@monkeyDluffy6017 (Contributor) commented:

I didn't find the PR in version 3.2.2.

@wklken commented Dec 8, 2023:

Here: https://github.com/apache/apisix/blob/release/3.2/CHANGELOG.md#bugfix


I think it should not have been merged into a PATCH version. We upgraded from 3.2.1 and then downgraded because of bug #9951.

@monkeyDluffy6017 (Contributor) commented:

I think you are right; it will affect 3.2.2.
