Performance issue when running a large log database #17

Open
Thusithaaw opened this issue Sep 30, 2024 · 3 comments

@Thusithaaw

I installed the docker image on a host with 4 CPUs and 16 GB RAM and imported a Postfix mail log file of around 250 MB. Searching the log database takes around 20 minutes for a single query. When I checked CPU utilization, only 1 CPU out of 4 was at 100%. How can I optimize CPU usage so the load is distributed in a large log data environment?

@drlight17
Owner

drlight17 commented Sep 30, 2024

Hello! I have some counter questions:

  • For what period of time is your 250 MB log file?
  • What are the time periods for your queries?
  • Are there any filters applied for such long queries?
  • What is your mail server's average message frequency?
  • Do you use an SSD or an HDD on the host running MLP?

For example, I recently tested db queries on a server handling ~120 messages per hour, over the last 90 days. Querying all messages for that period without any additional filters took about 30 seconds (about 250,000 rows in the output). With full-text search over the log lines it took from 5 to 30 seconds depending on the search pattern (about 500 to 15,000 rows in the output).
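To give an idea of what such a query boils down to, here is a minimal sketch using the RethinkDB Python driver; the database, table, index and field names are illustrative assumptions, not MLP's actual schema. The time-range part can use a secondary index, while a log-line match is an unindexed scan over the selected rows, which is why the search pattern matters:

```python
from rethinkdb import RethinkDB

# Hypothetical names: the 'maillog' database, 'messages' table, 'timestamp'
# secondary index and 'log_line' field are assumptions for this sketch.
r = RethinkDB()
conn = r.connect(host="localhost", port=28015, db="maillog")

# Time-range selection over a secondary index: the last 90 days.
start = r.now() - 90 * 24 * 3600          # ReQL: time minus seconds
recent = r.table("messages").between(start, r.now(), index="timestamp")
print(recent.count().run(conn))            # e.g. ~250,000 rows

# "Full text" search: an unindexed regex match over the raw log line,
# evaluated row by row over the selected range.
bounced = recent.filter(lambda row: row["log_line"].match("status=bounced"))
for doc in bounced.run(conn):
    print(doc["log_line"])
```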

@Thusithaaw
Author

Hi

Please find the answers to your questions.

  1. For what period of time is your 250 MB log file? **_The log file covers 2 days of mail logs._**
  2. What are the time periods for your queries? **_The search query is for 1 day._**
  3. Are there any filters applied for such long queries? **_No._**
  4. What is your mail server's average message frequency? **_Average 4,000 per hour._**
  5. Do you use an SSD or an HDD on the host running MLP? **_Using vSAN with 10k HDDs._**

Thanks

@drlight17
Owner

Thank you for your answers. As I can see, your mail server is quite heavily loaded; I've never had a chance to test MLP under such conditions.
I think the only thing you can do for now is to try an SSD (something like an Intel Optane or another enterprise-grade drive) instead of an HDD for the rethinkdb container; see the sketch below. Maybe this will reduce query time.
The amount of RAM only affects the query output array size (in my case 6 GB of RAM is enough for an output of about 300,000 rows).
CPU cores are not fully utilized with custom datetime processing during parsing and importing into the db (I've mentioned this in the example.env file for the MAIL_LOG_TIMESTAMP_CONVERT variable), but it has no effect on the GUI while a query is running.
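A rough sketch of what moving the database onto an SSD could look like in a compose file, binding the RethinkDB data directory to an SSD-backed path; the service name, image tag and host path below are assumptions and may not match MLP's actual docker-compose.yml:

```yaml
services:
  rethinkdb:
    image: rethinkdb:2.4          # assumption: use whatever tag MLP's compose file pins
    volumes:
      # Hypothetical SSD-backed host path; the official image works out of /data,
      # so binding it there puts the database files on the fast disk.
      - /mnt/ssd/rethinkdb:/data
```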
