Spool to disk error #12174

Closed
sentient opened this issue May 10, 2019 · 5 comments
Labels
libbeat, Team:Integrations

Comments

sentient (Contributor) commented May 10, 2019

I'm getting errors when I use the queue.spool.file feature:

not enough memory to allocate 255 data page(s)

2019-05-10T18:28:10.029Z ERROR [publisher] spool/inbroker.go:544 Spool flush failed with: pq/writer-flush: txfile/tx-alloc-pages: file='/var/lib/statsdbeat/spool.dat' tx=0: transaction failed during commit: not enough memory to allocate 255 data page(s)

This is running on a server with 4 GB of memory. What configuration settings are there to reduce the number of data pages?

Thanks

faec (Contributor) commented Jan 28, 2020

I'm still investigating this, but some preliminary details: the error is misleading, as the real condition isn't a lack of system memory but a lack of free pages in the disk spool. I've tried many different constraints on both current and old beats versions; for example, a 4GB spool on a machine with 2GB of memory works fine.

The problem observed here is that a call to Writer.Flush() needs 255 pages (~1MB on disk) but there aren't that many free. This could mean an internal inconsistency that leads to fewer free blocks than there should be, or bad / misleading error handling that makes this a (seemingly?) fatal error instead of just retrying or blocking until the space frees up.

The default spool file size is only 100MB, so the (imperfect) configuration workaround would actually be to increase the spool size by adding e.g. size: 1G in the file configuration. (This wouldn't address the underlying bug, if there is one, but it would give a larger buffer so intermittent link saturation doesn't reach the limit so easily.)
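For reference, a minimal sketch of that workaround in the beat's YAML config, using the same queue.spool.file block that appears in the configs later in this thread (1GiB is just faec's example value, not a tuned recommendation):

queue.spool:
  file:
    # Raise the spool file size from the 100MB default so a single
    # flush is less likely to run out of free pages.
    size: 1GiB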

christophercutajar commented:

I've noticed the same error with filebeat 7.5.1

2020-09-14T15:29:27.478Z        ERROR   [publisher]     spool/inbroker.go:547   Spool flush failed with: pq/writer-flush: txfile/tx-alloc-pages: file='/var/lib/filebeat/spool.dat' tx=0: transaction failed during commit: not enough memory to allocate 54 data page(s)
2020-09-14T15:29:28.777Z        ERROR   [publisher]     spool/inbroker.go:547   Spool flush failed with: pq/writer-flush: txfile/tx-alloc-pages: file='/var/lib/filebeat/spool.dat' tx=0: transaction failed during commit: not enough memory to allocate 55 data page(s)

Having the below config:

# Settings to cache events to disk in case logstash is unreachable
queue.spool:
  file:
    size: 512MiB
    page_size: 16KiB
  write:
    buffer_size: 10MiB
    flush.timeout: 5s
    flush.events: 1024

Not sure what caused that ERROR, but after some time I can see data being ingested.
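Assuming faec's explanation above, the numbers here are consistent with the spool file simply being full rather than the host running out of memory (a rough back-of-envelope, not a measurement):

512 MiB spool file / 16 KiB page_size = 32768 pages total

A failed allocation of 54-55 pages therefore means nearly all pages were already in use; data flowing again after some time matches pages being freed as spooled events are shipped.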

rsdrakh commented Jan 27, 2021

Same here for journalbeat Version 7.6.1.
Config:

queue:
    spool:
        file:
            page_size: 16KiB
            path: /srv/beats/journalbeat/spool.dat
            prealloc: true
            size: 4GiB
        read:
            flush:
                timeout: 0s
        write:
            buffer_size: 5MiB
            flush:
                events: 2048
                timeout: 10s

I notice that a cron'ed systemctl restart journalbeat does not help the situation; after issuing systemctl stop journalbeat followed by systemctl start journalbeat, the contents of the spool file seem to be ingested correctly. The cronjob restarts journalbeat every 10 minutes due to problems with corrupt journal files (there is an open issue for that).

The logged ERROR appears sporadically during the day, somehow resolving itself (?), but at some point it persists and ingest comes to a complete halt.

EDIT: output is logstash, loadbalanced to two nodes, no further config.

Aloshi commented Jul 14, 2021

Seeing this as well with filebeat 7.7.1 and a 200MiB spool file. From the stack monitoring page, this log message seems to correlate with our beat no longer queuing new events. From /var/log/filebeat/filebeat:

...
2021-07-13T21:00:28.829Z        INFO    log/harvester.go:297    Harvester started for file: xxx.log
2021-07-13T21:00:42.492Z        ERROR   [publisher]     spool/inbroker.go:547   Spool flush failed with: pq/writer-flush: txfile/tx-alloc-pages: file='/var/lib/filebeat/spool.dat' tx=0: transaction failed during commit: not enough memory to allocate 44 data page(s)

[screenshot of the stack monitoring page omitted]

(times in log are UTC, screenshot is PST)

faec (Contributor) commented Oct 6, 2021

Closing: the disk spool is deprecated in favor of the disk queue.
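For anyone migrating away from the spool, a minimal sketch of the disk queue configuration that replaces it (queue.disk with max_size/path is the documented replacement; the values below are illustrative assumptions, not taken from this issue):

# Replaces the deprecated queue.spool block.
queue.disk:
  max_size: 10GB                   # upper bound on the on-disk queue
  path: "${path.data}/diskqueue"   # optional; defaults to a diskqueue directory under the beat's data path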

faec closed this as completed Oct 6, 2021