JS Heap OOM Exception on video download/remux #444

Closed · 0x1f415 opened this issue May 29, 2023 · 19 comments

0x1f415 commented May 29, 2023

The service was down for a stream, so I missed it and attempted to download it using the video download function, but the server consistently crashes every single time the remux process starts:

2023-05-29 12:48:44.934 | vod.downloadVideo <INFO> Download VOD 1830224547
2023-05-29 12:48:45.052 | vod.downloadVideo <INFO> Starting remux of S202305E0 2023-05-26 [3606b7de-7d2e-4631-8975-3bff756cbb70].mp4
2023-05-29 12:48:45.055 | helper <INFO> Remuxing /usr/local/share/twitchautomator/data/cache/1830224547.ts to /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305/S202305E0 2023-05-26 [3606b7de-7d2e-4631-8975-3bff756cbb70].mp4
startJob process /usr/bin/ffmpeg [
  '/usr/bin/ffmpeg',
  '-i',
  '/usr/local/share/twitchautomator/data/cache/1830224547.ts',
  '-i',
  '/usr/local/share/twitchautomator/data/cache/1830224547.ffmpeg.txt',
  '-map_metadata',
  '1',
  '-c',
  'copy',
  '-bsf:a',
  'aac_adtstoasc',
  '-movflags',
  'faststart',
  '/usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305/S202305E0 2023-05-26 [3606b7de-7d2e-4631-8975-3bff756cbb70].mp4'
]
2023-05-29 12:48:45.071 | helper <INFO> Executing /usr/bin/ffmpeg -i /usr/local/share/twitchautomator/data/cache/1830224547.ts -i /usr/local/share/twitchautomator/data/cache/1830224547.ffmpeg.txt -map_metadata 1 -c copy -bsf:a aac_adtstoasc -movflags faststart /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305/S202305E0 2023-05-26 [3606b7de-7d2e-4631-8975-3bff756cbb70].mp4
2023-05-29 12:48:45.072 | helper <SUCCESS> Spawned process 51 for remux_1830224547.ts
2023-05-29 12:48:45.074 | job <INFO> Save job remux_1830224547.ts with PID 51 to /usr/local/share/twitchautomator/data/cache/pids/remux_1830224547.ts.json
Broadcasting data to 1 clients: {"action":"job_save","data":{"job_name":"remux_1830224547.ts","j...
2023-05-29 12:48:45.075 | helper <INFO> Attached to all streams for process 51 for remux_1830224547.ts
Remux total duration for 1830224547.ts: 12822
[2023-05-29T16:48:46.572Z] Remuxing 1830224547.ts - 343/12822 seconds (3%)
Broadcasting data to 1 clients: {"action":"job_update","data":{"job_name":"remux_1830224547.ts",...
Broadcasting data to 1 clients: {"action":"job_progress","data":{"job_name":"remux_1830224547.ts...
[2023-05-29T16:48:52.351Z] Remuxing 1830224547.ts - 700/12822 seconds (5%)

<--- Last few GCs --->

[33:0x5614180]   125111 ms: Mark-sweep 8069.6 (8227.0) -> 8061.8 (8235.7) MB, 2041.6 / 0.0 ms  (average mu = 0.729, current mu = 0.203) allocation failure; scavenge might not succeed
[33:0x5614180]   128653 ms: Mark-sweep 8077.7 (8235.7) -> 8068.7 (8242.2) MB, 3512.5 / 0.0 ms  (average mu = 0.460, current mu = 0.008) allocation failure; scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb7a940 node::Abort() [/usr/local/bin/node]
 2: 0xa8e823  [/usr/local/bin/node]
 3: 0xd5c940 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 4: 0xd5cce7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 5: 0xf3a3e5  [/usr/local/bin/node]
 6: 0xf4c8cd v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
 7: 0xf26fce v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
 8: 0xf28397 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
 9: 0xf0956a v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/usr/local/bin/node]
10: 0x12ce7af v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/usr/local/bin/node]
11: 0x16fb6b9  [/usr/local/bin/node]

I also tried raising the Node heap size with NODE_OPTIONS=--max-old-space-size=8192, but that didn't help.


0x1f415 commented May 29, 2023

Log capture from a different attempt, after manually removing a bunch of cached log files, mediainfo JSONs, and MP4s of failed transcodes. This time it managed to get all the way through the transcode, but crashed after mediainfo:

2023-05-29 12:59:23.785 | vod.downloadVideo <INFO> Downloaded VOD 1830224547...}
2023-05-29 12:59:23.785 | vod.downloadVideo <INFO> Starting remux of S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4
2023-05-29 12:59:23.787 | helper <INFO> Remuxing /usr/local/share/twitchautomator/data/cache/1830224547.ts to /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305/S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4
startJob process /usr/bin/ffmpeg [
  '/usr/bin/ffmpeg',
  '-i',
  '/usr/local/share/twitchautomator/data/cache/1830224547.ts',
  '-i',
  '/usr/local/share/twitchautomator/data/cache/1830224547.ffmpeg.txt',
  '-map_metadata',
  '1',
  '-c',
  'copy',
  '-bsf:a',
  'aac_adtstoasc',
  '-movflags',
  'faststart',
  '/usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305/S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4'
]
2023-05-29 12:59:23.795 | helper <INFO> Executing /usr/bin/ffmpeg -i /usr/local/share/twitchautomator/data/cache/1830224547.ts -i /usr/local/share/twitchautomator/data/cache/1830224547.ffmpeg.txt -map_metadata 1 -c copy -bsf:a aac_adtstoasc -movflags faststart /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305/S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4
2023-05-29 12:59:23.795 | helper <SUCCESS> Spawned process 68 for remux_1830224547.ts
2023-05-29 12:59:23.796 | job <INFO> Save job remux_1830224547.ts with PID 68 to /usr/local/share/twitchautomator/data/cache/pids/remux_1830224547.ts.json
Broadcasting data to 1 clients: {"action":"job_save","data":{"job_name":"remux_1830224547.ts","j...
2023-05-29 12:59:23.797 | helper <INFO> Attached to all streams for process 68 for remux_1830224547.ts
Remux total duration for 1830224547.ts: 12822
Broadcasting data to 1 clients: {"action":"job_clear","data":{"job_name":"download_vod_183022454...
Broadcasting data to 1 clients: {"action":"job_update","data":{"job_name":"remux_1830224547.ts",...
[2023-05-29T16:59:27.015Z] Remuxing 1830224547.ts - 385/12822 seconds (3%)
Broadcasting data to 1 clients: {"action":"job_clear","data":{"job_name":"download_vod_183022454...
Broadcasting data to 1 clients: {"action":"job_progress","data":{"job_name":"remux_1830224547.ts...
2023-05-29 12:59:41.797 | job <INFO> Job remux_1830224547.ts progress: 15%
[2023-05-29T16:59:41.798Z] Remuxing 1830224547.ts - 1876/12822 seconds (15%)
Broadcasting data to 1 clients: {"action":"job_progress","data":{"job_name":"remux_1830224547.ts...
2023-05-29 12:59:50.573 | job <INFO> Job remux_1830224547.ts progress: 98%
[2023-05-29T16:59:50.574Z] Remuxing 1830224547.ts - 12542/12822 seconds (98%)
Create MOOV atom for 1830224547.ts
2023-05-29 12:59:53.289 | helper.mediainfo <INFO> Run mediainfo on /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4
2023-05-29 12:59:53.290 | helper.videometadata <ERROR> Trying to get mediainfo of /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4 returned: File not found for mediainfo
2023-05-29 12:59:53.290 | channel <ERROR> Trying to get mediainfo of /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4 returned: File not found for mediainfo
::ffff:192.168.1.80 - - [29/May/2023:17:00:11 +0000] "GET /api/v0/channels/080d7c0f-1e34-48fc-8a17-5fa86b8a1dd8/download/1830224547 HTTP/1.1" - - "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/113.0"
Broadcasting data to 1 clients: {"action":"job_progress","data":{"job_name":"remux_1830224547.ts...
[2023-05-29T17:00:11.148Z] Remuxing 1830224547.ts - 12822/12822 seconds (100%)
2023-05-29 13:00:39.908 | helper <SUCCESS> Process 68 for remux_1830224547.ts closed with code 0
2023-05-29 13:00:39.909 | job <INFO> Clear job remux_1830224547.ts with PID 68
Broadcasting data to 1 clients: {"action":"job_clear","data":{"job_name":"remux_1830224547.ts","...
2023-05-29 13:00:39.910 | job <SUCCESS> Job remux_1830224547.ts removed from jobs list
2023-05-29 13:00:39.910 | job <WARNING> Job remux_1830224547.ts not found in jobs list
2023-05-29 13:00:40.253 | helper <SUCCESS> Remuxed /usr/local/share/twitchautomator/data/cache/1830224547.ts to /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305/S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4
2023-05-29 13:00:40.573 | vod.downloadVideo <INFO> Successfully remuxed S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4, removing /usr/local/share/twitchautomator/data/cache/1830224547.ts
2023-05-29 13:00:41.698 | vod.downloadVideo <INFO> Download of S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4 successful
Broadcasting data to 1 clients: {"action":"video_download","data":{"success":true,"path":"/usr/l...
2023-05-29 13:00:41.699 | channel <INFO> Create VOD JSON for paymoneywubby: S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].json @ /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305
2023-05-29 13:00:41.700 | vod.saveJSON <SUCCESS> Saving JSON of S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353]  (create json)
2023-05-29 13:00:41.705 | vod <ERROR> No chapters on S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353]!
2023-05-29 13:00:41.710 | vod.load <WARNING> Loaded VOD S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353] is not finalized!
2023-05-29 13:00:41.841 | vod.saveJSON <WARNING> Saving JSON of S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353] with no chapters!!
2023-05-29 13:00:41.841 | vod.saveJSON <SUCCESS> Saving JSON of S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353]  (manual creation)
2023-05-29 13:00:42.139 | vod.addSegment <INFO> Adding segment S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4 to S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353]
2023-05-29 13:00:42.140 | vod.finalize <INFO> Finalize S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353] @ /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305
2023-05-29 13:00:42.141 | vod <INFO> Fetching mediainfo of S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353], segment #0
2023-05-29 13:00:42.142 | helper.mediainfo <INFO> Run mediainfo on /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305/S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4
2023-05-29 13:00:42.483 | helper.execSimple <EXEC> Executing 'mediainfo': $ /usr/bin/mediainfo --Full --Output=JSON /usr/local/share/twitchautomator/data/storage/vods/paymoneywubby/Season 202305/S202305E0 2023-05-26 [7968e04a-22ca-4ada-b234-a0de9db14353].mp4
Broadcasting data to 1 clients: {"action":"job_progress","data":{"job_name":"remux_1830224547.ts...
Broadcasting data to 1 clients: {"action":"job_clear","data":{"job_name":"remux_1830224547.ts","...

<--- Last few GCs --->

[35:0x5614180]   310041 ms: Mark-sweep 7949.8 (8232.6) -> 7940.7 (8238.6) MB, 3852.7 / 0.0 ms  (average mu = 0.394, current mu = 0.007) allocation failure; scavenge might not succeed
[35:0x5614180]   314122 ms: Mark-sweep 7956.6 (8238.6) -> 7947.6 (8245.8) MB, 4048.5 / 0.0 ms  (average mu = 0.236, current mu = 0.008) allocation failure; scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb7a940 node::Abort() [/usr/local/bin/node]
 2: 0xa8e823  [/usr/local/bin/node]
 3: 0xd5c940 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 4: 0xd5cce7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 5: 0xf3a3e5  [/usr/local/bin/node]
 6: 0xf4c8cd v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
 7: 0xf26fce v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
 8: 0xf28397 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
 9: 0xf0956a v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/usr/local/bin/node]
10: 0x12ce7af v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/usr/local/bin/node]
11: 0x16fb6b9  [/usr/local/bin/node]


MrBrax commented May 29, 2023

oh no, this is probably the worst thing one can encounter... really hoping this is just a node version error


0x1f415 commented Jun 11, 2023

I appear to be having this issue with normal stream capture as well and not just the vod download.

It seems to be something related to the remux specifically. I can open the recorded .ts file in cache/capture and it contains the entire stream, but the remux process appears to create a corrupt MP4 file in the vods folder and crashes the entire server after the stream ends.

attached are the logs with debug mode on, stream ended at 00:17

2023-06-10.log
remux_S202306E07 2023-06-09 [42301348827].ts_stderr.log
remux_S202306E07 2023-06-09 [42301348827].ts_stdout.log


0x1f415 commented Jun 11, 2023

Interestingly, running the ffmpeg command in a docker exec session crashes the LiveStreamDVR server process without my having interacted directly with the application at all, which leads me to believe the culprit is something listening to file change events as the remuxed vod MP4 file is being written.

After stopping the server.js process with SIGSTOP (so the container wouldn't immediately quit when the process died), running the ffmpeg command rendered the file out successfully; then, when server.js was resumed with SIGCONT, it OOM'd.


MrBrax commented Jun 12, 2023

that could definitely make sense, i should make an option to disable file watching

MrBrax added a commit that referenced this issue Jun 13, 2023

MrBrax commented Jun 13, 2023

could you try the dev version with debug enabled and check how the memory rises? i also added a config option to disable the vod watching, maybe that helps
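
For anyone reading along, here is a minimal sketch of what gating the watcher behind such an option could look like, assuming the watcher is chokidar-based; `Config.getInstance().cfg()` and the `storage.no_watch_files` key are made-up names, not the project's actual API:

```ts
// Sketch only: gate the vod folder watcher behind a config flag.
// chokidar is assumed; Config.getInstance().cfg() and the
// "storage.no_watch_files" key are hypothetical stand-ins for the
// project's real settings API.
import chokidar, { FSWatcher } from "chokidar";
import { Config } from "./Config"; // hypothetical config singleton

let vodWatcher: FSWatcher | undefined;

export function startVodWatcher(vodFolder: string): void {
    if (Config.getInstance().cfg<boolean>("storage.no_watch_files", false)) {
        console.debug(`File watching disabled, not watching ${vodFolder}`);
        return;
    }

    vodWatcher = chokidar
        .watch(vodFolder, { ignoreInitial: true })
        .on("all", (event, filePath) => {
            console.debug(`Watcher event ${event} on ${filePath}`);
        });
}

export async function stopVodWatcher(): Promise<void> {
    await vodWatcher?.close(); // close() releases the underlying fs watchers
    vodWatcher = undefined;
}
```

With a flag like that, the existing behaviour stays the default and users with very large vod folders can opt out.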


0x1f415 commented Jun 13, 2023

on the develop tag docker image:

test 1: disabled file watcher, downloaded a moistcr1tikal vod (previously had none of his vods saved), no crash, server.js memory stayed consistently below 200 megs
test 2: reenabled file watcher, downloaded a second moistcr1tikal vod, same as above
test 3: file watcher still enabled, downloaded a paymoneywubby vod, memory usage grew to ~5.5GB and crashed with an OOM exception

seems apparent that this issue is dependent on the streamer, and indicates that the file watcher implementation doesn't scale to large vod collections:

(image attached)

Now that your commit points me to the specific function, I may be able to take a shot at fixing it when I get the time, but leaving the watcher disabled works for now.
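
If the watcher does turn out to be chokidar (as mentioned in the next comment), an alternative to disabling it entirely would be narrowing what it tracks; a sketch under that assumption, with illustrative paths and patterns:

```ts
// Sketch only: if the watcher is chokidar, narrowing its scope may keep
// memory bounded on large vod libraries instead of disabling it outright.
// The option names are real chokidar options; the path and patterns are
// illustrative.
import chokidar from "chokidar";

const vodRoot = "/usr/local/share/twitchautomator/data/storage/vods";

const watcher = chokidar.watch(vodRoot, {
    ignoreInitial: true,           // don't emit "add" events for every existing file at startup
    depth: 2,                      // channel folder -> season folder, nothing deeper
    ignored: /\.(log|part|tmp)$/,  // skip noisy intermediate files
    awaitWriteFinish: {            // don't fire once per chunk while ffmpeg is still writing
        stabilityThreshold: 2000,
        pollInterval: 500,
    },
});

watcher.on("all", (event, filePath) => {
    console.debug(`vod watcher: ${event} ${filePath}`);
});
```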


MrBrax commented Jun 14, 2023

does the new log line show the memory growing with each file watch event? i can find some reported memory issues when searching for chokidar, so maybe that's just what it does
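
For reference, a debug line like that can be produced from the watcher callback with Node's built-in `process.memoryUsage()`; a minimal sketch, assuming a chokidar `FSWatcher` is already available:

```ts
// Sketch only: attach a debug listener that logs heap usage with every
// file event so memory growth can be correlated with watcher activity.
// process.memoryUsage() is standard Node; the watcher is whatever the
// server already creates.
import { FSWatcher } from "chokidar";

export function logMemoryOnWatchEvents(watcher: FSWatcher): void {
    const mb = (bytes: number) => (bytes / 1024 / 1024).toFixed(1);
    watcher.on("all", (event, filePath) => {
        const { heapUsed, heapTotal, rss } = process.memoryUsage();
        console.debug(
            `watch ${event} ${filePath} | heap ${mb(heapUsed)}/${mb(heapTotal)} MB, rss ${mb(rss)} MB`
        );
    });
}
```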


0x1f415 commented Jun 16, 2023

I haven't seen any information about memory usage in the logs actually, I've been monitoring resource usage through htop.


MrBrax commented Jun 16, 2023

in the docker console when in debug mode, does it tell you about memory usage when file changes have been detected?


0x1f415 commented Jun 17, 2023

No, I just checked my logs, made sure the watcher was enabled and the server was in debug mode, and tried downloading a vod, but strangely I don't see any lines about memory usage at all.


0x1f415 commented Jun 17, 2023

Also, it seems that the option to disable the file watcher added in 17ecfb8 fixes the issue for downloading a vod, but not for finishing a recorded stream.


MrBrax commented Jun 17, 2023

that is quite odd considering it's disabled for both occasions, unless you didn't restart after changing it? maybe it's something other than the file watchers...


0x1f415 commented Jun 24, 2023

Disabling the watcher might not have taken effect the first time, because I wasn't aware that the option required a restart to apply. It has now been disabled for about a week, and the past two streams have still caused a crash. Interestingly, both of these streams managed to finish remuxing before the crash; that might be because they were shorter and finished before the server OOM'd, I'm not sure. For these past two crashes I wasn't actively watching the docker log output to confirm that the issue was the same OOM exception (the exception isn't written to the docker logs for some reason), but I'm pretty sure it was.


MrBrax commented Jun 24, 2023

is this after downloading multiple vods in the same "session"? i am noticing raised memory usage in docker on windows wsl


MrBrax commented Jun 24, 2023

i tried to do some memory profiling on my windows machine, but honestly i just don't have the experience to get any useful information out of it; with things as low-level as this i really don't know where to start
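
For reference, one relatively low-effort way to get data out of the running server, without an external profiler, is Node's built-in heap snapshot support; a sketch, with an illustrative output path:

```ts
// Sketch only: have the running server write a V8 heap snapshot when it
// receives SIGUSR2, so it can be inspected later in Chrome DevTools'
// Memory tab. v8.writeHeapSnapshot() and the SIGUSR2 handler are standard
// Node; the /tmp output path is illustrative.
import * as v8 from "node:v8";
import * as path from "node:path";

process.on("SIGUSR2", () => {
    const file = path.join("/tmp", `heap-${process.pid}-${Date.now()}.heapsnapshot`);
    const written = v8.writeHeapSnapshot(file); // synchronous; blocks while writing
    console.log(`Heap snapshot written to ${written}`);
});

// From the host: docker exec <container> kill -USR2 <pid of server.js>
```

Comparing two snapshots taken a few watch events apart should show which objects are accumulating.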


MrBrax commented Jul 18, 2023

is this still happening? i wonder if it has to do with the amount of log entries or keyvalues in memory


0x1f415 commented Jul 18, 2023

Yes, it's still happening to me: not on every stream, but usually on the longer ones, and only when a stream ends. I want to see if I can diagnose and potentially fix the issue myself, but I haven't found the time yet.

MrBrax added a commit that referenced this issue Nov 27, 2023

0x1f415 commented Feb 5, 2024

Since my last comment, I've taken a couple of stabs at getting to the root of this issue, which involved setting up an isolated development VM, copying my entire vod database, and so on.

Today I sat down for another attempt, realized that my configuration had broken across subsequent upstream updates, and decided that this isn't worth the effort, for me anyway.

The root of the issue is that I had over 100 vods totaling almost 500 gigabytes, which don't all need to be under the application's purview. I'm going to wipe everything and set up the server again from a clean slate, this time configured to use the file export functionality, and I recommend users with similar issues do the same.

0x1f415 closed this as not planned on Feb 5, 2024