
Reading JSON file kills cloc when counting repo #830

Closed
includesec-erik opened this issue May 25, 2024 · 3 comments

Comments

@includesec-erik

Describe the bug
When I run cloc on this repo, cloc dies with "Killed".

https://github.com/cisagov/dotgov-data/

It dies when reading this file:
https://github.com/cisagov/dotgov-data/blob/main/dotgov-websites/pulse-subdomains-snapshot-06-08-2020-https.json

$ wc dotgov-data/dotgov-websites/pulse-subdomains-snapshot-06-08-2020-https.json
0    75980 13071784 dotgov-data/dotgov-websites/pulse-subdomains-snapshot-06-08-2020-https.json
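The wc columns are newlines, words, and bytes. The file therefore contains 0 newline characters, 75980 whitespace-separated words, and 13071784 bytes, so the entire ~13 MB of JSON sits on a single line with no trailing newline. Below is a minimal Python sketch that produces the same three counts, which is handy for spotting other single-line files. The function name wc_counts is hypothetical, and splitting on ASCII whitespace only approximates wc's locale-dependent word rule:

```python
from pathlib import Path


def wc_counts(path: str) -> tuple[int, int, int]:
    """Return (newlines, words, bytes), approximating `wc` output.

    wc's first column counts newline characters, not "lines", so a file
    with no newline characters at all reports 0 even if it is non-empty.
    """
    data = Path(path).read_bytes()
    newlines = data.count(b"\n")
    # bytes.split() with no argument splits on runs of ASCII whitespace
    words = len(data.split())
    return newlines, words, len(data)
```

Calling wc_counts on the JSON file should report 0 newlines alongside a byte count in the millions. That combination is consistent with cloc later reporting just 1 line of JSON for it.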

cloc; OS; OS version

  • cloc version: 2.0
  • If running the cloc source, Perl version: v5.34.0
  • OS (eg Linux, Windows, macOS, etc): Ubuntu
  • OS version: Ubuntu 22.04.4 LTS

To Reproduce

  1. Download this file locally https://github.com/cisagov/dotgov-data/blob/main/dotgov-websites/pulse-subdomains-snapshot-06-08-2020-https.json
  2. Run cloc on the file
  3. See this output:
[~/foss/cisa/dotgov-data]
$ cloc .
      23 text files.
      21 unique files.
Killed

Expected result
cloc continues counting, rather than dying entirely, when one file causes problems.

Additional context
I tried increasing the timeout from 1 sec to 10 sec; it didn't fix the issue.
#372

@AlDanial
Owner

I'm unable to reproduce this issue on Ubuntu 24.04 LTS. I cloned the repo, then ran:

dotgov-data » cloc .
      23 text files.
      21 unique files.                              
       3 files ignored.

github.com/AlDanial/cloc v 2.01  T=0.76 s (27.6 files/s, 444008.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
CSV                             10              0              0         305072
Text                             1              0              0          32222
Markdown                         5             39              0             56
YAML                             3              2              1             56
Bourne Shell                     1              0              0              6
JSON                             1              0              0              1
-------------------------------------------------------------------------------
SUM:                            21             41              1         337413
-------------------------------------------------------------------------------

dotgov-data » wc dotgov-websites/pulse-subdomains-snapshot-06-08-2020-https.json       
       0    75980 13071784 dotgov-websites/pulse-subdomains-snapshot-06-08-2020-https.json

dotgov-data » cloc dotgov-websites/pulse-subdomains-snapshot-06-08-2020-https.json
       1 text file.
       1 unique file.                              
       0 files ignored.

github.com/AlDanial/cloc v 2.01  T=0.55 s (1.8 files/s, 1.8 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
JSON                             1              0              0              1
-------------------------------------------------------------------------------

Possibly a memory issue? My machine has 64 GB.
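One quick way to test the memory hypothesis on Linux is to check how much RAM the machine actually has and how much is available. Here is a small Python sketch that reads /proc/meminfo, which reports values in kB. It is Linux-specific, the MemAvailable field assumes a reasonably recent kernel, and the helper names mem_kib and read_meminfo are hypothetical:

```python
def mem_kib(meminfo_text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field_name: numeric_value}.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields: dict[str, int] = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def read_meminfo(path: str = "/proc/meminfo") -> dict[str, int]:
    """Read and parse the live meminfo file (Linux only)."""
    with open(path) as f:
        return mem_kib(f.read())
```

For example, read_meminfo()["MemTotal"] // 1024 gives total RAM in MiB. A value near 1024 points to a small VM, where a single process using several hundred MB can exhaust memory.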

@includesec-erik
Author

@AlDanial I'm sorry for wasting your time this weekend. It's exactly as you describe: this was an ephemeral Ubuntu instance from our internal pentest VM cluster that had only 1 GB of RAM.

Confirmed via:

root@ip-10-0-2-41:~# dmesg -T | egrep -i 'killed process'
[Sat May 25 08:52:20 2024] Out of memory: Killed process 34447 (perl) total-vm:719368kB, anon-rss:622152kB, file-rss:2304kB, shmem-rss:0kB, UID:1000 pgtables:1348kB oom_score_adj:0
[Sat May 25 08:54:40 2024] Out of memory: Killed process 34450 (perl) total-vm:718184kB, anon-rss:620808kB, file-rss:2176kB, shmem-rss:0kB, UID:1000 pgtables:1332kB oom_score_adj:0
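These dmesg lines carry enough detail to confirm the diagnosis: the name of the killed process (perl) and its anonymous resident memory (anon-rss) at the moment it was killed. Here is a small Python sketch that extracts those fields from "Killed process" lines. The regex is written against the exact line format shown above, and other kernel versions may word it differently. The names OOM_RE, OomKill, and parse_oom_line are hypothetical:

```python
import re
from typing import NamedTuple, Optional

# Matches e.g. "Killed process 34447 (perl) total-vm:... anon-rss:622152kB, ..."
OOM_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\)"
    r".*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)


class OomKill(NamedTuple):
    pid: int
    comm: str          # process name as the kernel reports it
    anon_rss_kb: int   # anonymous resident memory at kill time, in kB


def parse_oom_line(line: str) -> Optional[OomKill]:
    """Return the OOM-kill details from one dmesg line, or None if absent."""
    m = OOM_RE.search(line)
    if m is None:
        return None
    return OomKill(int(m["pid"]), m["comm"], int(m["anon_rss_kb"]))
```

Applied to the first line above, parse_oom_line returns OomKill(pid=34447, comm='perl', anon_rss_kb=622152). That is about 608 MiB of anonymous memory, which is most of a 1 GB VM. Reading dmesg may require root on some systems.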

If there's any opportunity to improve the UX with better messaging around OOM-killed processes, that'd be great; otherwise, I'm closing this out as a non-issue.
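A process cannot catch SIGKILL, which is the signal the kernel's OOM killer sends. Any friendlier message therefore has to come from the parent process, not from cloc itself. The sketch below is a hypothetical external Python wrapper, not a cloc feature. It runs a command and, if the child dies from SIGKILL, points the user at dmesg. The exit status alone cannot prove that the OOM killer was responsible, which is why the hint says "often":

```python
import signal
import subprocess
import sys


def run_and_explain(cmd: list[str]) -> int:
    """Run cmd; if it is terminated by SIGKILL, print an OOM hint to stderr.

    subprocess reports death-by-signal N as returncode -N on POSIX.
    """
    rc = subprocess.run(cmd).returncode
    if rc == -signal.SIGKILL:
        print(
            f"{cmd[0]} was killed by SIGKILL; on Linux this is often the "
            "out-of-memory killer. Check `dmesg -T` for 'Killed process'.",
            file=sys.stderr,
        )
        return 128 + signal.SIGKILL  # shell convention: 128 + signal number
    return rc
```

A script ending in sys.exit(run_and_explain(["cloc", "."])) would run cloc unchanged but replace the shell's bare "Killed" with a pointer to the kernel log.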

@AlDanial
Owner

Perl's built-in exception handling is kind of lame, so even if I knew where the memory fault happened, it isn't clear I'd be able to do much about it. If you rerun on the VM with -v 3, you might be able to see which subroutine was running when the process was killed.
