Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistently error when allocation request fails #6271

Merged
merged 16 commits into from
Jul 16, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .ci/.lintr.R
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
dt_linters = new.env()
for (f in list.files('.ci/linters', full.names=TRUE)) sys.source(f, dt_linters)
for (f in list.files('.ci/linters/r', full.names=TRUE)) sys.source(f, dt_linters)
rm(f)

# NB: Could do this inside the linter definition, this separation makes those files more standardized
Expand Down
36 changes: 36 additions & 0 deletions .ci/linters/c/alloc_linter.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
# Ensure that we check the result of malloc()/calloc() for success
# More specifically, given that this is an AST-ignorant checker,
# 1. Find groups of malloc()/calloc() calls
# 2. Check the next line for a check like 'if (!x || !y)'
alloc_linter = function(c_file) {
lines <- readLines(c_file)
# Be a bit more precise to avoid mentions in comments
alloc_lines <- grep(R"{=\s*([(]\w+\s*[*][)])?[mc]alloc[(]}", lines)
if (!length(alloc_lines)) return()
# int *tmp=(int*)malloc(...); or just int tmp=malloc(...);
alloc_keys <- lines[alloc_lines] |>
strsplit(R"(\s*=\s*)") |>
vapply(head, 1L, FUN.VALUE="") |>
trimws() |>
# just normalize the more exotic assignments, namely 'type *restrict key = ...'
gsub(pattern = "[*]\\s*(restrict\\s*)?", replacement = "*") |>
strsplit("*", fixed=TRUE) |>
vapply(tail, 1L, FUN.VALUE="")
alloc_grp_id <- cumsum(c(TRUE, diff(alloc_lines) != 1L))

# execute by group
tapply(seq_along(alloc_lines), alloc_grp_id, function(grp_idx) {
keys_regex <- paste0("\\s*!\\s*", alloc_keys[grp_idx], "\\s*", collapse = "[|][|]")
check_regex <- paste0("if\\s*\\(", keys_regex)
check_line <- lines[alloc_lines[tail(grp_idx, 1L)] + 1L]
# Rarely (once in fread.c as of initialization), error checking is handled
# but not immediately, so use 'NOCHECK' to escape.
if (!grepl(check_regex, check_line) && !grepl("NOCHECK", check_line, fixed=TRUE)) {
bad_lines_idx <- seq(alloc_lines[grp_idx[1L]], length.out=length(grp_idx)+1L)
cat("FILE: ", c_file, "; LINES: ", head(bad_lines_idx, 1L), "-", tail(bad_lines_idx, 1L), "\n", sep="")
writeLines(lines[bad_lines_idx])
cat(strrep("-", max(nchar(lines[bad_lines_idx]))), "\n", sep="")
stop("Expected the malloc()/calloc() usage above to be followed immediately by error checking.", call.=FALSE)
}
})
}
File renamed without changes.
Original file line number Diff line number Diff line change
Expand Up @@ -6,10 +6,10 @@ on:
branches:
- master

name: lint
name: code-quality

jobs:
lint:
lint-r:
runs-on: ubuntu-latest
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
Expand All @@ -33,3 +33,16 @@ jobs:
env:
LINTR_ERROR_ON_LINT: true
R_LINTR_LINTER_FILE: .ci/.lintr
lint-c:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: r-lib/actions/setup-r@v2
- name: Lint
run: |
for (f in list.files('.ci/linters/c', full.names=TRUE)) source(f)
for (f in list.files('src', pattern='[.]c$', full.names=TRUE)) {
alloc_linter(f)
# TODO(#6272): Incorporate more checks from CRAN_Release
}
shell: Rscript {0}
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -116,6 +116,8 @@

21. Refactored some non-API calls to R macros for S4 objects (#6180)[https://github.com/Rdatatable/data.table/issues/6180]. There should be no user-visible change. Thanks to various R users & R core for pushing to have a clearer definition of "API" for R, and thanks @MichaelChirico for implementing here.

22. C code was unified more in how failures to allocate memory (`malloc()`/`calloc()`) are handled, (#1115)[https://github.com/Rdatatable/data.table/issues/1115]. No OOM issues were reported, as these regions of code typically request relatively small blocks of memory, but it is good to handle memory pressure consistently. Thanks @elfring for the report and @MichaelChirico for the clean-up effort and future-proofing linter.

## TRANSLATIONS

1. Fix a typo in a Mandarin translation of an error message that was hiding the actual error message, [#6172](https://github.com/Rdatatable/data.table/issues/6172). Thanks @trafficfan for the report and @MichaelChirico for the fix.
Expand Down
3 changes: 2 additions & 1 deletion src/assign.c
Original file line number Diff line number Diff line change
Expand Up @@ -1247,7 +1247,8 @@ void savetl_init(void) {
nalloc = 100;
saveds = (SEXP *)malloc(nalloc * sizeof(SEXP));
savedtl = (R_len_t *)malloc(nalloc * sizeof(R_len_t));
if (saveds==NULL || savedtl==NULL) {
if (!saveds || !savedtl) {
free(saveds); free(savedtl);
savetl_end(); // # nocov
error(_("Failed to allocate initial %d items in savetl_init"), nalloc); // # nocov
}
Expand Down
2 changes: 0 additions & 2 deletions src/bmerge.c
Original file line number Diff line number Diff line change
Expand Up @@ -129,8 +129,6 @@ SEXP bmerge(SEXP idt, SEXP xdt, SEXP icolsArg, SEXP xcolsArg, SEXP isorted, SEXP
retFirst = Calloc(anslen, int); // anslen is set above
retLength = Calloc(anslen, int);
retIndex = Calloc(anslen, int);
if (retFirst==NULL || retLength==NULL || retIndex==NULL)
error(_("Internal error in allocating memory for non-equi join")); // # nocov
// initialise retIndex here directly, as next loop is meant for both equi and non-equi joins
for (int j=0; j<anslen; j++) retIndex[j] = j+1;
} else { // equi joins (or) non-equi join but no multiple matches
Expand Down
3 changes: 2 additions & 1 deletion src/chmatch.c
Original file line number Diff line number Diff line change
Expand Up @@ -94,10 +94,11 @@ static SEXP chmatchMain(SEXP x, SEXP table, int nomatch, bool chin, bool chmatch
// uniq dups
// For example: A,B,C,B,D,E,A,A => A(TL=1),B(2),C(3),D(4),E(5) => dupMap 1 2 3 5 6 | 8 7 4
// dupLink 7 8 | 6 (blank=0)
int *counts = (int *)calloc(nuniq, sizeof(int));
unsigned int mapsize = tablelen+nuniq; // lto compilation warning #5760 // +nuniq to store a 0 at the end of each group
int *counts = (int *)calloc(nuniq, sizeof(int));
int *map = (int *)calloc(mapsize, sizeof(int));
if (!counts || !map) {
free(counts); free(map);
// # nocov start
for (int i=0; i<tablelen; i++) SET_TRUELENGTH(td[i], 0);
savetl_end();
Expand Down
36 changes: 28 additions & 8 deletions src/forder.c
Original file line number Diff line number Diff line change
Expand Up @@ -267,9 +267,11 @@ static void cradix_r(SEXP *xsub, int n, int radix)
static void cradix(SEXP *x, int n)
{
cradix_counts = (int *)calloc(ustr_maxlen*256, sizeof(int)); // counts for the letters of left-aligned strings
if (!cradix_counts) STOP(_("Failed to alloc cradix_counts"));
cradix_xtmp = (SEXP *)malloc(ustr_n*sizeof(SEXP));
if (!cradix_xtmp) STOP(_("Failed to alloc cradix_tmp"));
if (!cradix_counts || !cradix_xtmp) {
free(cradix_counts); free(cradix_xtmp);
STOP(_("Failed to alloc cradix_counts and/or cradix_tmp"));
}
cradix_r(x, n, 0);
free(cradix_counts); cradix_counts=NULL;
free(cradix_xtmp); cradix_xtmp=NULL;
Expand Down Expand Up @@ -319,7 +321,8 @@ static void range_str(SEXP *x, int n, uint64_t *out_min, uint64_t *out_max, int
SEXP ustr2 = PROTECT(allocVector(STRSXP, ustr_n));
for (int i=0; i<ustr_n; i++) SET_STRING_ELT(ustr2, i, ENC2UTF8(ustr[i]));
SEXP *ustr3 = (SEXP *)malloc(ustr_n * sizeof(SEXP));
if (!ustr3) STOP(_("Failed to alloc ustr3 when converting strings to UTF8")); // # nocov
if (!ustr3)
STOP(_("Failed to alloc ustr3 when converting strings to UTF8")); // # nocov
memcpy(ustr3, STRING_PTR(ustr2), ustr_n*sizeof(SEXP));
// need to reset ustr_maxlen because we need ustr_maxlen for utf8 strings
ustr_maxlen = 0;
Expand All @@ -337,7 +340,8 @@ static void range_str(SEXP *x, int n, uint64_t *out_min, uint64_t *out_max, int
}
// now use the 1-1 mapping from ustr to ustr2 to get the ordering back into original ustr, being careful to reset tl to 0
int *tl = (int *)malloc(ustr_n * sizeof(int));
if (!tl) STOP(_("Failed to alloc tl when converting strings to UTF8")); // # nocov
if (!tl)
STOP(_("Failed to alloc tl when converting strings to UTF8")); // # nocov
SEXP *tt = STRING_PTR(ustr2);
for (int i=0; i<ustr_n; i++) tl[i] = TRUELENGTH(tt[i]); // fetches the o in ustr3 into tl which is ordered by ustr
for (int i=0; i<ustr_n; i++) SET_TRUELENGTH(ustr3[i], 0); // reset to 0 tl of the UTF8 (and possibly non-UTF in ustr too)
Expand Down Expand Up @@ -719,13 +723,19 @@ SEXP forder(SEXP DT, SEXP by, SEXP retGrpArg, SEXP sortGroupsArg, SEXP ascArg, S
nth = getDTthreads(nrow, true); // this nth is relied on in cleanup(); throttle=true/false debated for #5077
TMP = (int *)malloc(nth*UINT16_MAX*sizeof(int)); // used by counting sort (my_n<=65536) in radix_r()
UGRP = (uint8_t *)malloc(nth*256); // TODO: align TMP and UGRP to cache lines (and do the same for stack allocations too)
if (!TMP || !UGRP /*|| TMP%64 || UGRP%64*/) STOP(_("Failed to allocate TMP or UGRP or they weren't cache line aligned: nth=%d"), nth);
if (!TMP || !UGRP /*|| TMP%64 || UGRP%64*/) {
free(TMP); free(UGRP);
STOP(_("Failed to allocate TMP or UGRP or they weren't cache line aligned: nth=%d"), nth);
}

if (retgrp) {
gs_thread = calloc(nth, sizeof(int *)); // thread private group size buffers
gs_thread_alloc = calloc(nth, sizeof(int));
gs_thread_n = calloc(nth, sizeof(int));
if (!gs_thread || !gs_thread_alloc || !gs_thread_n) STOP(_("Could not allocate (very tiny) group size thread buffers"));
if (!gs_thread || !gs_thread_alloc || !gs_thread_n) {
free(gs_thread); free(gs_thread_alloc); free(gs_thread_n);
STOP(_("Could not allocate (very tiny) group size thread buffers"));
}
}
if (nradix) {
radix_r(0, nrow-1, 0); // top level recursive call: (from, to, radix)
Expand Down Expand Up @@ -1045,7 +1055,10 @@ void radix_r(const int from, const int to, const int radix) {
uint16_t *counts = calloc(nBatch*256,sizeof(uint16_t));
uint8_t *ugrps = malloc(nBatch*256*sizeof(uint8_t));
int *ngrps = calloc(nBatch ,sizeof(int));
if (!counts || !ugrps || !ngrps) STOP(_("Failed to allocate parallel counts. my_n=%d, nBatch=%d"), my_n, nBatch);
if (!counts || !ugrps || !ngrps) {
free(counts); free(ugrps); free(ngrps);
STOP(_("Failed to allocate parallel counts. my_n=%d, nBatch=%d"), my_n, nBatch);
}

bool skip=true;
const int n_rem = nradix-radix-1; // how many radix are remaining after this one
Expand All @@ -1054,6 +1067,10 @@ void radix_r(const int from, const int to, const int radix) {
{
int *my_otmp = malloc(batchSize * sizeof(int)); // thread-private write
uint8_t *my_ktmp = malloc(batchSize * sizeof(uint8_t) * n_rem);
if (!my_otmp || !my_ktmp) {
free(my_otmp); free(my_ktmp);
STOP(_("Failed to allocate 'my_otmp' and/or 'my_ktmp' arrays (%d bytes)."), (int)(batchSize*(sizeof(int) + sizeof(uint8_t))));
}
// TODO: move these up above and point restrict[me] to them. Easier to Error that way if failed to alloc.
#pragma omp for
for (int batch=0; batch<nBatch; batch++) {
Expand Down Expand Up @@ -1139,6 +1156,8 @@ void radix_r(const int from, const int to, const int radix) {
// If skip==true and we're already done, we still need the first row of this cummulate (diff to get total group sizes) to push() or recurse below

int *starts = calloc(nBatch*256, sizeof(int)); // keep starts the same shape and ugrp order as counts
if (!starts)
STOP(_("Failed to allocate %d bytes for '%s'."), (int)(nBatch*256*sizeof(int)), "starts");
for (int j=0, sum=0; j<ngrp; j++) { // iterate through columns (ngrp bytes)
uint16_t *tmp1 = counts+ugrp[j];
int *tmp2 = starts+ugrp[j];
Expand All @@ -1154,7 +1173,8 @@ void radix_r(const int from, const int to, const int radix) {
TEND(18 + notFirst*3)
if (!skip) {
int *TMP = malloc(my_n * sizeof(int));
if (!TMP) STOP(_("Unable to allocate TMP for my_n=%d items in parallel batch counting"), my_n);
if (!TMP)
STOP(_("Unable to allocate TMP for my_n=%d items in parallel batch counting"), my_n);
#pragma omp parallel for num_threads(getDTthreads(nBatch, false))
for (int batch=0; batch<nBatch; batch++) {
const int *restrict my_starts = starts + batch*256;
Expand Down
17 changes: 13 additions & 4 deletions src/fread.c
Original file line number Diff line number Diff line change
Expand Up @@ -447,7 +447,8 @@ void copyFile(size_t fileSize, const char *msg, bool verbose) // only called in
{
double tt = wallclock();
mmp_copy = (char *)malloc((size_t)fileSize + 1/* extra \0 */);
if (!mmp_copy) STOP(_("Unable to allocate %s of contiguous virtual RAM. %s allocation."), filesize_to_str(fileSize), msg);
if (!mmp_copy)
STOP(_("Unable to allocate %s of contiguous virtual RAM. %s allocation."), filesize_to_str(fileSize), msg);
memcpy(mmp_copy, mmp, fileSize);
sof = mmp_copy;
eof = (char *)mmp_copy + fileSize;
Expand Down Expand Up @@ -1828,7 +1829,10 @@ int freadMain(freadMainArgs _args) {

type = (int8_t *)malloc((size_t)ncol * sizeof(int8_t));
tmpType = (int8_t *)malloc((size_t)ncol * sizeof(int8_t)); // used i) in sampling to not stop on errors when bad jump point and ii) when accepting user overrides
if (!type || !tmpType) STOP(_("Failed to allocate 2 x %d bytes for type and tmpType: %s"), ncol, strerror(errno));
if (!type || !tmpType) {
free(type); free(tmpType);
STOP(_("Failed to allocate 2 x %d bytes for type and tmpType: %s"), ncol, strerror(errno));
}

if (sep == ',' && dec == '\0') { // if sep=',' detected, don't attempt to detect dec [NB: . is not par of seps]
if (verbose) {
Expand Down Expand Up @@ -2074,7 +2078,8 @@ int freadMain(freadMainArgs _args) {
colNames = NULL; // userOverride will assign V1, V2, etc
} else {
colNames = (lenOff*) calloc((size_t)ncol, sizeof(lenOff));
if (!colNames) STOP(_("Unable to allocate %d*%d bytes for column name pointers: %s"), ncol, sizeof(lenOff), strerror(errno));
if (!colNames)
STOP(_("Unable to allocate %d*%d bytes for column name pointers: %s"), ncol, sizeof(lenOff), strerror(errno));
if (sep==' ') while (*ch==' ') ch++;
void *targets[9] = {NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, colNames + autoFirstColName};
FieldParseContext fctx = {
Expand Down Expand Up @@ -2128,7 +2133,8 @@ int freadMain(freadMainArgs _args) {
rowSize4 = 0;
rowSize8 = 0;
size = (int8_t *)malloc((size_t)ncol * sizeof(int8_t)); // TODO: remove size[] when we implement Pasha's idea to += size inside processor
if (!size) STOP(_("Failed to allocate %d bytes for size array: %s"), ncol, strerror(errno));
if (!size)
STOP(_("Failed to allocate %d bytes for '%s': %s"), (int)(ncol * sizeof(int8_t)), "size", strerror(errno));
nStringCols = 0;
nNonStringCols = 0;
for (int j=0; j<ncol; j++) {
Expand Down Expand Up @@ -2248,6 +2254,7 @@ int freadMain(freadMainArgs _args) {
.buff8 = malloc(rowSize8 * myBuffRows + 8),
.buff4 = malloc(rowSize4 * myBuffRows + 4),
.buff1 = malloc(rowSize1 * myBuffRows + 1),
// NOCHECK
.rowSize8 = rowSize8,
.rowSize4 = rowSize4,
.rowSize1 = rowSize1,
Expand Down Expand Up @@ -2567,6 +2574,8 @@ int freadMain(freadMainArgs _args) {
DTPRINT(_(" Dropping %d overallocated columns\n"), ndropFill);
}
dropFill = (int *)malloc((size_t)ndropFill * sizeof(int));
if (!dropFill)
STOP(_("Failed to allocate %d bytes for '%s'."), (int)(ndropFill * sizeof(int)), "dropFill");
int i=0;
for (int j=max_col; j<ncol; ++j) {
type[j] = CT_DROP;
Expand Down
4 changes: 2 additions & 2 deletions src/fsort.c
Original file line number Diff line number Diff line change
Expand Up @@ -256,7 +256,7 @@ SEXP fsort(SEXP x, SEXP verboseArg) {
// each thread has its own small stack of counts
// don't use VLAs here: perhaps too big for stack yes but more that VLAs apparently fail with schedule(dynamic)
uint64_t *restrict mycounts = calloc((toBit/8 + 1)*256, sizeof(uint64_t));
if (mycounts==NULL) {
if (!mycounts) {
failed=true; alloc_fail=true; // # nocov
}
double *restrict myworking = NULL;
Expand All @@ -283,7 +283,7 @@ SEXP fsort(SEXP x, SEXP verboseArg) {

if (myworking==NULL) {
myworking = malloc(thisN * sizeof(double));
if (myworking==NULL) {
if (!myworking) {
failed=true; alloc_fail=true; continue; // # nocov
}
myfirstmsb = msb;
Expand Down
7 changes: 5 additions & 2 deletions src/fwrite.c
Original file line number Diff line number Diff line change
Expand Up @@ -715,7 +715,8 @@ void fwriteMain(fwriteMainArgs args)
}
if (headerLen) {
char *buff = malloc(headerLen);
if (!buff) STOP(_("Unable to allocate %zu MiB for header: %s"), headerLen / 1024 / 1024, strerror(errno));
if (!buff)
STOP(_("Unable to allocate %zu MiB for header: %s"), headerLen / 1024 / 1024, strerror(errno));
char *ch = buff;
if (args.bom) {*ch++=(char)0xEF; *ch++=(char)0xBB; *ch++=(char)0xBF; } // 3 appears above (search for "bom")
memcpy(ch, args.yaml, yamlLen);
Expand Down Expand Up @@ -830,8 +831,8 @@ void fwriteMain(fwriteMainArgs args)
}
char *zbuffPool = NULL;
if (args.is_gzip) {
zbuffPool = malloc(nth*(size_t)zbuffSize);
#ifndef NOZLIB
zbuffPool = malloc(nth*(size_t)zbuffSize);
if (!zbuffPool) {
// # nocov start
free(buffPool);
Expand Down Expand Up @@ -986,7 +987,9 @@ void fwriteMain(fwriteMainArgs args)
}
}
free(buffPool);
#ifndef NOZLIB
free(zbuffPool);
#endif

// Finished parallel region and can call R API safely now.
if (hasPrinted) {
Expand Down
14 changes: 9 additions & 5 deletions src/gsumm.c
Original file line number Diff line number Diff line change
Expand Up @@ -113,7 +113,10 @@ SEXP gforce(SEXP env, SEXP jsub, SEXP o, SEXP f, SEXP l, SEXP irowsArg) {
//Rprintf(_("When assigning grp[o] = g, highSize=%d nb=%d bitshift=%d nBatch=%d\n"), highSize, nb, bitshift, nBatch);
int *counts = calloc(nBatch*highSize, sizeof(int)); // TODO: cache-line align and make highSize a multiple of 64
int *TMP = malloc(nrow*2l*sizeof(int)); // must multiple the long int otherwise overflow may happen, #4295
if (!counts || !TMP ) error(_("Internal error: Failed to allocate counts or TMP when assigning g in gforce"));
if (!counts || !TMP ) {
free(counts); free(TMP);
error(_("Internal error: Failed to allocate counts or TMP when assigning g in gforce"));
}
#pragma omp parallel for num_threads(getDTthreads(nBatch, false)) // schedule(dynamic,1)
for (int b=0; b<nBatch; b++) {
const int howMany = b==nBatch-1 ? lastBatchSize : batchSize;
Expand Down Expand Up @@ -617,7 +620,8 @@ SEXP gmean(SEXP x, SEXP narmArg)
} else {
// narm==true and anyNA==true
int *restrict nna_counts = calloc(ngrp, sizeof(int));
if (!nna_counts) error(_("Unable to allocate %d * %zu bytes for non-NA counts in gmean na.rm=TRUE"), ngrp, sizeof(int));
if (!nna_counts)
error(_("Unable to allocate %d * %zu bytes for non-NA counts in gmean na.rm=TRUE"), ngrp, sizeof(int));
#pragma omp parallel for num_threads(getDTthreads(highSize, false))
for (int h=0; h<highSize; h++) {
double *restrict _ans = ansp + (h<<bitshift);
Expand Down Expand Up @@ -672,8 +676,7 @@ SEXP gmean(SEXP x, SEXP narmArg)
int *restrict nna_counts_i = calloc(ngrp, sizeof(int));
if (!nna_counts_r || !nna_counts_i) {
// # nocov start
free(nna_counts_r); // free(NULL) is allowed and does nothing. Avoids repeating the error() call here.
free(nna_counts_i);
free(nna_counts_r); free(nna_counts_i);
error(_("Unable to allocate %d * %zu bytes for non-NA counts in gmean na.rm=TRUE"), ngrp, sizeof(int));
// # nocov end
}
Expand Down Expand Up @@ -1117,7 +1120,8 @@ SEXP gprod(SEXP x, SEXP narmArg) {
//clock_t start = clock();
if (nrow != n) error(_("nrow [%d] != length(x) [%d] in %s"), nrow, n, "gprod");
long double *s = malloc(ngrp * sizeof(long double));
if (!s) error(_("Unable to allocate %d * %zu bytes for gprod"), ngrp, sizeof(long double));
if (!s)
error(_("Unable to allocate %d * %zu bytes for gprod"), ngrp, sizeof(long double));
for (int i=0; i<ngrp; ++i) s[i] = 1.0;
switch(TYPEOF(x)) {
case LGLSXP: case INTSXP: {
Expand Down
Loading
Loading