Improve BIDSLayout performance on very large datasets #285
Are there hidden files in these datasets?
Not sure which version you're using, but excluding derivatives could help. In the latest version (which is not yet a stable release), derivatives are no longer indexed by default.
There are no hidden files or a derivatives directory. I'm just installing from PyPI.
Also, would it be possible to create the BIDSLayout object with only specific subjects/sessions included, similar to (but the inverse of) the exclude argument?
Yes, try installing master—though be aware that there are some API-breaking changes (but you probably want to get ahead of those anyway, as the current PyPI release is out of sync with the BIDS Derivatives RC). That might fix your problem without any further effort, as derivatives are no longer indexed by default in master. There's no way to limit to only certain subjects/sessions at the moment. I think this came up before and we decided it wasn't worth the effort, but I could be swayed if there's enough demand for it (or if I get a PR). |
Oh, sorry—missed where you said there are no derivatives. I'm having some reading difficulties today. The current implementation isn't heavily optimized (and I don't think 0.7 fixes this), so it may be that it's just slow if you have a particularly large dataset. But the fact that load time seems to be supralinear in the number of subjects is kind of concerning. Do you mind doing some profiling (cProfile is fine) and pasting the results here? I'm curious to see what's eating up those cycles... |
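For reference, one generic way to capture and inspect such a profile with the standard library (the dataset path is a placeholder, and the import path may differ between pybids versions):

```python
import cProfile
import pstats

from bids.grabbids import BIDSLayout  # newer releases expose `from bids import BIDSLayout`

profiler = cProfile.Profile()
profiler.enable()
layout = BIDSLayout('/path/to/bids/dataset')  # placeholder path
profiler.disable()

# Show the 20 most expensive calls, sorted by cumulative time.
pstats.Stats(profiler).sort_stats('cumulative').print_stats(20)
```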
No problem - I'll kick off the script in a few minutes and let you know. Thanks! |
I ran this on the medium-sized dataset; here are the cProfile results:
As a short-term fix, it looks like you could use the exclude argument with a regex to filter out the things you don't want. I'm not sure if this will work at the directory level (which would speed things up even more), but it's worth a try. Mind giving that a try on the large dataset, excluding all but one subject? FYI, this is sort of a "hidden" feature right now.
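Roughly, something like the sketch below, assuming the exclude argument accepts regex patterns (the path, the pattern, and whether it wants a string or a list are all guesses here, not confirmed API details):

```python
from bids.grabbids import BIDSLayout  # newer releases expose `from bids import BIDSLayout`

# Illustrative pattern: exclude any path whose subject label is not '01'.
# Note that a label sharing the prefix (e.g. sub-010) would also slip through;
# tighten the pattern if your labels overlap like that.
exclude_pattern = r'sub-(?!01)'

layout = BIDSLayout('/path/to/bids/dataset', exclude=[exclude_pattern])
```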
OK, so you're saying I should try something like a regex that excludes every subject directory except the one I want to keep?
Sure! I trust your regex skillz.
Heyo - so I verified that the regex picked up only what I want (https://regex101.com/r/uOKFHL/1), but the load time didn't change at any scale. Any other ideas?
But when you do that, the pattern gets matched against paths inside the directory you provide, so the regex needs to account for that. Maybe try a version adjusted along those lines.
Ah, thanks for fixing the regex! It's because it's looking inside the provided directory. I'll use this for the time being, and as I play around I'll let you know if I notice any peculiarities. Thanks! 👍
Awesome. I'm going to reopen this issue (with a more general name) because I think this is an all-too-common scenario, and coming up with said regex is obviously not that intuitive. I think officially supporting the exclusion (or selection) of subjects at the BIDSLayout level would be worthwhile.

And like Tal said, the supralinear time increase with dataset size is worrisome...
Looks like most of the time is being eaten up by the checks that determine whether each path is a directory.
FWIW -- if that is hitting the same directories over and over again, it might be worth creating/using a simple little "memoizer" for those checks.
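A minimal sketch of that kind of memoizer, just wrapping os.path.isdir with a cache (the function name is made up for illustration; this is not actual grabbit internals):

```python
import os
from functools import lru_cache  # Python 3; a plain dict cache would do on Python 2


@lru_cache(maxsize=None)
def cached_isdir(path):
    # Repeated queries for the same path return the cached answer
    # instead of hitting the filesystem again.
    return os.path.isdir(path)
```

Anywhere the indexer checks the same directory repeatedly, swapping in something like this avoids the redundant stat() calls.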
It might also be faster to assume that things that should be directories are, catch exceptions when trying to open/stat something underneath them, and perform the explicit check only when that fails.
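A rough sketch of that "assume it's a directory, handle the exception" idea (purely illustrative, not pybids/grabbit code):

```python
import os


def try_listdir(path):
    """Assume `path` is a directory; if it isn't, listing it raises and we fall back."""
    try:
        return os.listdir(path)
    except OSError:
        # Not a directory (or unreadable); only in this rarer case would an
        # explicit isdir()/stat() check still be needed.
        return None
```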
@yarikoptic good idea. I'll have to take a closer look to see where those calls are coming from. I'm pretty sure we do need to determine whether each path is a file or directory, because different validation hooks get triggered depending on which it is.
Update: using the branch for that fix, here's the new profile:
By my read that shaved off ~4s/45s. Or is that a different dataset than your last profile? |
Different dataset - sorry; the baseline timings are in the initial comment.
Nice. |
Sweet, thanks Greg! |
Hey - I was trying to use pybids in a pipeline of mine, but found that it takes a very long time to create a layout on large datasets. I tested this using the public NKI-RS dataset from FCP-INDI in BIDS format, and ran it on subsets of the dataset of various sizes: 2 subjects, 90, and the full thing, 963.

My script is the following:

The output, in the form of {n subs}:{time elapsed in seconds}, is:

With this being the first step of my pipeline, and since I'm almost always specifying a --participant_label or --session_label for a bunch of these launched in parallel, it would be great if it didn't take upwards of 10 minutes per task to find the data. I'd still like to have the BIDSLayout as part of the pipeline itself so I can grab various pieces of metadata as I need them.

Any ideas why this gets so slow as sample size increases, or places you suggest I could make a PR to speed things up (likely in grabbit)?
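For reference, a rough sketch of the kind of timing harness described above (the dataset paths, subset layout, and import path are all assumptions, not the original script):

```python
import time

from bids.grabbids import BIDSLayout  # newer releases expose `from bids import BIDSLayout`

# Each entry points at a BIDS-formatted subset of NKI-RS with that many subjects.
subsets = {
    2: '/data/nki_rs_2subs',
    90: '/data/nki_rs_90subs',
    963: '/data/nki_rs_full',
}

for n_subs, path in sorted(subsets.items()):
    start = time.time()
    layout = BIDSLayout(path)
    elapsed = time.time() - start
    print('{}:{:.1f}'.format(n_subs, elapsed))  # prints {n subs}:{time elapsed in seconds}
```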