Improve BIDSLayout performance on very large datasets #285

Closed
gkiar opened this issue Nov 14, 2018 · 24 comments · Fixed by #369

Comments

@gkiar
Contributor

gkiar commented Nov 14, 2018

Hey - I was trying to use pybids in a pipeline of mine, but found that it takes a very long time to create a layout on large datasets. I tested this using the public NKI-RS dataset from FCP-INDI in BIDS format, and ran it on subsets of various sizes: 2 subjects, 90 subjects, and the full dataset of 963.

My script is the following:

from bids.layout import BIDSLayout
import time

start1 = time.time()
bl = BIDSLayout('/project/6008063/gkiar/data/smallRS')
dur1 = time.time() - start1
print("{0}:{1}".format(len(bl.get_subjects()), dur1))


start2 = time.time()
bl = BIDSLayout('/project/6008063/gkiar/data/medRS/')
dur2 = time.time() - start2
print("{0}:{1}".format(len(bl.get_subjects()), dur2))

start3 = time.time()
bl = BIDSLayout('/project/6008063/gkiar/data/RocklandSample/')
dur3 = time.time() - start3
print("{0}:{1}".format(len(bl.get_subjects()), dur3))

The output, in the form of {n subs}:{time elapsed in seconds}, is:

$ date
Wed Nov 14 15:40:58 EST 2018
$ python profilebl.py
2:0.39104771614074707
90:14.413928270339966
963:631.153669834137

Since this is the first step of my pipeline, and I almost always specify a --participant_label or --session_label for a bunch of tasks launched in parallel, it would be great if it didn't take upwards of 10 minutes per task to find the data. I'd still like to keep the BIDSLayout as part of the pipeline itself so I can grab various pieces of metadata as I need them.

Any ideas why this gets so slow as sample size increases, or places you suggest I could make a PR to speed things up (likely in grabbit)?

@adelavega
Collaborator

Are there hidden files in these datasets like .git? That could be slowing things down as well.

@adelavega
Collaborator

adelavega commented Nov 15, 2018

Not sure which version you're using, but excluding derivatives could help. In the latest version (which is not stable), derivatives will be excluded by default. Try adding exclude="derivatives/" to the layout.

@gkiar
Contributor Author

gkiar commented Nov 15, 2018

There are no hidden files and no derivatives directory. I'm just installing from PyPI (0.6.5) - should I install from GitHub instead?

@gkiar
Contributor Author

gkiar commented Nov 15, 2018

Also, would it be possible to create the BIDSLayout object with only specific subjects/sessions included, similar to (but the inverse of) exclude="derivatives/"? Maybe this would speed things up when I am running a task that already knows what it's trying to process.

@tyarkoni
Collaborator

Yes, try installing master—though be aware that there are some API-breaking changes (but you probably want to get ahead of those anyway, as the current PyPI release is out of sync with the BIDS Derivatives RC). That might fix your problem without any further effort, as derivatives are no longer indexed by default in master.

There's no way to limit to only certain subjects/sessions at the moment. I think this came up before and we decided it wasn't worth the effort, but I could be swayed if there's enough demand for it (or if I get a PR).

@tyarkoni
Collaborator

tyarkoni commented Nov 15, 2018

Oh, sorry—missed where you said there are no derivatives. I'm having some reading difficulties today.

The current implementation isn't heavily optimized (and I don't think 0.7 fixes this), so it may be that it's just slow if you have a particularly large dataset. But the fact that load time seems to be supralinear in the number of subjects is kind of concerning.
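
To make the "supralinear" observation concrete, here is a small sketch that works out the cost per subject from the three timings posted at the top of the thread. The numbers are copied from that output; the helper name `seconds_per_subject` is only for illustration.

```python
# Timings reported at the top of this thread: (number of subjects, seconds).
timings = [
    (2, 0.39104771614074707),
    (90, 14.413928270339966),
    (963, 631.153669834137),
]


def seconds_per_subject(n_subjects, seconds):
    """Average layout-creation cost per subject."""
    return seconds / n_subjects


per_subject = [seconds_per_subject(n, t) for n, t in timings]
for (n, _), cost in zip(timings, per_subject):
    print("{0} subjects: {1:.3f} s/subject".format(n, cost))

# Compare the growth in subjects with the growth in time from 90 to 963.
subject_ratio = timings[2][0] / timings[1][0]
time_ratio = timings[2][1] / timings[1][1]
print("subjects x{0:.1f}, time x{1:.1f}".format(subject_ratio, time_ratio))
```

If load time were linear, the per-subject cost would stay roughly constant and the two ratios would be about equal; here the time ratio is several times larger than the subject ratio.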

Do you mind doing some profiling (cProfile is fine) and pasting the results here? I'm curious to see what's eating up those cycles...
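
For reference, a minimal sketch of one way to collect such a profile with the standard library. `walk_tree` is a hypothetical stand-in workload so the snippet runs without pybids; in practice it would be replaced by a function that builds the layout. Sorting by cumulative time (rather than by name) tends to put the hot spots at the top of the report.

```python
import cProfile
import io
import os
import pstats


def walk_tree(root):
    """Stand-in workload (hypothetical): count the files under root."""
    return sum(len(files) for _, _, files in os.walk(root))


def profile_call(func, *args, top=15):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the `top` entries with the largest cumulative time first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    n_files, report = profile_call(walk_tree, ".")
    print(report)
```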

@gkiar
Contributor Author

gkiar commented Nov 15, 2018

No problem - I'll kick off the script in a few minutes and let you know. Thanks!

@gkiar
Contributor Author

gkiar commented Nov 15, 2018

I ran this on the medium-sized dataset:

17:45.072930335998535
         4183738 function calls (4176443 primitive calls) in 45.171 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
    38045    0.029    0.000    0.072    0.000 <string>:12(__new__)
        1    0.000    0.000    0.001    0.001 __init__.py:274(load)
        1    0.000    0.000    0.000    0.000 __init__.py:302(loads)
        1    0.000    0.000    0.000    0.000 _bootlocale.py:23(getpreferredencoding)
        1    0.000    0.000    0.000    0.000 bids_layout.py:103(<listcomp>)
    39007    0.021    0.000    0.021    0.000 bids_layout.py:144(_validate_file)
        1    0.000    0.000   45.073   45.073 bids_layout.py:75(__init__)
        1    0.000    0.000    0.000    0.000 bids_validator.py:33(__init__)
        2    0.000    0.000    0.000    0.000 codecs.py:259(__init__)
        2    0.000    0.000    0.000    0.000 codecs.py:308(__init__)
        4    0.000    0.000    0.000    0.000 codecs.py:318(decode)
        1    0.000    0.000    0.000    0.000 core.py:146(__init__)
       19    0.000    0.000    0.000    0.000 core.py:172(add_entity)
    32363    0.021    0.000    0.028    0.000 core.py:180(add_file)
       19    0.000    0.000    0.006    0.000 core.py:194(__init__)
    32363    0.105    0.000    0.317    0.000 core.py:21(__init__)
       19    0.000    0.000    0.000    0.000 core.py:244(<listcomp>)
   614897    0.575    0.000    6.929    0.000 core.py:261(match_file)
    38045    0.042    0.000    0.042    0.000 core.py:278(add_file)
   614897    0.099    0.000    0.099    0.000 core.py:296(_astype)
        1    0.000    0.000   45.060   45.060 core.py:304(__init__)
    34171    0.021    0.000    0.038    0.000 core.py:31(entities)
    34171    0.014    0.000    0.014    0.000 core.py:33(<dictcomp>)
    32363    0.014    0.000    0.017    0.000 core.py:35(_matches)
        1    0.000    0.000    0.000    0.000 core.py:400(add_path)
        1    0.000    0.000    0.021    0.021 core.py:409(<listcomp>)
        1    0.000    0.000    0.021    0.021 core.py:423(_get_or_load_domain)
        1    0.000    0.000    0.000    0.000 core.py:458(get_domain_entities)
    39007    0.146    0.000    0.183    0.000 core.py:469(_check_inclusions)
     6644    0.006    0.000    0.008    0.000 core.py:496(_validate_dir)
     6645    0.008    0.000    1.896    0.000 core.py:510(_get_files)
    32363    0.072    0.000    0.577    0.000 core.py:514(_make_file_object)
        1    0.000    0.000    0.000    0.000 core.py:519(_reset_index)
    32363    0.749    0.000    8.543    0.000 core.py:525(_index_file)
        1    0.000    0.000   45.039   45.039 core.py:571(index)
   6645/1    0.363    0.000   45.039   45.039 core.py:575(_index_dir)
     6645    0.032    0.000    0.207    0.000 core.py:577(<listcomp>)
    39007    0.038    0.000    0.221    0.000 core.py:588(<lambda>)
    32363    0.028    0.000    0.028    0.000 core.py:605(<listcomp>)
       19    0.000    0.000    0.019    0.001 core.py:664(add_entity)
        1    0.030    0.030    0.098    0.098 core.py:697(get)
        1    0.009    0.009    0.044    0.044 core.py:765(<listcomp>)
        1    0.001    0.001    0.003    0.003 core.py:768(<listcomp>)
        1    0.000    0.000    0.000    0.000 decoder.py:334(decode)
        1    0.000    0.000    0.000    0.000 decoder.py:345(raw_decode)
      898    0.001    0.000    0.001    0.000 enum.py:265(__call__)
      898    0.001    0.000    0.001    0.000 enum.py:515(__new__)
       26    0.000    0.000    0.000    0.000 enum.py:795(__or__)
      423    0.001    0.000    0.002    0.000 enum.py:801(__and__)
     6648    0.020    0.000    0.215    0.000 genericpath.py:16(exists)
    39007    0.085    0.000   33.000    0.001 genericpath.py:39(isdir)
       19    0.000    0.000    0.000    0.000 inflect.py:1182(__init__)
       56    0.000    0.000    0.000    0.000 inflect.py:1295(ud_match)
       19    0.000    0.000    0.000    0.000 inflect.py:1524(postprocess)
       19    0.000    0.000    0.001    0.000 inflect.py:1539(partition_word)
       19    0.000    0.000    0.013    0.001 inflect.py:1581(plural)
       56    0.000    0.000    0.000    0.000 inflect.py:1827(get_count)
       18    0.001    0.000    0.004    0.000 inflect.py:1839(_plnoun)
       19    0.000    0.000    0.006    0.000 inflect.py:2123(_pl_special_verb)
       19    0.000    0.000    0.002    0.000 inflect.py:2224(_pl_special_adjective)
        1    0.000    0.000    0.012    0.012 linecache.py:15(getline)
        1    0.000    0.000    0.012    0.012 linecache.py:37(getlines)
        1    0.000    0.000    0.012    0.012 linecache.py:82(updatecache)
    32363    0.200    0.000    0.344    0.000 posixpath.py:102(split)
    32363    0.047    0.000    0.085    0.000 posixpath.py:142(basename)
    32364    0.077    0.000    0.126    0.000 posixpath.py:152(dirname)
        4    0.000    0.000    0.000    0.000 posixpath.py:329(normpath)
        4    0.000    0.000    0.000    0.000 posixpath.py:367(abspath)
   214120    0.093    0.000    0.177    0.000 posixpath.py:39(_get_sep)
        1    0.000    0.000    0.000    0.000 posixpath.py:485(commonpath)
        1    0.000    0.000    0.000    0.000 posixpath.py:500(<listcomp>)
        2    0.000    0.000    0.000    0.000 posixpath.py:503(<genexpr>)
        1    0.000    0.000    0.000    0.000 posixpath.py:507(<listcomp>)
        4    0.000    0.000    0.000    0.000 posixpath.py:62(isabs)
   117026    0.352    0.000    0.553    0.000 posixpath.py:73(join)
        1    0.000    0.000   45.171   45.171 profilebl.py:5(myfunc)
      151    0.000    0.000    0.011    0.000 re.py:179(search)
       17    0.000    0.000    0.000    0.000 re.py:204(split)
       19    0.000    0.000    0.006    0.000 re.py:231(compile)
      187    0.001    0.000    0.016    0.000 re.py:286(_compile)
       56    0.000    0.000    0.001    0.000 sre_compile.py:223(_compile_charset)
       56    0.001    0.000    0.001    0.000 sre_compile.py:250(_optimize_charset)
       35    0.000    0.000    0.000    0.000 sre_compile.py:376(_mk_bitmap)
       35    0.000    0.000    0.000    0.000 sre_compile.py:378(<listcomp>)
       40    0.000    0.000    0.000    0.000 sre_compile.py:388(_simple)
    28/23    0.000    0.000    0.000    0.000 sre_compile.py:414(_get_literal_prefix)
       23    0.000    0.000    0.000    0.000 sre_compile.py:441(_get_charset_prefix)
       27    0.000    0.000    0.001    0.000 sre_compile.py:482(_compile_info)
       54    0.000    0.000    0.000    0.000 sre_compile.py:539(isstring)
       27    0.000    0.000    0.007    0.000 sre_compile.py:542(_code)
       27    0.000    0.000    0.015    0.001 sre_compile.py:557(compile)
   232/27    0.002    0.000    0.006    0.000 sre_compile.py:64(_compile)
      232    0.000    0.000    0.000    0.000 sre_parse.py:111(__init__)
      167    0.000    0.000    0.000    0.000 sre_parse.py:159(__len__)
     1234    0.001    0.000    0.001    0.000 sre_parse.py:163(__getitem__)
       40    0.000    0.000    0.000    0.000 sre_parse.py:167(__setitem__)
      922    0.000    0.000    0.001    0.000 sre_parse.py:171(append)
   286/98    0.001    0.000    0.001    0.000 sre_parse.py:173(getwidth)
       27    0.000    0.000    0.000    0.000 sre_parse.py:223(__init__)
     1365    0.001    0.000    0.001    0.000 sre_parse.py:232(__next)
      433    0.000    0.000    0.000    0.000 sre_parse.py:248(match)
     1126    0.001    0.000    0.001    0.000 sre_parse.py:253(get)
      192    0.000    0.000    0.000    0.000 sre_parse.py:285(tell)
       15    0.000    0.000    0.000    0.000 sre_parse.py:294(_class_escape)
       15    0.000    0.000    0.000    0.000 sre_parse.py:342(_escape)
    72/27    0.000    0.000    0.007    0.000 sre_parse.py:407(_parse_sub)
   182/27    0.002    0.000    0.007    0.000 sre_parse.py:470(_parse)
       27    0.000    0.000    0.000    0.000 sre_parse.py:76(__init__)
      116    0.000    0.000    0.000    0.000 sre_parse.py:81(groups)
       27    0.000    0.000    0.000    0.000 sre_parse.py:828(fix_flags)
       31    0.000    0.000    0.000    0.000 sre_parse.py:84(opengroup)
       27    0.000    0.000    0.008    0.000 sre_parse.py:844(parse)
       31    0.000    0.000    0.001    0.000 sre_parse.py:96(closegroup)
        1    0.000    0.000    0.008    0.008 tokenize.py:355(detect_encoding)
        1    0.000    0.000    0.008    0.008 tokenize.py:379(read_or_stop)
        1    0.000    0.000    0.000    0.000 tokenize.py:385(find_cookie)
        1    0.000    0.000    0.009    0.009 tokenize.py:448(open)
       49    0.000    0.000    0.000    0.000 utils.py:11(<lambda>)
       17    0.000    0.000    0.000    0.000 utils.py:13(alphanum_key)
       17    0.000    0.000    0.000    0.000 utils.py:18(<listcomp>)
    32372    0.018    0.000    0.028    0.000 utils.py:34(listify)
        1    0.000    0.000    0.001    0.001 utils.py:7(natural_sort)
        1    0.000    0.000    0.012    0.012 warnings.py:106(_formatwarnmsg)
        1    0.000    0.000    0.012    0.012 warnings.py:20(_showwarnmsg_impl)
        1    0.000    0.000    0.012    0.012 warnings.py:35(_formatwarnmsg_impl)
        1    0.000    0.000    0.000    0.000 warnings.py:398(__init__)
        1    0.000    0.000    0.012    0.012 warnings.py:85(_showwarnmsg)
    38045    0.043    0.000    0.043    0.000 {built-in method __new__ of type object at 0x7f2be1561180}
        4    0.000    0.000    0.000    0.000 {built-in method _codecs.utf_8_decode}
        1    0.000    0.000    0.000    0.000 {built-in method _locale.nl_langinfo}
       27    0.000    0.000    0.000    0.000 {built-in method _sre.compile}
       90    0.000    0.000    0.000    0.000 {built-in method _sre.getlower}
    39007    0.016    0.000    0.016    0.000 {built-in method _stat.S_ISDIR}
        1    0.000    0.000    0.012    0.012 {built-in method _warnings.warn}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.eval}
        1    0.000    0.000   45.171   45.171 {built-in method builtins.exec}
       38    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
   287364    0.108    0.000    0.108    0.000 {built-in method builtins.isinstance}
66492/66439    0.015    0.000    0.015    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.locals}
      112    0.000    0.000    0.000    0.000 {built-in method builtins.max}
      575    0.000    0.000    0.000    0.000 {built-in method builtins.min}
      898    0.000    0.000    0.000    0.000 {built-in method builtins.ord}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
       26    0.000    0.000    0.000    0.000 {built-in method builtins.setattr}
        1    0.000    0.000    0.001    0.001 {built-in method builtins.sorted}
        2    0.001    0.000    0.001    0.000 {built-in method io.open}
   214128    0.066    0.000    0.066    0.000 {built-in method posix.fspath}
        2    0.000    0.000    0.000    0.000 {built-in method posix.getcwd}
     6645    1.889    0.000    1.889    0.000 {built-in method posix.listdir}
    45656   33.097    0.001   33.097    0.001 {built-in method posix.stat}
        2    0.000    0.000    0.000    0.000 {built-in method time.time}
    69082    0.011    0.000    0.011    0.000 {method 'append' of 'list' objects}
       19    0.000    0.000    0.000    0.000 {method 'copy' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'decode' of 'bytes' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 {method 'end' of '_sre.SRE_Match' objects}
    78078    0.024    0.000    0.024    0.000 {method 'endswith' of 'str' objects}
       40    0.000    0.000    0.000    0.000 {method 'extend' of 'list' objects}
      236    0.000    0.000    0.000    0.000 {method 'find' of 'bytearray' objects}
        1    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
     6722    0.006    0.000    0.006    0.000 {method 'get' of 'dict' objects}
    38103    0.024    0.000    0.024    0.000 {method 'group' of '_sre.SRE_Match' objects}
    39007    0.023    0.000    0.023    0.000 {method 'insert' of 'list' objects}
       49    0.000    0.000    0.000    0.000 {method 'isdigit' of 'str' objects}
    98009    0.014    0.000    0.014    0.000 {method 'items' of 'dict' objects}
       42    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'keys' of 'collections.OrderedDict' objects}
       73    0.000    0.000    0.000    0.000 {method 'keys' of 'dict' objects}
      142    0.000    0.000    0.000    0.000 {method 'lower' of 'str' objects}
        4    0.000    0.000    0.000    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
        1    0.001    0.001    0.001    0.001 {method 'read' of '_io.TextIOWrapper' objects}
        1    0.008    0.008    0.008    0.008 {method 'readline' of '_io.BufferedReader' objects}
        1    0.000    0.000    0.000    0.000 {method 'readlines' of '_io._IOBase' objects}
        2    0.000    0.000    0.000    0.000 {method 'replace' of 'str' objects}
    97090    0.063    0.000    0.063    0.000 {method 'rfind' of 'str' objects}
    64727    0.033    0.000    0.033    0.000 {method 'rstrip' of 'str' objects}
   615048    6.233    0.000    6.233    0.000 {method 'search' of '_sre.SRE_Pattern' objects}
        1    0.000    0.000    0.000    0.000 {method 'seek' of '_io.BufferedReader' objects}
       17    0.000    0.000    0.000    0.000 {method 'split' of '_sre.SRE_Pattern' objects}
       78    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'startswith' of 'bytes' objects}
   117042    0.055    0.000    0.055    0.000 {method 'startswith' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'strip' of 'str' objects}
       35    0.000    0.000    0.000    0.000 {method 'translate' of 'bytearray' objects}
        2    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}
       39    0.000    0.000    0.000    0.000 {method 'upper' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'values' of 'collections.OrderedDict' objects}
    32363    0.109    0.000    0.109    0.000 {method 'values' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'write' of '_io.TextIOWrapper' objects}

@adelavega
Collaborator

As a short term fix, it looks like you could use the exclude argument with regex to exclude things you don't want:
https://github.com/grabbles/grabbit/blob/585af5dda23d6908e457b77a4e2190d380e466c5/grabbit/core.py#L490

I'm not sure if this will work at the directory level (which would speed things up even more), but it's worth a try.

Mind giving that a try on the large dataset excluding all but one subject?

FYI this is sort of a "hidden" feature right now, since exclude is a kwarg for grabbit, and not in the official pybids API. include cannot be passed down to grabbit, since pybids handles that in a more domain-specific way.

@gkiar
Contributor Author

gkiar commented Nov 15, 2018

OK, so you're saying I should try something like:

bl = BIDSLayout('/project/6008063/gkiar/data/smallRS', exclude='^sub-(?!A00028185)(.*)$')

Where A00028185 is my subject id. Is that correct?

@adelavega
Collaborator

Sure! I trust your regex skillz

@gkiar
Contributor Author

gkiar commented Nov 16, 2018

Heyo - so I verified that the regex picked up only what I want (https://regex101.com/r/uOKFHL/1), but the load time didn't change for any scale. Any other ideas?

@adelavega
Collaborator

But when you call layout.get_subjects(), are only the subjects you care about listed? When I tried it, your regex didn't work, but after removing the ^ it did (not sure why).

Maybe try: sub-(?!A00018030|A00066396)(.*)$?
And if that doesn't work then it's still looping over all files, but not indexing them, which is a problem.

@gkiar
Contributor Author

gkiar commented Nov 16, 2018

Ah, thanks for fixing the regex! It's because the pattern is matched against paths under the provided directory, so sub- isn't at the start of the string. Your method worked to reduce the time - loading the full 963-subject dataset now takes 1.4 seconds.
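
A minimal sketch of why the leading ^ matters, assuming the exclude pattern is applied with a regex search to a path that includes parent directories (an assumption about the internals, not something confirmed here). The dataset root and the second subject ID are made up for illustration.

```python
import re

# Hypothetical paths: the root directory and sub-A00099999 are invented.
paths = [
    "/data/RocklandSample/sub-A00028185/anat/sub-A00028185_T1w.nii.gz",
    "/data/RocklandSample/sub-A00099999/anat/sub-A00099999_T1w.nii.gz",
    "/data/RocklandSample/dataset_description.json",
]

anchored = re.compile(r"^sub-(?!A00028185)(.*)$")
unanchored = re.compile(r"sub-(?!A00028185)(.*)$")


def excluded(pattern, path):
    """A path is treated as excluded if the pattern matches anywhere in it."""
    return pattern.search(path) is not None


# With ^, "sub-" must be the first characters of the string, which never
# happens for a path that starts with a directory, so nothing is excluded.
anchored_hits = [p for p in paths if excluded(anchored, p)]

# Without ^, every path containing a "sub-" that is not followed by the
# wanted subject ID is excluded.
unanchored_hits = [p for p in paths if excluded(unanchored, p)]

print(anchored_hits)
print(unanchored_hits)
```

Under that assumption, the anchored pattern excludes nothing (so load time is unchanged), while the unanchored one excludes every subject except the one named in the lookahead.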

I'll use this for the time being and as I play around I'll let you know if I notice any peculiarities. Thanks! 👍

gkiar closed this as completed Nov 16, 2018

@adelavega
Collaborator

Awesome. I'm going to reopen this issue (with a more general name) because I think this is an all-too-common scenario, and coming up with said regex is obviously not that intuitive.

I think officially supporting excluding subjects at the BIDSLayout level in the API could be useful for many.

And like Tal said, the supralinear time increase w/ dataset size is worrisome...

adelavega reopened this Nov 16, 2018
adelavega changed the title from "BIDSLayout gets slow with big datasets" to "Improve BIDSLayout performance on very large datasets" Nov 16, 2018

@tyarkoni
Collaborator

Looks like most of the time is being eaten up by os.stat (via os.path.isdir), which is pretty surprising. Not sure how much we'll be able to do about this, but I'll look into it.
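
One possible direction, sketched here as an alternative that nobody in this thread proposed: os.scandir yields DirEntry objects whose is_dir() can usually be answered from the directory listing itself on most platforms, without a separate os.stat call per path. The function name `split_entries` and the temporary layout are illustrative only.

```python
import os
import tempfile


def split_entries(directory):
    """Return (subdirectories, files) for one directory using os.scandir."""
    dirs, files = [], []
    with os.scandir(directory) as entries:
        for entry in entries:
            # DirEntry.is_dir() typically reuses information gathered while
            # listing the directory instead of issuing a new stat call.
            if entry.is_dir():
                dirs.append(entry.path)
            else:
                files.append(entry.path)
    return sorted(dirs), sorted(files)


# Tiny throwaway tree to exercise the function.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub-01"))
    open(os.path.join(root, "dataset_description.json"), "w").close()
    dirs, files = split_entries(root)
    dir_names = [os.path.basename(d) for d in dirs]
    file_names = [os.path.basename(f) for f in files]

print(dir_names, file_names)
```

Whether this actually avoids the stat cost depends on the filesystem; on some network filesystems the entry type may not be available from the listing.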

@yarikoptic
Collaborator

FWIW -- if that is the same directories over and over again, might be worth creating/using a little simple "memoizer" for isdir for the duration of the call so previous result is reused/returned upon subsequent calls querying the same directory.
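
A minimal sketch of such a memoizer using functools.lru_cache. The wrapper name `cached_isdir` is hypothetical, and it only pays off if the same paths really are queried repeatedly.

```python
import functools
import os
import tempfile


@functools.lru_cache(maxsize=None)
def cached_isdir(path):
    """Memoized os.path.isdir: repeat queries for a path skip the stat call."""
    return os.path.isdir(path)


with tempfile.TemporaryDirectory() as root:
    first = cached_isdir(root)   # cache miss: performs the real check
    second = cached_isdir(root)  # cache hit: answered from the cache
    info = cached_isdir.cache_info()

# Clear the cache once the indexing call is done so that stale answers
# are not reused if the directory tree changes later.
cached_isdir.cache_clear()

print(first, second, info.hits, info.misses)
```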

@effigies
Collaborator

It might also be faster to assume that things that should be directories are, catch exceptions when trying to open/stat something underneath, and perform the isdir check in the except block if you really need to diagnose it.
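
A rough sketch of that "assume it is a directory and handle the failure" approach. `list_children` is a hypothetical helper, not part of pybids or grabbit.

```python
import os
import tempfile


def list_children(path):
    """Assume path is a directory; return its entries, or None if it is not."""
    try:
        return sorted(os.listdir(path))
    except NotADirectoryError:
        # Only reached for non-directories, so the common case costs one
        # listdir call and no separate isdir/stat check.
        return None


with tempfile.TemporaryDirectory() as root:
    file_path = os.path.join(root, "participants.tsv")
    open(file_path, "w").close()
    children_of_dir = list_children(root)
    children_of_file = list_children(file_path)

print(children_of_dir, children_of_file)
```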

@tyarkoni
Collaborator

@yarikoptic good idea. I'll have to take a closer look to see where the isdir calls are coming from... the metadata indexing in pybids is already memoized, but the file indexing in grabbit isn't (I think).

I'm pretty sure we do need to determine whether each path is a file or directory, because different validation hooks get triggered (_validate_dir vs. _validate_file), and I'm not sure they're guaranteed to fail if the wrong type is passed.

@gkiar
Contributor Author

gkiar commented Feb 14, 2019

Update: using the branch for 0.8, the problem seems to be fixed!

$ python profilebl.py
963:41.128883838653564
         5455536 function calls (5396092 primitive calls) in 41.147 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   41.147   41.147 <string>:1(<module>)
    56828    0.165    0.000    2.157    0.000 __init__.py:274(load)
    56828    0.095    0.000    1.475    0.000 __init__.py:302(loads)
    56827    0.064    0.000    0.203    0.000 _bootlocale.py:23(getpreferredencoding)
     8118    0.041    0.000    3.065    0.000 bids_validator.py:101(is_session_level)
     8118    0.024    0.000    0.216    0.000 bids_validator.py:106(<listcomp>)
     8118    0.030    0.000    2.667    0.000 bids_validator.py:110(is_subject_level)
     8118    0.010    0.000    0.038    0.000 bids_validator.py:115(<listcomp>)
     8118    0.027    0.000    2.617    0.000 bids_validator.py:121(is_phenotypic)
     8118    0.010    0.000    0.034    0.000 bids_validator.py:126(<listcomp>)
     8118    0.060    0.000    3.695    0.000 bids_validator.py:130(is_file)
     8118    0.056    0.000    0.293    0.000 bids_validator.py:135(<listcomp>)
    48708    0.857    0.000   16.785    0.000 bids_validator.py:139(get_regular_expressions)
    56826    0.048    0.000    0.192    0.000 bids_validator.py:179(conditional_match)
        1    0.000    0.000    0.000    0.000 bids_validator.py:35(__init__)
     8118    0.075    0.000   20.621    0.003 bids_validator.py:39(is_bids)
     8118    0.146    0.000    5.827    0.001 bids_validator.py:72(is_top_level)
     8118    0.039    0.000    0.192    0.000 bids_validator.py:82(<listcomp>)
     8118    0.033    0.000    2.668    0.000 bids_validator.py:89(is_associated_data)
     8118    0.010    0.000    0.038    0.000 bids_validator.py:97(<listcomp>)
    56828    0.026    0.000    0.026    0.000 codecs.py:259(__init__)
    56828    0.056    0.000    0.082    0.000 codecs.py:308(__init__)
    56828    0.092    0.000    0.155    0.000 codecs.py:318(decode)
        1    0.000    0.000    0.000    0.000 config.py:33(get_option)
   101680    0.074    0.000    0.370    0.000 core.py:123(match_file)
    23792    0.011    0.000    0.011    0.000 core.py:140(add_file)
   101680    0.013    0.000    0.013    0.000 core.py:158(_astype)
     6355    0.019    0.000    0.075    0.000 core.py:172(__init__)
     6355    0.002    0.000    0.002    0.000 core.py:180(_matches)
   6528/1    0.059    0.000   41.077   41.077 core.py:338(__init__)
        1    0.000    0.000    0.004    0.004 core.py:38(__init__)
     6528    0.014    0.000    0.026    0.000 core.py:381(_update_entities)
     6528    0.012    0.000    0.028    0.000 core.py:387(_extract_entities)
     6527    0.022    0.000    0.048    0.000 core.py:394(_get_child_class)
     4253    0.001    0.000    0.001    0.000 core.py:422(_setup)
     6528    0.007    0.000    0.025    0.000 core.py:425(abs_path)
     8803    0.003    0.000    0.003    0.000 core.py:429(root_path)
89091/45522    0.041    0.000    0.041    0.000 core.py:433(layout)
   6528/1    0.195    0.000   41.077   41.077 core.py:437(index)
        1    0.000    0.000    0.005    0.005 core.py:48(load)
     1311    0.001    0.000    0.001    0.000 core.py:523(_setup)
      963    0.002    0.000    0.002    0.000 core.py:534(_setup)
      963    0.000    0.000    0.000    0.000 core.py:535(<listcomp>)
        1    0.000    0.000   41.077   41.077 core.py:546(__init__)
        1    0.000    0.000    0.000    0.000 core.py:551(_setup)
        1    0.000    0.000    0.000    0.000 core.py:552(<dictcomp>)
       16    0.000    0.000    0.004    0.000 core.py:64(__init__)
    56828    0.120    0.000    1.343    0.000 decoder.py:334(decode)
    56828    1.074    0.000    1.074    0.000 decoder.py:345(raw_decode)
      772    0.000    0.000    0.001    0.000 enum.py:265(__call__)
      772    0.000    0.000    0.000    0.000 enum.py:515(__new__)
       15    0.000    0.000    0.000    0.000 enum.py:795(__or__)
      371    0.001    0.000    0.001    0.000 enum.py:801(__and__)
     6531    0.017    0.000   15.103    0.002 genericpath.py:16(exists)
     8118    0.023    0.000    0.039    0.000 genericpath.py:69(commonprefix)
        4    0.000    0.000    0.000    0.000 inflect.py:1659(get_si_pron)
        1    0.000    0.000    0.000    0.000 inflect.py:1916(__init__)
        2    0.000    0.000    0.000    0.000 inflect.py:2024(ud_match)
        1    0.000    0.000    0.000    0.000 inflect.py:2201(postprocess)
        1    0.000    0.000    0.000    0.000 inflect.py:2219(partition_word)
        1    0.000    0.000    0.003    0.003 inflect.py:2365(singular_noun)
        2    0.000    0.000    0.000    0.000 inflect.py:2461(get_count)
        2    0.000    0.000    0.002    0.001 inflect.py:2921(_sinoun)
        1    0.004    0.004   41.129   41.129 layout.py:150(__init__)
        1    0.000    0.000    0.000    0.000 layout.py:167(<listcomp>)
        1    0.000    0.000    0.000    0.000 layout.py:170(<listcomp>)
        1    0.000    0.000    0.005    0.005 layout.py:183(<listcomp>)
        1    0.000    0.000    0.000    0.000 layout.py:184(<dictcomp>)
        1    0.000    0.000    0.040    0.040 layout.py:202(_validate_root)
        1    0.000    0.000    0.000    0.000 layout.py:237(_setup_file_validator)
     6527    0.005    0.000    0.063    0.000 layout.py:248(_validate_dir)
     8118    0.037    0.000   21.137    0.003 layout.py:251(_validate_file)
        1    0.000    0.000    0.000    0.000 layout.py:275(_get_layouts_in_scope)
        1    0.000    0.000    0.003    0.003 layout.py:287(__getattr__)
        1    0.003    0.003    0.015    0.015 layout.py:434(get)
        1    0.001    0.001    0.001    0.001 layout.py:539(<listcomp>)
        1    0.001    0.001    0.001    0.001 layout.py:542(<listcomp>)
        1    0.000    0.000    0.000    0.000 layout.py:987(__init__)
    13056    0.061    0.000    3.709    0.000 os.py:277(walk)
     6355    0.014    0.000    0.029    0.000 posixpath.py:142(basename)
     6356    0.017    0.000    0.028    0.000 posixpath.py:152(dirname)
    45527    0.186    0.000    0.297    0.000 posixpath.py:329(normpath)
    45527    0.046    0.000    0.420    0.000 posixpath.py:367(abspath)
   108706    0.034    0.000    0.062    0.000 posixpath.py:39(_get_sep)
     8118    0.050    0.000    0.278    0.000 posixpath.py:444(relpath)
     8118    0.006    0.000    0.006    0.000 posixpath.py:466(<listcomp>)
     8118    0.005    0.000    0.005    0.000 posixpath.py:467(<listcomp>)
    45527    0.030    0.000    0.071    0.000 posixpath.py:62(isabs)
    50468    0.121    0.000    0.189    0.000 posixpath.py:73(join)
        1    0.000    0.000   41.147   41.147 profilebl.py:5(myfunc)
     2275    0.001    0.000    0.007    0.000 re.py:169(match)
        3    0.000    0.000    0.002    0.001 re.py:179(search)
      963    0.000    0.000    0.001    0.000 re.py:204(split)
     5860    0.004    0.000    0.024    0.000 re.py:214(findall)
   251674    0.077    0.000    0.312    0.000 re.py:231(compile)
   260775    0.211    0.000    0.246    0.000 re.py:286(_compile)
      210    0.000    0.000    0.004    0.000 sre_compile.py:223(_compile_charset)
      210    0.001    0.000    0.003    0.000 sre_compile.py:250(_optimize_charset)
      164    0.000    0.000    0.001    0.000 sre_compile.py:376(_mk_bitmap)
      164    0.001    0.000    0.001    0.000 sre_compile.py:378(<listcomp>)
      334    0.001    0.000    0.001    0.000 sre_compile.py:388(_simple)
        3    0.000    0.000    0.000    0.000 sre_compile.py:393(_generate_overlap_table)
    56/52    0.000    0.000    0.000    0.000 sre_compile.py:414(_get_literal_prefix)
       49    0.000    0.000    0.000    0.000 sre_compile.py:441(_get_charset_prefix)
       53    0.000    0.000    0.003    0.000 sre_compile.py:482(_compile_info)
      106    0.000    0.000    0.000    0.000 sre_compile.py:539(isstring)
       53    0.000    0.000    0.015    0.000 sre_compile.py:542(_code)
       53    0.000    0.000    0.035    0.001 sre_compile.py:557(compile)
   909/53    0.005    0.000    0.012    0.000 sre_compile.py:64(_compile)
       39    0.000    0.000    0.000    0.000 sre_parse.py:101(checklookbehindgroup)
      909    0.000    0.000    0.000    0.000 sre_parse.py:111(__init__)
     1332    0.000    0.000    0.000    0.000 sre_parse.py:159(__len__)
      141    0.000    0.000    0.000    0.000 sre_parse.py:161(__delitem__)
     5633    0.002    0.000    0.003    0.000 sre_parse.py:163(__getitem__)
      334    0.000    0.000    0.000    0.000 sre_parse.py:167(__setitem__)
     3819    0.001    0.000    0.001    0.000 sre_parse.py:171(append)
 1329/495    0.003    0.000    0.003    0.000 sre_parse.py:173(getwidth)
       53    0.000    0.000    0.000    0.000 sre_parse.py:223(__init__)
     6513    0.003    0.000    0.003    0.000 sre_parse.py:232(__next)
     2090    0.001    0.000    0.001    0.000 sre_parse.py:248(match)
     5370    0.002    0.000    0.004    0.000 sre_parse.py:253(get)
     1114    0.000    0.000    0.000    0.000 sre_parse.py:285(tell)
       20    0.000    0.000    0.000    0.000 sre_parse.py:294(_class_escape)
      245    0.000    0.000    0.001    0.000 sre_parse.py:342(_escape)
   321/53    0.001    0.000    0.019    0.000 sre_parse.py:407(_parse_sub)
   527/53    0.007    0.000    0.019    0.000 sre_parse.py:470(_parse)
       53    0.000    0.000    0.000    0.000 sre_parse.py:76(__init__)
      400    0.000    0.000    0.000    0.000 sre_parse.py:81(groups)
       53    0.000    0.000    0.000    0.000 sre_parse.py:828(fix_flags)
      108    0.000    0.000    0.000    0.000 sre_parse.py:84(opengroup)
       53    0.000    0.000    0.019    0.000 sre_parse.py:844(parse)
      108    0.000    0.000    0.001    0.000 sre_parse.py:96(closegroup)
       39    0.000    0.000    0.000    0.000 sre_parse.py:98(checkgroup)
        1    0.000    0.000    0.006    0.006 utils.py:29(natural_sort)
     2889    0.001    0.000    0.002    0.000 utils.py:33(<lambda>)
      963    0.001    0.000    0.005    0.000 utils.py:35(alphanum_key)
      963    0.001    0.000    0.003    0.000 utils.py:40(<listcomp>)
    11081    0.009    0.000    0.012    0.000 utils.py:6(listify)
    29290    0.020    0.000    0.317    0.000 utils.py:91(check_path_matches_patterns)
    56828    0.063    0.000    0.063    0.000 {built-in method _codecs.utf_8_decode}
    56827    0.139    0.000    0.139    0.000 {built-in method _locale.nl_langinfo}
       53    0.000    0.000    0.000    0.000 {built-in method _sre.compile}
       68    0.000    0.000    0.000    0.000 {built-in method _sre.getlower}
    56826    0.016    0.000    0.016    0.000 {built-in method builtins.any}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.eval}
        1    0.000    0.000   41.147   41.147 {built-in method builtins.exec}
   245643    0.061    0.000    0.061    0.000 {built-in method builtins.isinstance}
143606/143221    0.019    0.000    0.020    0.000 {built-in method builtins.len}
     8361    0.004    0.000    0.004    0.000 {built-in method builtins.max}
    10196    0.011    0.000    0.011    0.000 {built-in method builtins.min}
    21174    2.669    0.000    2.669    0.000 {built-in method builtins.next}
     4050    0.000    0.000    0.000    0.000 {built-in method builtins.ord}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.001    0.001    0.006    0.006 {built-in method builtins.sorted}
    56828   15.770    0.000   16.054    0.000 {built-in method io.open}
   222524    0.036    0.000    0.036    0.000 {built-in method posix.fspath}
     6528    0.969    0.000    0.969    0.000 {built-in method posix.scandir}
     6531   15.085    0.002   15.085    0.002 {built-in method posix.stat}
        2    0.000    0.000    0.000    0.000 {built-in method time.time}
   719070    0.072    0.000    0.072    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'capitalize' of 'str' objects}
        2    0.000    0.000    0.000    0.000 {method 'copy' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   113656    0.018    0.000    0.018    0.000 {method 'end' of '_sre.SRE_Match' objects}
    58818    0.012    0.000    0.012    0.000 {method 'endswith' of 'str' objects}
      175    0.000    0.000    0.000    0.000 {method 'extend' of 'list' objects}
      962    0.000    0.000    0.000    0.000 {method 'find' of 'bytearray' objects}
    62686    0.085    0.000    0.085    0.000 {method 'findall' of '_sre.SRE_Pattern' objects}
        2    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
      573    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
    23795    0.007    0.000    0.007    0.000 {method 'group' of '_sre.SRE_Match' objects}
    14646    0.005    0.000    0.005    0.000 {method 'is_dir' of 'posix.DirEntry' objects}
     2889    0.000    0.000    0.000    0.000 {method 'isdigit' of 'str' objects}
    12774    0.002    0.000    0.002    0.000 {method 'items' of 'dict' objects}
   337778    0.100    0.000    0.100    0.000 {method 'join' of 'str' objects}
    48714    0.013    0.000    0.013    0.000 {method 'keys' of 'dict' objects}
     1936    0.000    0.000    0.000    0.000 {method 'lower' of 'str' objects}
   115931    0.123    0.000    0.123    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
    56828    0.362    0.000    0.517    0.000 {method 'read' of '_io.TextIOWrapper' objects}
   298110    0.210    0.000    0.210    0.000 {method 'replace' of 'str' objects}
    12711    0.008    0.000    0.008    0.000 {method 'rfind' of 'str' objects}
     6356    0.004    0.000    0.004    0.000 {method 'rstrip' of 'str' objects}
   296515    0.506    0.000    0.506    0.000 {method 'search' of '_sre.SRE_Pattern' objects}
      963    0.000    0.000    0.000    0.000 {method 'split' of '_sre.SRE_Pattern' objects}
    61771    0.043    0.000    0.043    0.000 {method 'split' of 'str' objects}
   265284    0.065    0.000    0.065    0.000 {method 'startswith' of 'str' objects}
      164    0.000    0.000    0.000    0.000 {method 'translate' of 'bytearray' objects}
    13058    0.015    0.000    0.015    0.000 {method 'update' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'upper' of 'str' objects}
     6357    0.001    0.000    0.001    0.000 {method 'values' of 'dict' objects}
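
The listing above is in cProfile's default name order, so the two entries that dominate the 41.1 s run (`io.open` at about 15.8 s and `posix.stat` at about 15.1 s) sit near the bottom. A minimal sketch of getting the same information sorted by internal time, using only the standard `cProfile` and `pstats` modules, is below. `build_index` is a hypothetical stand-in for the `BIDSLayout(...)` call, not pybids code, and `profile_sorted` is an illustrative helper name.

```python
import cProfile
import io
import pstats
import tempfile
from pathlib import Path


def build_index(root):
    """Hypothetical stand-in for BIDSLayout(root): stat and read every file."""
    n_files = 0
    for path in Path(root).rglob("*"):
        if path.is_file():
            path.stat()                 # metadata lookup, like posix.stat above
            with open(path) as fh:      # file read, like io.open above
                fh.read()
            n_files += 1
    return n_files


def profile_sorted(func, *args, sort_key="tottime", limit=10):
    """Run func under cProfile and return (result, report sorted by sort_key)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by the chosen key and keep only the top `limit` rows.
    stats.sort_stats(sort_key).print_stats(limit)
    return result, stream.getvalue()


if __name__ == "__main__":
    # Small throwaway directory so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(20):
            Path(tmp, f"sub-{i:02d}.json").write_text("{}")
        count, report = profile_sorted(build_index, tmp)
        print(count)
        print(report)
```

Passing `sort_key="cumulative"` instead ranks callers by the total time spent beneath them, which helps attribute the I/O cost to the layout code that triggers it.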

@effigies
Collaborator

By my read that shaved off ~4s/45s. Or is that a different dataset than your last profile?

@gkiar
Contributor Author

gkiar commented Feb 14, 2019

Different dataset, sorry. Compare with the timings in the initial comment: the full dataset went from 631 seconds to 41.

@effigies
Collaborator

Nice.

@tyarkoni
Collaborator

Sweet, thanks Greg!
