Skip to content

DISC: Directory for code "upstream" from _libs #25162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jbrockmendel opened this issue Feb 5, 2019 · 2 comments
Closed

DISC: Directory for code "upstream" from _libs #25162

jbrockmendel opened this issue Feb 5, 2019 · 2 comments
Labels
Refactor Internal refactoring of code

Comments

@jbrockmendel
Copy link
Member

Motivated in part by #24737, which is discussing vendoring tzlocal for use in tslibs.

A lot of effort has gone into making _libs, and in particular tslibs, self-contained within the pandas code base. This makes these parts of the code base much easier to reason about. It also makes it easier to test, profile, and measure coverage on pieces of the code base in isolation.

Furthermore, outside of _libs the dependency structure is not quite DAG, but bears a resemblance if you squint and tilt your head a little: compat, errors, and util._* are almost independent of core. core.dtypes is almost-independent of the rest of core, etc.

AFAICT there are two pieces of code that prevent us from further simplifying the dependency structure: config.get_option and pprint_thing.

The proposal is to put these two functions in a directory that is explicitly "upstream" of everything else, allowing us to simplify the dependency structure.

[Side-note: It might also make sense to make a config shared within the ecosystem to de-duplicate e.g. pd.set_option('display.width', ...) and np.set_printoption(...)]

Thoughts?

@gfyoung gfyoung added the Refactor Internal refactoring of code label Feb 7, 2019
@gfyoung
Copy link
Member

gfyoung commented Feb 7, 2019

Making the codebase DAG-like is not necessarily the same as separating out pieces of the codebase from one another. That being said if DAG and increased modularity happen to go hand-in-hand, by all means.

Also, is _libs actually a DAG now?

@jbrockmendel
Copy link
Member Author

Making the codebase DAG-like is not necessarily the same as separating out pieces of the codebase from one another.

True. In the case of core.config, if we want to avoid runtime imports in _libs (and I for one do), that does need to be moved outside of core.

Also, is _libs actually a DAG now?

No, but a subset of _libs modules satisfy the criterion, and I think it is worthwhile trying to gradually expand that subset.

  • tslibs has no outside _libs dependencies, does have runtime imports for to_offset, pd.get_option, and tm.set_locale
    • to_offset is eventually going to wind up directly in tslibs. Getting there will take a while.
    • get_option, as discussed, I'd like to make upstream.
    • tm is a weird place for set_locale. It could easily go in a zero-dependency module. If get_option were made upstream, then a bunch of locale-specific functions could go with set_locale.
    • within tslibs there are some unavoidable dependencies since Timestamp and Timedelta require each other. Other than that, a lot of effort has gone into making it as DAG-like as possible.
  • tslib depends only on tslibs, eventually will likely be made part of tslibs.
  • writers, sparse, reshape, properties, indexing have zero pandas dependencies
  • khash is essentially vendored IIUC
  • skiplist depends only on a C file in _libs/src (also pseudo-vendored?)
  • window depends only on skiplist, util, and src/
  • missing depends only on tslibs and util
  • hashing depends only on util
  • hashtable depends on util, khash, and missing
  • interval depends on util, hashtable, and tslibs
  • ops depends only on util and missing

That leaves:

  • lib: has runtime imports of DatetimeIndex, construct_1d_object_array_from_listlike (otherwise just depends on tslibs, util, and missing; many functions in lib are stand-alone)
  • algos: has a runtime import from lib
  • groupby, index: imports from algos
  • internals: import from algos, runtime import of Int64Index
  • join: imports from algos and core.algorithms
  • parsers: many imports from outside of _libs, also lib
  • reduction: imports from lib
  • testing: imports from outside of _libs

Some of these (parsers, testing) aren't worth trying to make DAG. Looking at it, moving arrmap (the function with the runtime lib import) out of algos would let us add algos, groupby, and index to the DAG pile, which may be worthwhile.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Refactor Internal refactoring of code
Projects
None yet
Development

No branches or pull requests

2 participants