Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dev/depman gpufeat #517

Open
wants to merge 410 commits into
base: master
Choose a base branch
from
Open

Dev/depman gpufeat #517

wants to merge 410 commits into from

Conversation

dcolinmorgan
Copy link
Contributor

to test gpu-featurization (cu_cat) with new dependency manager

dcolinmorgan and others added 2 commits May 23, 2024 12:14
@@ -422,7 +422,10 @@ def infer_self_graph(res,
assert (
emb.shape[0] == df.shape[0]
), "minibatches emb and X must have same number of rows since h(df) = emb"
df = df.assign(x=emb.x, y=emb.y) # add x and y to df for graphistry instance
try:
df = df.assign(x=emb.x, y=emb.y) # add x and y to df for graphistry instance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of try/catch, can we do value inspection?

try:
dist = np.linalg.norm(diff, axis=1) # Euclidean distance
except TypeError:
dist = np.linalg.norm(diff.to_pandas(), axis=1) # Euclidean distance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, instead of try/catch, can we do value inspection?

Copy link
Contributor Author

@dcolinmorgan dcolinmorgan Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmeyerov I will back this out everywhere (not too pervasive in overhaul yet, 10 files including tests), but its pretty simple, works well, and gets rid of instances like in embed_utils where you end up using the lazy imports almost comically throughout, so i can put this into a seperate PR, only to be reunited one day far off

_, torch, _, _, _, _, _, _ = lazy_embed_import_dep()
_, _, _, dgl, _, _, _, _ = lazy_embed_import_dep()

def __init__(self):
self.pkgs = {}

def __getattr__(self, pkg:str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead call this def import(...) or something to make it explicit that code is running

@@ -0,0 +1,30 @@
import importlib
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from importlib import __import__, import_module ?

@@ -0,0 +1,30 @@
import importlib

class DepManager:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move dep_manager.py to utils/dep_manager.py


def import_from(self, pkg:str, name:str):
try:
module = __import__(pkg, fromlist=[name])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

importlib docs recommend using importlib.import_module instead

@@ -0,0 +1,30 @@
import importlib

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class benefits from a comment on its design and the problem it solves

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ex: maybe it does a fast & cached check of a package being in the path before actually importing, so there should be a testable and observable speedup?

try:
return self.pkgs[pkg]
except KeyError:
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of hiding the exception from the user, it's probably more informative and unsurprising to:

  • cache the exception during _add_deps
  • fetch the cached exception, instead of None, and rethrow here, instead of return

@lmeyerov
Copy link
Contributor

Plan is we'll split this into 2 PRs, one on depman (see comments) and a separate on featurE_uitls fixes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants