[WIP] Report all unsupported operations for a query in cudf.polars #16960

Draft
wants to merge 1 commit into
base: branch-24.12

Conversation

@Matt711 Matt711 commented Oct 1, 2024

Description

Closes #16690. The purpose of this PR is to list all of the unique operations that cudf.polars does not support when running a query. The current approach is to create a new node (ErrorNode) in the IR whenever translating polars IR to cudf.polars IR fails with a NotImplementedError, and then to traverse the new tree and report to the user where the ErrorNodes occurred.

  1. Question: How to traverse the tree to report the error nodes? Should this be done upstream in Polars?
  2. Instead of traversing the query afterwards, we should probably catch each unsupported feature as we translate the IR.
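
The approach described above can be pictured with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in rather than the actual cudf.polars IR or translator: the input "polars IR" is a plain dict, and `Node`, `ErrorNode`, `SUPPORTED`, `translate_one`, `translate`, and `collect_errors` are names invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a translated IR node."""

    name: str
    children: list[Node] = field(default_factory=list)


@dataclass
class ErrorNode(Node):
    """Placeholder inserted where translation raised NotImplementedError."""


# Assumption for this sketch: the set of operations the backend supports.
SUPPORTED = {"scan", "select", "filter"}


def translate_one(raw: dict, children: list[Node]) -> Node:
    """Translate a single node, raising if the operation is unsupported."""
    if raw["op"] not in SUPPORTED:
        raise NotImplementedError(f"unsupported operation: {raw['op']}")
    return Node(raw["op"], children)


def translate(raw: dict) -> Node:
    """Translate recursively, replacing failures with ErrorNode and continuing."""
    children = [translate(child) for child in raw.get("inputs", [])]
    try:
        return translate_one(raw, children)
    except NotImplementedError:
        return ErrorNode(raw["op"], children)


def collect_errors(root: Node) -> list[str]:
    """Walk the tree and return unique unsupported operations, first seen first."""
    seen: dict[str, None] = {}

    def visit(node: Node) -> None:
        for child in node.children:
            visit(child)
        if isinstance(node, ErrorNode):
            seen.setdefault(node.name, None)

    visit(root)
    return list(seen)
```

Because `translate` keeps going after a failure, a single pass over the query surfaces every unsupported operation instead of stopping at the first one.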

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@Matt711 added labels on Oct 1, 2024: feature request (New feature or request), 5 - DO NOT MERGE (Hold off on merging; see PR for details), non-breaking (Non-breaking change)
@Matt711 self-assigned this on Oct 1, 2024
@github-actions bot added labels on Oct 1, 2024: Python (Affects Python cuDF API.), cudf.polars (Issues specific to cudf.polars)
Comment on lines +166 to +167
def evaluate(self, *, cache: MutableMapping[int, DataFrame]) -> DataFrame:
    return pl.DataFrame()

Suggested change
- def evaluate(self, *, cache: MutableMapping[int, DataFrame]) -> DataFrame:
-     return pl.DataFrame()
+ def evaluate(self, *, cache: MutableMapping[int, DataFrame]) -> DataFrame:
+     return DataFrame([])

The object evaluate returns should be an internal DataFrame, rather than a polars DataFrame.

But, plausibly we don't want to define this method at all, and just let the default implementation raise.
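
A minimal sketch of that second option, assuming a base class whose `evaluate` raises by default. The `IR` and `ErrorNode` classes here are simplified stand-ins, and the error message text is invented for illustration.

```python
from __future__ import annotations


class IR:
    """Hypothetical base class for IR nodes."""

    def evaluate(self, *, cache: dict) -> object:
        # Default behaviour: nodes that do not override evaluate cannot be run.
        raise NotImplementedError(f"Evaluation of plan {type(self).__name__}")


class ErrorNode(IR):
    """Placeholder for an untranslatable node; deliberately has no evaluate."""

    def __init__(self, schema: dict) -> None:
        self.schema = schema
```

With this shape, accidentally executing a plan that still contains an `ErrorNode` fails loudly instead of silently producing an empty frame.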

It would be nice if we could keep track of the exception so that we can report it later.

I agree that we should keep track of the errors. There are two ways off the top of my head: either maintain a global list that we append to in debug mode, or have the ErrorNode constructor accept a string error message and attach it to the instance itself. Either way, I think in debug mode we'll need to make one more pass over the errors and print them in some reasonable way.
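
As a rough illustration of the second option, the sketch below attaches the message to each node and then formats a summary in a final pass. The `ErrorNode` signature and the `format_report` helper are hypothetical, not part of cudf.polars; the global-list variant would instead append each message to a module-level list at the point of failure.

```python
from __future__ import annotations

from collections import Counter


class ErrorNode:
    """Hypothetical placeholder node that remembers why translation failed."""

    def __init__(self, schema: dict, message: str) -> None:
        self.schema = schema
        self.message = message


def format_report(nodes: list[ErrorNode]) -> str:
    """Summarise the unique error messages and how often each one occurred."""
    # Counter keeps first-seen order, so the report follows traversal order.
    counts = Counter(node.message for node in nodes)
    lines = [f"{len(counts)} unsupported operation(s):"]
    lines += [f"  - {message} (x{count})" for message, count in counts.items()]
    return "\n".join(lines)
```

Keeping the message on the instance avoids shared mutable state, at the cost of needing a tree walk to gather the nodes before reporting.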

Comment on lines +6 to +30
from __future__ import annotations

import os


def _env_get_int(name, default):
    """Get the integer value of the environment variable."""
    try:
        return int(os.getenv(name, default))
    except (ValueError, TypeError):
        return default


def _env_get_bool(name, default):
    """Get the boolean value of the environment variable."""
    env = os.getenv(name)
    if env is None:
        return default
    as_a_int = _env_get_int(name, None)
    env = env.lower().strip()
    if env == "true" or env == "on" or as_a_int:
        return True
    if env == "false" or env == "off" or as_a_int == 0:
        return False
    return default

suggestion: in the long run we probably want to use the polars Config object for this, rather than having our own parallel configuration.

Comment on lines +34 to +35
if other._env_get_bool("CUDF_POLARS_DEBUG_MODE", default=False):
    return ir.ErrorNode(args[0].get_schema())

This kind of action-at-a-distance stateful modification of the environment makes me think (and @brandon-b-miller wants it too, for other reasons) that we need to carry some kind of config object around in translate_ir.

The other advantage of sending the config through to translate_ir would be the ability to configure per-query rather than per-session.
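
A minimal sketch of what threading a config object through translation might look like, assuming a hypothetical `TranslationConfig` dataclass and a simplified `translate_ir_sketch` function; neither reflects the real `translate_ir` signature.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationConfig:
    """Hypothetical per-query settings passed explicitly to the translator."""

    debug_mode: bool = False


class ErrorNode:
    """Hypothetical placeholder recording an unsupported operation."""

    def __init__(self, message: str) -> None:
        self.message = message


# Assumption for this sketch: the set of operations the backend supports.
SUPPORTED = {"scan", "select"}


def translate_ir_sketch(op: str, *, config: TranslationConfig) -> str | ErrorNode:
    """Translate one operation; behaviour on failure depends on the config."""
    if op in SUPPORTED:
        return op
    message = f"unsupported operation: {op}"
    if config.debug_mode:
        # Debug mode: record the failure and let translation continue.
        return ErrorNode(message)
    raise NotImplementedError(message)
```

Because the config is an argument rather than an environment lookup, two queries in the same session can be translated with different settings, and the function's behaviour no longer depends on hidden global state.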

Labels
5 - DO NOT MERGE (Hold off on merging; see PR for details), cudf.polars (Issues specific to cudf.polars), feature request (New feature or request), non-breaking (Non-breaking change), Python (Affects Python cuDF API.)
Development

Successfully merging this pull request may close these issues.

[FEA] Report all unsupported operations for a query in cudf-polars
3 participants