Skip to content

Add arro3 to package list #5020

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Sep 4, 2024
Merged

Conversation

kylebarron
Copy link
Contributor

@kylebarron kylebarron commented Aug 20, 2024

Description

arro3 is a new Python library for working with Apache Arrow data. See documentation.

arro3 is distributed with namespace packaging. This means that users who only need to represent Arrow data (say as interop between Arrow-based Python packages) only need the core module. So this PR creates three new recipes: arro3-core, arro3-io, and arro3-compute.

cc @hoodmane who suggested contributing this.

I'm expecting to make an arro3 0.3 release in the next few days and will bump these versions to 0.3 when that's released. (It can either be this or a follow up PR; I saw the 0.27.0a1 tag was published and figured I should make this PR sooner than later.)

Checklists

  • Add a CHANGELOG entry
  • Add / update tests
  • Add new / update outdated documentation

@hoodmane
Copy link
Member

It'll be a few more weeks at least before we release 0.27 I think so no huge rush.

@hoodmane
Copy link
Member

Thanks @kylebarron!

@agriyakhetarpal
Copy link
Member

Not a blocker, but it would be great if you could add licensing fields to the recipes too.

@kylebarron
Copy link
Contributor Author

I added license fields and updated to arro3 0.3.0.

Instead of just returning `["a"]` when we find `import a.b.c`, return
`["a", "a.b", "a.b.c"]`
name: arro3-compute
version: 0.3.0
top-level:
- "arro3.compute"
Copy link
Member

@hoodmane hoodmane Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is failing because top-level isn't supposed to contain a dot. Either these should be top-level: [arro3] or we need to fix find_imports. @ryanking13 perhaps it would make sense to change find_imports to support this:

+def _add_prefixes(s: set[str], mod: str) -> None:
+    [current, *rest] = mod.split(".")
+    s.add(current)
+    for part in rest:
+        current += f".{part}"
+        s.add(current)
+
+
 def find_imports(source: str) -> list[str]:
    source = dedent(source)
    try:
         mod = ast.parse(source)
     except SyntaxError:
         return []
-    imports = set()
+    imports: set[str] = set()
     for node in ast.walk(mod):
         if isinstance(node, ast.Import):
             for name in node.names:
                 node_name = name.name
-                imports.add(node_name.split(".")[0])
+                _add_prefixes(imports, node_name)
         elif isinstance(node, ast.ImportFrom):
             module_name = node.module
             if module_name is None:
                 continue
-            imports.add(module_name.split(".")[0])
+            _add_prefixes(imports, module_name)
     return list(sorted(imports))

Copy link
Contributor Author

@kylebarron kylebarron Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like namespace packaging but it does tend to break tools 😅

Either these should be top-level: [arro3]

For a namespace module arro3.core, import arro3 will do nothing because there's no __init__.py file in the arro3 directory. It's not until you reach arro3/core/__init__.py that there's anything to import.

Copy link
Member

@ryanking13 ryanking13 Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it makes sense to change find_imports. Changing top-level: [arro3] would result in importing arro3 loading all three arro3 subpackages, which might not be the intended behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing top-level: [arro3] would result in importing arro3 loading all three arro3 subpackages

No, with namespace packaging, calling import arro3 does nothing. No code is loaded until importing one of the submodules:

In [1]: import arro3

In [2]: arro3.core.Array
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[2], line 1
----> 1 arro3.core.Array

AttributeError: module 'arro3' has no attribute 'core'

In [3]: import arro3.core

In [4]: arro3.core.Array
Out[4]: arro3.core._core.Array

@hoodmane
Copy link
Member

Looks good @kylebarron, please add a changelog entry and then we will merge.

@kylebarron
Copy link
Contributor Author

kylebarron commented Sep 3, 2024

Added to changelog!

Copy link
Member

@agriyakhetarpal agriyakhetarpal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @kylebarron!

@agriyakhetarpal agriyakhetarpal merged commit b70201b into pyodide:main Sep 4, 2024
41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants