feat: allow download of exported parquet files #459

RogerHYang · 2023-03-28T23:17:44Z

resolves #433

for GUI only, console environment will be tackled in next PR

axiomofjoy · 2023-03-28T23:48:50Z

src/phoenix/config.py

-    try:
-        path.mkdir(parents=True, exist_ok=True)
-    except OSError as e:
-        if e.errno == errno.EEXIST:
-            pass
-        else:
-            raise
-    else:
-        path.chmod(0o777)
+    path.mkdir(parents=True, exist_ok=True)


axiomofjoy · 2023-03-28T23:51:12Z

src/phoenix/server/api/types/Model.py

+        return [
+            ExportedFile(
+                file_name=path.stem,
+                directory=str(EXPORT_DIR),
+            )
+            for path in await loop.run_in_executor(
+                None,
+                get_exported_files,
+                n_latest,
+            )
+        ]


What does the async list comprehension do here?

It's just running the I/O operation (listing files) in a separate thread so it's not blocking the event loop. The comprehension itself is not async.

So the idea is that the call to get_exported_files does not block in case there are a large number of exported files?

yes. probably doesn't matter in reality. i think it's just good practice (for I/O operations in general)

Got it. I'm curious if there's a way of accomplishing this without explicitly invoking the event loop. It would be possible to make get_exported_files a co-routine, for example. Might be cleaner.

yes, but not in 3.8 haha

new in 3.9 https://docs.python.org/3/library/asyncio-task.html#asyncio.to_thread

i thought about making get_exported_files a coroutine too but then i realized i need to call it in Jupyter notebook for session so it would be inconvenient that way
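
For reference, a minimal sketch of the two options discussed in this thread, assuming a plain blocking helper. `list_files` is a hypothetical stand-in for get_exported_files, and neither wrapper is the code in this PR. Because the helper stays synchronous, it can still be called directly from a notebook, while async callers wrap it at the call site.

```python
import asyncio
import sys
from pathlib import Path
from typing import List


def list_files(directory: Path) -> List[Path]:
    # Hypothetical stand-in for get_exported_files: an ordinary blocking call.
    return sorted(directory.glob("*"))


async def list_with_executor(directory: Path) -> List[Path]:
    # Works on Python 3.8: hand the blocking call to the default thread pool.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, list_files, directory)


async def list_with_to_thread(directory: Path) -> List[Path]:
    # Python 3.9+ only: same effect without referencing the loop explicitly.
    return await asyncio.to_thread(list_files, directory)


if __name__ == "__main__":
    here = Path(".")
    print(asyncio.run(list_with_executor(here)))
    if sys.version_info >= (3, 9):
        print(asyncio.run(list_with_to_thread(here)))
```

In both cases the list comprehension in the resolver stays an ordinary comprehension; only the iterable it loops over is awaited.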

axiomofjoy · 2023-03-28T23:52:51Z

src/phoenix/config.py

+def get_exported_files(
+    n_latest: int = 5,
+    directory: Path = EXPORT_DIR,
+    extension: str = "parquet",
+) -> List[Path]:
+    """
+    Returns the n most recently exported files by descending modification time.
+
+    Parameters
+    ----------
+    n_latest: int, optional, default=5
+        Specifies the number of the most recent exported files to return. If
+        there are fewer than n exported files then fewer than n files will
+        be returned.
+
+    Returns
+    -------
+    list: List[Path]
+        List of paths of the n most recent exported files.
+    """
+    return nlargest(
+        n_latest,
+        directory.glob("*." + extension),
+        lambda p: p.stat().st_mtime,
+    )


Not sure I would expect this function to live with the config.

true. i didn't find a better home for it

Could make sense to put it in src/phoenix/server/api/types/Model.py for now if that is the only place it is used.

yea that's where i had put it in the first place, but then i realized i may want to use this function in other places too, e.g. in session, so i need to put it somewhere higher up, like a utils folder

mikeldking · 2023-03-29T00:25:43Z

src/phoenix/server/app.py

+class Download(HTTPEndpoint):
+    async def get(self, request: Request) -> FileResponse:
+        params = QueryParams(request.query_params)
+        file = EXPORT_DIR / (params.get("filename", "") + ".parquet")
+        if not file.is_file():
+            raise HTTPException(status_code=404)
+        return FileResponse(
+            path=file,
+            filename=file.name,
+            media_type="application/x-octet-stream",
+        )


Why not just use Static? That way you don't have to do anything: just host the files and let the MIME types describe themselves.

I couldn't figure out how to get Static to work, because it always gets routed to the /static subfolder (even though I had initialized a separate instance). I took this get snippet from the Starlette manual and it worked out of the box.

mikeldking · 2023-03-29T00:26:31Z

src/phoenix/config.py

+    list: List[Path]
+        List of paths of the n most recent exported files.
+    """
+    return nlargest(


Suggested change

-    return nlargest(
+    return nlatest(

this is from the Python standard library: heapq.nlargest
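
A small sketch of how heapq.nlargest behaves with a key function, assuming a scratch directory with made-up file names and modification times. `latest_files` and `demo` are hypothetical names for illustration, not part of the PR.

```python
import os
import tempfile
from heapq import nlargest
from pathlib import Path
from typing import List


def latest_files(directory: Path, n: int = 5, extension: str = "parquet") -> List[Path]:
    # nlargest returns up to n items, largest key first, so the newest
    # modification time comes first.
    return nlargest(n, directory.glob("*." + extension), key=lambda p: p.stat().st_mtime)


def demo() -> List[str]:
    with tempfile.TemporaryDirectory() as tmp:
        directory = Path(tmp)
        for i, name in enumerate(["a", "b", "c"]):
            path = directory / f"{name}.parquet"
            path.touch()
            # Set explicit, increasing modification times so the order is deterministic.
            os.utime(path, (1_000 + i, 1_000 + i))
        return [p.name for p in latest_files(directory, n=2)]


if __name__ == "__main__":
    print(demo())  # ['c.parquet', 'b.parquet']
```

If the directory holds fewer than n matching files, nlargest simply returns all of them, which matches the behavior described in the docstring.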

mikeldking · 2023-03-29T00:28:05Z

app/schema.graphql

+  """
+  Returns n most recent exported Parquet files sorted by descending modification time.
+  """
+  exportedFiles(nLatest: Int! = 5): [ExportedFile!]!


Just checking: I'm not sure you're going to be able to execute GraphQL queries from the Python runtime for Colab. At least, I've failed to do so so far.

It might be simplest to just read from the directory? No need for network I/O. This is nice for the UI, though, since it's non-blocking.

that's correct. i am working on the next PR for the console version. this is just for the GUI (e.g. in a modal)

RogerHYang added 3 commits March 28, 2023 16:16

allow download

7b1a6fc

add parquet file extension

b562690

clean up

ed2296c

axiomofjoy reviewed Mar 29, 2023


axiomofjoy approved these changes Mar 29, 2023


RogerHYang merged commit 7d3d8ee into main Mar 29, 2023

RogerHYang deleted the download-parquet branch March 29, 2023 00:25

mikeldking reviewed Mar 29, 2023
