Add fenic-datasets integration #1936
base: main
Just made the PR for the image too: https://huggingface.co/datasets/huggingface/documentation-images/discussions/548
I'm just discovering fenic, and the API looks great :) By the way, is there a way to use Hugging Face Inference Providers for the semantic/generative operations? It's a unified API for many providers serving models on HF; you can find more info at https://huggingface.co/docs/inference-providers/en/index
Thank you so much for the kind words; it's great to hear that the API looks good! We don't currently support HF Inference Providers, but we will definitely add it, along with writing back to Datasets. Right now we only support reading from HF Datasets, but the goal is full support. The functionality that HF Datasets offers is really important for the experience we want to provide and for the features we are working on, e.g. hydrating MCP servers with precomputed datasets stored on HF. For an example of that, see: https://huggingface.co/datasets/typedef-ai/fenic-0.4.0-codebase
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Looks very nice! Added a few small questions/suggestions.
@@ -0,0 +1,235 @@
# fenic

[fenic](https://github.com/typedef-ai/fenic) is a PySpark-inspired DataFrame framework designed for building production AI and agentic applications. fenic provides support for reading datasets directly from the Hugging Face Hub.
Is "fenic" always styled with a lowercase "f"?
### Supported Formats

fenic supports reading the following formats from Hugging Face:
- **Parquet files** (`.parquet`)
Is this just for native parquet datasets, or do you also support auto-converted parquet branches?
```python
df = session.read.parquet("hf://datasets/datasets-examples/doc-formats-csv-1@~parquet/**/*.parquet")
```
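For readers following the question about auto-converted Parquet branches, here is a minimal sketch of how the `hf://` path in the snippet above breaks down into a repo type, a repo id, a revision (`~parquet` here), and a glob. `HfPath` and `parse_hf_path` are hypothetical names used only for illustration; they are not part of fenic or `huggingface_hub`, and the sketch assumes the revision itself contains no `/`.

```python
from typing import NamedTuple, Optional


class HfPath(NamedTuple):
    repo_type: str           # e.g. "datasets"
    repo_id: str             # "<namespace>/<name>"
    revision: Optional[str]  # text after "@", or None if absent
    pattern: str             # file path or glob inside the repo


def parse_hf_path(path: str) -> HfPath:
    """Split an hf:// path into its parts (hypothetical helper, illustration only)."""
    prefix = "hf://"
    if not path.startswith(prefix):
        raise ValueError(f"not an hf:// path: {path!r}")
    # "<repo_type>/<namespace>/<name>[@<revision>]/<pattern>"
    repo_type, namespace, rest = path[len(prefix):].split("/", 2)
    name_and_rev, _, pattern = rest.partition("/")
    name, _, revision = name_and_rev.partition("@")
    return HfPath(repo_type, f"{namespace}/{name}", revision or None, pattern)


parsed = parse_hf_path(
    "hf://datasets/datasets-examples/doc-formats-csv-1@~parquet/**/*.parquet"
)
```

With the path from the docs snippet, `parsed.repo_id` is `datasets-examples/doc-formats-csv-1`, `parsed.revision` is `~parquet`, and `parsed.pattern` is `**/*.parquet`, which makes it clear that this example targets the converted branch rather than files on the default branch.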

### Reading with Schema Management
It might be good to briefly explain what "schema" means in this context.
```python
import fenic as fc

# Requires OPENAI_API_KEY to be set for language and embedding calls
```
+1 on having inference providers examples here!
Hey everyone,

fenic now has native support for reading datasets directly from the Hugging Face Hub using the `hf://` protocol, documented at docs.fenic.ai. This PR adds the corresponding documentation to the Hugging Face docs.

Changes
- `docs/hub/datasets-fenic.md`
- `_toctree.yml`

Features documented
- `hf://` protocol

Image
I'll add the image in a separate PR. Are there any specific instructions I should follow for this?

Happy to answer any questions and, of course, accommodate any changes required.

Thank you for the amazing work you've been doing with Hugging Face Datasets.