@cpard cpard commented Sep 18, 2025

Hey everyone,

fenic now has native support for reading datasets directly from the Hugging Face Hub using the hf:// protocol, documented at docs.fenic.ai. This PR adds the corresponding documentation to the Hugging Face docs.

Changes

  • New documentation page for fenic integration (docs/hub/datasets-fenic.md)
  • Updated libraries table adding fenic
  • Added navigation entry in _toctree.yml

Features documented

  • Read CSV and Parquet files directly from HF datasets using the hf:// protocol
  • Support for dataset revisions and versions
  • Mix HF data sources with local files in a single read operation
  • Process data with PySpark-inspired DataFrame operations and AI-powered transformations
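As a rough illustration of the hf:// paths these features rely on, here is a minimal sketch of how such a path is put together. The helper `hf_dataset_path` and the file name `data.csv` are hypothetical and only for illustration (they are not part of fenic); the `@~parquet` revision is the one used in the docs added by this PR.

```python
from typing import Optional


def hf_dataset_path(repo_id: str, path: str, revision: Optional[str] = None) -> str:
    """Build an hf:// path for a file or glob inside a Hub dataset repo.

    Hypothetical helper for illustration only; not part of fenic.
    """
    # A revision (branch, tag, or special ref) is appended to the repo id with "@".
    repo = f"{repo_id}@{revision}" if revision else repo_id
    return f"hf://datasets/{repo}/{path.lstrip('/')}"


# Path on the default branch ("data.csv" is an assumed file name).
csv_path = hf_dataset_path("datasets-examples/doc-formats-csv-1", "data.csv")

# Path pinned to a revision, here the auto-converted Parquet ref.
parquet_path = hf_dataset_path(
    "datasets-examples/doc-formats-csv-1", "**/*.parquet", revision="~parquet"
)

print(csv_path)
print(parquet_path)
```

Strings like these would then be passed to `session.read.csv(...)` or `session.read.parquet(...)`, as in the examples on the new docs page.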

Image

I'll add the image in a separate PR. Are there any specific instructions I should follow for this?

Happy to answer any questions and of course accommodate any changes required.

Thank you for the amazing work you've been doing with Hugging Face Datasets.

@cpard
Author

cpard commented Sep 18, 2025

@lhoestq
Member

lhoestq commented Sep 22, 2025

I'm just discovering fenic, the API looks great :) btw is there a way to use Hugging Face Inference Providers for the semantic / generative operations?

It's a unified API for many providers serving models on HF, you can find more info at https://huggingface.co/docs/inference-providers/en/index

@cpard
Author

cpard commented Sep 22, 2025

> I'm just discovering fenic, the API looks great :) btw is there a way to use Hugging Face Inference Providers for the semantic / generative operations?
>
> It's a unified API for many providers serving models on HF, you can find more info at https://huggingface.co/docs/inference-providers/en/index

Thank you so much for the kind words. It's great to hear that you like the API!

Regarding HF Inference Providers, we don't currently support them, but we will definitely add support, along with writing back to Datasets. Right now we only support reading from HF Datasets, but the goal is full support. The functionality that HF Datasets offers is really important for the experience we want to provide and the features we are working on, e.g. hydrating MCP servers with precomputed datasets stored on HF.

For an example of that, see this dataset: https://huggingface.co/datasets/typedef-ai/fenic-0.4.0-codebase
It was generated with fenic over the fenic codebase and is then used to hydrate the MCP server we use for our documentation tooling. More information here: https://github.com/typedef-ai/fenic/tree/main/examples/mcp/docs-server

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@davanstrien davanstrien left a comment


Looks very nice! Added a few small questions/suggestions.

@@ -0,0 +1,235 @@
# fenic

[fenic](https://github.com/typedef-ai/fenic) is a PySpark-inspired DataFrame framework designed for building production AI and agentic applications. fenic provides support for reading datasets directly from the Hugging Face Hub.
Member

Is "fenic" always styled with a lowercase "f"?

### Supported Formats

fenic supports reading the following formats from Hugging Face:
- **Parquet files** (`.parquet`)
Member

Is this just for native parquet datasets, or do you also support auto-converted parquet branches?

```python
df = session.read.parquet("hf://datasets/datasets-examples/doc-formats-csv-1@~parquet/**/*.parquet")
```

### Reading with Schema Management
Member

Might be good to briefly explain what "schema" means in this context?

```python
import fenic as fc

# Requires OPENAI_API_KEY to be set for language and embedding calls
```
Member

+1 on having inference providers examples here!

4 participants