docs: added reference to spacy-setfit to the spaCy Universe #12737

Merged (2 commits) on Jun 19, 2023


@davidberenstein1957 (Contributor) commented on Jun 18, 2023

Description

I created a package called spacy-setfit to easily integrate SetFit into spaCy pipelines, including training and some data formatting steps.

Note that I also updated the references to some packages I made for my previous employer, which I have since reclaimed ownership of.

import spacy
import spacy_setfit

# Create some example data
train_dataset = {
    "inlier": ["This text is about chairs.",
               "Couches, benches and televisions.",
               "I really need to get a new sofa."],
    "outlier": ["Text about kitchen equipment",
                "This text is about politics",
                "Comments about AI and stuff."]
}

# Load the spaCy language model:
nlp = spacy.load("en_core_web_sm")

# Add the "text_categorizer" pipeline component to the spaCy model, and configure it with SetFit parameters:
nlp.add_pipe("text_categorizer", config={
    "pretrained_model_name_or_path": "paraphrase-MiniLM-L3-v2",
    "setfit_trainer_args": {
        "train_dataset": train_dataset
    }
})
doc = nlp("I really need to get a new sofa.")
doc.cats
# {'inlier': 0.902350975129, 'outlier': 0.097649024871}
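
As a small illustration of reading the result above: `doc.cats` is a plain dictionary mapping each label to a score, so the predicted label can be picked with standard Python. This sketch only reuses the scores from the example output and does not need spaCy or spacy-setfit installed.

```python
# Scores copied from the example output above.
cats = {"inlier": 0.902350975129, "outlier": 0.097649024871}

# The predicted label is the key with the highest score.
predicted_label = max(cats, key=cats.get)
print(predicted_label, round(cats[predicted_label], 3))
```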

Types of change

  • Docs

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@tomaarsen
Contributor

Hello David!

This is really awesome. It seems to really lower the complexity of training and using SetFit models. I'm glad to show the project under the SetFit Related Work, and I'll think about how best to share this around.

- Tom Aarsen

@victorialslocum added the universe label (Changes to the Universe directory of third-party spaCy code) on Jun 19, 2023
@victorialslocum
Contributor

Hi David!

Thanks for the contribution! I ran the examples and everything worked as expected. This is a super cool library. Would you be able to add a factory entry point to your package's setup.cfg (see an example here)? This way the example won’t need import spacy_setfit.
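
For readers unfamiliar with the request: spaCy discovers third-party component factories through the `spacy_factories` entry-point group, which a package can declare in its setup.cfg. The sketch below parses a hypothetical setup.cfg fragment to show the shape of such a declaration. The target `spacy_setfit:create_text_categorizer` is an assumed name for illustration only and is not taken from the spacy-setfit source.

```python
import configparser

# Hypothetical setup.cfg fragment. The right-hand side (module path and
# function name) is an assumption, not the actual spacy-setfit declaration.
SETUP_CFG = """
[options.entry_points]
spacy_factories =
    text_categorizer = spacy_setfit:create_text_categorizer
"""


def parse_entry_points(cfg_text, group):
    """Return {entry point name: import target} for one entry-point group."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser.get("options.entry_points", group)
    entries = {}
    for line in raw.strip().splitlines():
        # Each line has the form "name = module:attribute".
        name, _, target = line.partition("=")
        entries[name.strip()] = target.strip()
    return entries


factories = parse_entry_points(SETUP_CFG, "spacy_factories")
print(factories)
```

Once a package with such an entry point is installed, spaCy can resolve the factory name passed to `nlp.add_pipe` without the package being imported explicitly in user code.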

@davidberenstein1957
Contributor Author

Hi @victorialslocum

I've resolved this, and I made the same change for my other packages in the Universe.

@victorialslocum
Contributor

Thanks for the quick fix on this @davidberenstein1957!


@victorialslocum left a comment


I tested this and the code all runs well, the package looks great, and the addition looks good with the formatting for the website. LGTM!

@svlandeg merged commit 53c400b into explosion:master on Jun 19, 2023
svlandeg pushed a commit that referenced this pull request Jun 19, 2023
* docs: added reference to spacy-setfit

* removed package import after adding factory entry points to packages
Labels
universe: Changes to the Universe directory of third-party spaCy code.
4 participants