Commit

Fix expired link in algorithm.md (#1396)
burugaria7 authored Jul 11, 2023
1 parent 9a85d33 commit 8e0e316
Showing 1 changed file with 2 additions and 2 deletions.
`docs/algorithm/algorithm.md` (4 changes: 2 additions & 2 deletions)
@@ -115,7 +115,7 @@ To do this, we first combine all documents in a cluster into a single document.

!!! tip Bag-of-words and tokenization

- There are many ways you can tune or change the bag-of-words step. This step allows for processing the documents however you want without affecting the first step, embedding the documents. You can follow the guide [here](https://maartengr.github.io/BERTopic/getting_started/countvectorizer/countvectorizer.html) for more information about tokenization options in BERTopic.
+ There are many ways you can tune or change the bag-of-words step. This step allows for processing the documents however you want without affecting the first step, embedding the documents. You can follow the guide [here](https://maartengr.github.io/BERTopic/getting_started/vectorizers/vectorizers.html) for more information about tokenization options in BERTopic.

### **5. Topic representation**
From the generated bag-of-words representation, we want to know what makes one cluster different from another. Which words are typical for cluster 1 and not so much for all other clusters? To solve this, we need to modify TF-IDF such that it considers topics (i.e., clusters) instead of documents.
@@ -157,4 +157,4 @@ The following models are implemented in `bertopic.representation`:
* `TextGeneration`
* `Cohere`
* `OpenAI`
- * `LangChain`
+ * `LangChain`
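The context of the first hunk describes steps 4 and 5 of the algorithm: the documents of each cluster are merged into one document, and TF-IDF is modified so that it scores words per topic (cluster) rather than per document. The following is a minimal plain-Python sketch of that idea, not BERTopic's own implementation. It assumes a c-TF-IDF weighting of the form W(x, c) = tf(x, c) · log(1 + A / f(x)), where tf(x, c) is the L1-normalized frequency of word x in cluster c, f(x) is the frequency of x across all clusters, and A is the average number of words per cluster. It also uses a naive whitespace tokenizer in place of the tunable bag-of-words step, and the names `class_tfidf` and `top_words` are hypothetical.

```python
import math
from collections import Counter


def class_tfidf(docs, labels):
    """Hypothetical c-TF-IDF sketch: W[x, c] = tf[x, c] * log(1 + A / f[x])."""
    # Step 4: concatenate all documents of a cluster into a single document
    merged = {}
    for doc, label in zip(docs, labels):
        merged.setdefault(label, []).append(doc)

    # Bag-of-words per cluster (naive lowercase whitespace tokenization)
    counts = {c: Counter(" ".join(d).lower().split()) for c, d in merged.items()}

    # f[x]: frequency of each word summed over all clusters
    total = Counter()
    for bag in counts.values():
        total.update(bag)

    # A: average number of words per cluster
    avg_words = sum(total.values()) / len(counts)

    # Step 5: L1-normalized term frequency times the class-based IDF term
    weights = {}
    for c, bag in counts.items():
        n_words = sum(bag.values())
        weights[c] = {
            w: (n / n_words) * math.log(1 + avg_words / total[w])
            for w, n in bag.items()
        }
    return weights


def top_words(weights, n=3):
    """Return the n highest-weighted words per cluster (ties broken alphabetically)."""
    return {
        c: [w for w, _ in sorted(ws.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for c, ws in weights.items()
    }


docs = ["the cats purr", "the cats meow", "the stocks fall", "the stocks rise"]
labels = [0, 0, 1, 1]
print(top_words(class_tfidf(docs, labels), n=1))  # {0: ['cats'], 1: ['stocks']}
```

In this toy input, `the` occurs in both clusters, so its large f(x) shrinks the log term and it ends up with the lowest weight in each cluster, while `cats` and `stocks` rank highest for their own clusters.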
