Commit

Replace the column name so cosine distance is computed on the correct column, TEXT_CLEAN.
rafatieppo committed Nov 22, 2024
1 parent f6f6452 commit 234335c
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion resources/tidydata_uniq_titles.py
@@ -55,7 +55,7 @@ def drop_similar_rows(df, column, threshold):

 # Convert the text data to TF-IDF features
 vectorizer = TfidfVectorizer()
-tfidf_matrix = vectorizer.fit_transform(dfdata[column])
+tfidf_matrix = vectorizer.fit_transform(dfdata['TEXT_CLEAN'])
 # Compute cosine similarity matrix
 cosine_sim = cosine_similarity(tfidf_matrix)
 # Track rows to drop
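For context, the hunk shows only the first steps of the function (TF-IDF features, then a cosine similarity matrix). A minimal, self-contained sketch of how such a deduplication step might look end to end is given below. It is not the repository's code: the function name `drop_similar_rows_sketch`, the greedy "keep the first, drop later near-duplicates" loop, and the sample data are all assumptions made for illustration. The sketch also takes the column name as an argument, so the caller passes `'TEXT_CLEAN'` explicitly.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def drop_similar_rows_sketch(df, column, threshold):
    """Hypothetical sketch: drop rows whose text in `column` is too similar
    (cosine similarity above `threshold`) to an earlier, kept row."""
    # Convert the text data to TF-IDF features
    tfidf_matrix = TfidfVectorizer().fit_transform(df[column])
    # Compute the pairwise cosine similarity matrix
    cosine_sim = cosine_similarity(tfidf_matrix)

    # Track positional indices of rows to drop (assumed greedy strategy)
    to_drop = set()
    n_rows = cosine_sim.shape[0]
    for i in range(n_rows):
        if i in to_drop:
            continue  # a dropped row does not eliminate other rows
        for j in range(i + 1, n_rows):
            if cosine_sim[i, j] > threshold:
                to_drop.add(j)

    return df.drop(index=df.index[sorted(to_drop)]).reset_index(drop=True)


# Made-up sample data; TEXT_CLEAN mirrors the column named in the commit.
sample = pd.DataFrame({
    "TITLE": ["Soil Water Content!", "Soil water content", "Machine learning model"],
    "TEXT_CLEAN": ["soil water content", "soil water content", "machine learning model"],
})
deduped = drop_similar_rows_sketch(sample, "TEXT_CLEAN", threshold=0.9)
print(deduped)
```

Here the first two rows have identical cleaned text, so the second is dropped, while the third shares no terms with the others and is kept.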
