Using reticulate for a shiny app for semantic search with BERT from SentenceTransformers: Error in py_call_impl(callable, dots$args, dots$keywords) : NameError: name 'faiss' is not defined
#1469 · Open · alicesaunders opened this issue on Aug 31, 2023 · 1 comment
When running the code below, I repeatedly get the following error:

```
Error in py_call_impl(callable, dots$args, dots$keywords) :
  NameError: name 'faiss' is not defined
```

I have two scripts, `app.R` and `pythonSemanticSearch.py`. Some lines are commented out because they are alternatives I tried in case they fixed the error, but the error stayed the same. My original code reads in an index that was generated and saved by another script. I have kept that code (commented out), but for reproducibility I replaced it with a different dataset and an index generated within this code. The original index was generated from a different dataset, which replaces the variable `df` in the example code below.
Here is the code from app.R:
```r
library(shiny)
library(reticulate)

use_python("my_env/Scripts/python.exe")

sentence_transformers <- reticulate::import("sentence_transformers")
SentenceTransformer <- sentence_transformers$SentenceTransformer

for (i in 1:10) {
  gc(full = TRUE)
  system("nvidia-smi | grep MiB | grep Default")
  #model <- SentenceTransformer("trained_model_ALL_300")
  model <- SentenceTransformer('multi-qa-distilbert-cos-v1')
}

faiss <- reticulate::import("faiss")
datasets <- reticulate::import(datasets)
load_dataset <- datasets$load_dataset
ds = load_dataset('crime_and_punish', split='train[:100]')
ds_with_embeddings = ds.map(lambda example: {'embeddings': ctx_encoder(**ctx_tokenizer(example["line"], return_tensors="pt"))[0][0].numpy()})
ds_with_embeddings.add_faiss_index(column='embeddings')

#faiss <- reticulate::import("faiss")
#read_index <- faiss$read_index
#index_path <- "trained_model_index_ALL_300.index"
#index = read_index(index_path)
#py_run_string("from sentence_transformers import SentenceTransformer")

python <- import("pythonSemanticSearch") #import python script
#python <- py_run_file("pythonSemanticSearch.py", local = TRUE)
python$import_libraries() #import libraries from python function

#load df (from which the index was generated and the resulting dataframe needs to be based on)
df = load_dataset('crime_and_punish')

# Define UI for application
ui <- fluidPage(
  # Application title
  titlePanel("Semantic Search App"),
  # Sidebar with the query box and the search button
  sidebarLayout(
    sidebarPanel(
      textInput(input = "query", "Enter your query:", ""),
      actionButton(input = "search", "Search")
    ),
    # Main panel with the results table and the CSV download button
    mainPanel(
      tableOutput("results"),
      downloadButton(input = "downloadCSV", "Download CSV")
    )
  )
)

# Define server logic
server <- function(input, output) {
  # import the python module within the server logic
  python <- import("pythonSemanticSearch")
  # import model
  sentence_transformers <- reticulate::import("sentence_transformers")
  SentenceTransformer <- sentence_transformers$SentenceTransformer
  faiss <- reticulate::import("faiss")
  index_path <- "trained_model_index_ALL_300.index"
  index = faiss$read_index(index_path)
  for (i in 1:10) {
    gc(full = TRUE)
    system("nvidia-smi | grep MiB | grep Default")
    model <- SentenceTransformer("trained_model_ALL_300")
  }
  # attach to cpu
  #model <- python$load_model(model)
  # import index
  index <- python$load_index('trained_model_ALL_300.index')
  # check query not null and encode using BERT
  query_vector <- reactive({
    query <- input$query
    if (!is.null(query) && nchar(query) > 0) {
      python$encode_query(query, model) # .py fcn
    }
  })
  # generate search results
  results <- eventReactive(input$search, {
    if (!is.null(query_vector())) {
      query_embedding <- query_vector()
      python$vector_search(input$query, query_embedding, model, index, df, num_results=10)
    }
  })
  # output table
  output$results <- renderTable({
    results()
  })
  # download csv
  output$downloadCSV <- downloadHandler(
    filename = function() {
      "semantic_search_results.csv"
    },
    content = function(file) {
      write.csv(results(), file)
    }
  )
}

# Run the app
shinyApp(ui, server)
```
And here is the code from `pythonSemanticSearch.py`:
Python script for the app: functions to read in the model
```python
def import_libraries():
    """
    Import the required libraries.
    """
    import pandas as pd
    import glob
    import numpy as np
    import torch
    import faiss
    from pathlib import Path
    import csv
    from sentence_transformers import SentenceTransformer
    from sentence_transformers import InputExample, losses, datasets
    from tqdm import tqdm


def load_model(model_name):
    """
    Load a SentenceTransformer model.

    Args:
        model_name (str): Name of the SentenceTransformer model.

    Returns:
        model: Loaded SentenceTransformer model.
    """
    #model = SentenceTransformer(model_name)
    # Check if GPU/CPU is available and use it
    if torch.cuda.is_available():
        model = model.to(torch.device("cuda"))
    print(model.device)
    return model


def load_index(index_path):
    """
    Load a FAISS index.

    Args:
        index_path (str): Path to the FAISS index file.

    Returns:
        index: Loaded FAISS index.
    """
    index = faiss.read_index(index_path)
    return index


def encode_query(query, model):
    """
    Encode a query using a SentenceTransformer model.

    Args:
        query (str): User query that should be more than a sentence long.
        model: Sentence-transformers model.

    Returns:
        vector (numpy.array): Encoded vector of the query.
    """
    vector = model.encode([query])
    return vector


def vector_search(query, vector, model, index, df, num_results=10):
    """
    Transform the search query to a vector using a BERT model and find similar vectors using FAISS.
    Create a pandas DataFrame with the search results.

    Args:
        query (str): User query that should be more than a sentence long.
        vector (numpy.array): Encoded query, as returned by encode_query().
        model: Sentence-transformers model.
        index: FAISS index to search.
        df: DataFrame containing report information.
        num_results (int): Number of results to return.

    Returns:
        results_df: Pandas DataFrame containing the results.
    """
    D, I = index.search(np.array(vector).astype("float32"), k=num_results)

    def id2details(df, I, column):
        return [list(df[df.UniqueID == idx][column]) for idx in I[0]]

    title = id2details(df, I, 'docname')
    text = id2details(df, I, 'paratext')
    data = {
        'Title': [item[0] for item in title],
        'Text': [item[0] for item in text],
        'Search query': query
    }
    results_df = pd.DataFrame(data)
    return results_df
```
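In `vector_search`, `index.search` returns two arrays, `D` (distances) and `I` (ids), each with one row per query and `k` columns, and `id2details` then matches every id in `I[0]` against `df.UniqueID`. The stdlib-only sketch below mimics that flow under stated assumptions: it swaps FAISS for an exact brute-force squared-L2 search (the kind of result an exact flat L2 index returns) and uses a list of dicts in place of `df`. `flat_l2_search`, `id2details_rows` and the toy data are hypothetical illustrations, not FAISS or pandas code. It also shows why the lookup only lines up if `UniqueID` holds the same ids the index returns; for a plain flat index those are the row positions in the order the vectors were added.

```python
from typing import Dict, List, Sequence, Tuple


def flat_l2_search(
    database: Sequence[Sequence[float]],
    queries: Sequence[Sequence[float]],
    k: int,
) -> Tuple[List[List[float]], List[List[int]]]:
    """Exact k-nearest-neighbour search by squared L2 distance (stand-in for FAISS).

    Returns (D, I), each with one row per query and up to k columns,
    nearest neighbour first, mirroring the layout of D, I = index.search(...).
    """
    D: List[List[float]] = []
    I: List[List[int]] = []
    for q in queries:
        # Squared Euclidean distance from this query to every stored vector.
        dists = [sum((qi - xi) ** 2 for qi, xi in zip(q, x)) for x in database]
        # Row positions of the k smallest distances, in ascending order.
        order = sorted(range(len(database)), key=lambda j: dists[j])[:k]
        D.append([dists[j] for j in order])
        I.append(order)
    return D, I


def id2details_rows(rows: List[Dict], I: List[List[int]], column: str) -> List[List]:
    """Hypothetical stand-in for id2details: rows is a list of dicts, not a DataFrame."""
    # For each id returned for the first query, collect `column` from matching rows.
    return [[r[column] for r in rows if r["UniqueID"] == idx] for idx in I[0]]


# Toy data (hypothetical): three 2-d "embeddings" and their matching records.
database = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
rows = [
    {"UniqueID": 0, "docname": "doc A"},
    {"UniqueID": 1, "docname": "doc B"},
    {"UniqueID": 2, "docname": "doc C"},
]

D, I = flat_l2_search(database, [[0.9, 0.1]], k=2)
titles = [item[0] for item in id2details_rows(rows, I, "docname")]
print(I)       # [[1, 0]]
print(titles)  # ['doc B', 'doc A']
```

If `UniqueID` did not equal the position used by the index, `id2details_rows` would return empty lists and `item[0]` would raise an `IndexError`, which is worth ruling out once the `NameError` is gone.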
The model and index used are files that I had already generated in a previous script. A base model from SentenceTransformers can be used instead, e.g. `model <- SentenceTransformer('multi-qa-distilbert-cos-v1')`.
Can you please make your example smaller, and something I can run locally to reproduce the error?
At a quick glance, it looks like there is unmodified Python code in the R script (e.g., the use of `.` and `{` in `ds.map`). Also, the Python function `import_libraries()` seems to me to come from a misunderstanding of the difference in scoping rules between Python and R: `import` does not make the package symbols globally available the way `library()` does in R.
(This issue thread is more for reporting bugs than for support).
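The scoping point can be reproduced with the standard library alone. In the sketch below, `fractions`, `decimal` and `statistics` are stand-ins for `faiss`, `numpy`, `pandas` and `torch`, and the function names are hypothetical. An `import` inside a function body binds the name only in that call's local scope, so another function in the same module that refers to it fails with the same kind of `NameError` that `load_index()` hits. Assigning `faiss <- reticulate::import("faiss")` in R does not help either: it creates an R variable, not a name in the Python module's namespace.

```python
# Broken pattern: imports performed inside a helper function.
def import_libraries():
    import fractions  # binds "fractions" in this call's local scope only


def half():
    # Resolved in the module's global namespace, where "fractions" was never bound.
    return fractions.Fraction(1, 2)


import_libraries()
try:
    half()
except NameError as err:
    print(err)  # name 'fractions' is not defined


# Fix 1 (the usual approach): import at the top level of the module,
# e.g. "import faiss" near the top of pythonSemanticSearch.py.
import decimal


def tenth():
    return decimal.Decimal("0.1")  # "decimal" is a module-level (global) name


# Fix 2: if a loader function is really wanted, declare the names global first.
def import_libraries_global():
    global statistics  # the import below now binds a module-level name
    import statistics


def average(xs):
    return statistics.mean(xs)


import_libraries_global()
print(tenth(), average([1, 2, 3]))  # 0.1 2
```

Applied to `pythonSemanticSearch.py`, the first fix means moving `import faiss`, `import numpy as np`, `import pandas as pd` and `import torch` to the top of the file; `reticulate::import("pythonSemanticSearch")` runs those module-level statements when it loads the module, so `load_index()` and `vector_search()` can then find the names without calling `import_libraries()` at all.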
Here is the output from `reticulate::py_config()`:

```
NOTE: Python version was forced by RETICULATE_PYTHON
```

Here is the output from `utils::sessionInfo()`:

```
R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] reticulate_1.24 shiny_1.6.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 rstudioapi_0.13 magrittr_2.0.1 rappdirs_0.3.3 xtable_1.8-4
[6] lattice_0.20-41 R6_2.5.0 rlang_0.4.10 fastmap_1.1.0 tools_4.0.4
[11] grid_4.0.4 png_0.1-7 jquerylib_0.1.3 withr_2.4.1 htmltools_0.5.1.1
[16] ellipsis_0.3.1 digest_0.6.27 lifecycle_1.0.0 crayon_1.4.1 Matrix_1.2-18
[21] later_1.1.0.1 sass_0.4.1 promises_1.2.0.1 cachem_1.0.4 mime_0.10
[26] compiler_4.0.4 bslib_0.2.4 jsonlite_1.7.2 httpuv_1.5.5
```