Trying to Create BM25 Embeddings/Semantic Search - Error: prepareInput(text, "search").filter is not a function #116

Chaddeus · 2023-03-25T06:27:32Z

Please forgive me, I've been hammering at this code for a few hours and I'm stuck. My goal is to take a passage of text, split it into chunks of ~3 sentences each, add the chunks to the BM25 vectorizer, and then find the index of the best-matching chunk using semantic search.

The error I am seeing is:

TypeError: prepareInput(text, "search").filter is not a function. (In 'prepareInput(text, "search").filter(function(t2) {
          return token2Index[t2] !== void 0;
        })', 'prepareInput(text, "search").filter' is undefined)

My code is below. It's a series of functions. One breaks a passage into chunks. The next feeds the chunks to the BM25 engine. The last one is to perform the search.

The line of code throwing the error is const results = engine.search(query), in the last function.

const its = nlp.its
const engine = winkBM25()

export function createChunks(text: string): string[] {
    let contextDoc = nlp.readDoc(text)
    let sentences: Array<string> = []

    contextDoc.sentences().each((s) => {
        const sentence = []
        s.tokens().each((t) => sentence.push( t.out(its.precedingSpaces), t.out() ))
        sentences.push(sentence.join(''))
    })

	const chunks: string[] = []

	for (let i = 0; i < sentences.length; i += 3) {
		const chunk = sentences.slice(i, i + 3).join(' ')
		chunks.push(chunk)
	}

	return chunks
}

export function createEmbeddings(chunks: string[]): void {
    engine.defineConfig({ fldWeights: { tokens: 1 } })

    const uniqueChunks = [...new Set(chunks)] // Chunks are passages of text ~ 3 sentences long each

    uniqueChunks.forEach((chunk, index) => {
        const doc = nlp.readDoc(chunk)
        const tokens = doc.tokens().out()
        engine.addDoc({ tokens }, index)
    })

    engine.consolidate() // consolidate the learnings
}

export function semanticSearch(query: string): string | null {
	const results = engine.search(query)

	if (results.length > 0) return results[0].document.id
	else return null
}

And here is how I use these functions:

const pullContextFromNotes = () => {
	if ($currentDoc.notes.length < 100) return

	const chunks = createChunks($currentDoc.notes)

	if (chunks.length < 2) return
	createEmbeddings(chunks)

	const query = `ramen`
	const mostRelatedChunkIndex = semanticSearch(query)
	if (mostRelatedChunkIndex !== null) {
		console.log(`Most related chunk: ${chunks[mostRelatedChunkIndex]}`)
	}
	else {
		console.log('No chunk found')
	}
}

Hopefully it is something simple I am overlooking. Thank you for your time. 🙏
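
A note for readers who land here from the same error message: it suggests that .filter is being called on a value that is not an array. One plausible reading, sketched below without the library, is that when no prep tasks are defined the query string passes through the preparation step unchanged, and strings have no .filter method. This is a simplified model and not the library's actual source; prepareInput is borrowed from the error message, while tokenize, token2Index contents, and lookupTokens are hypothetical.

```javascript
// Simplified model of a prep-task pipeline. NOT the library's real code.
// Each task receives the output of the previous one.
function prepareInput(text, prepTasks) {
  return prepTasks.reduce((value, task) => task(value), text);
}

// Hypothetical minimal prep task: lower-case, then split on non-alphanumerics.
const tokenize = (text) =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// Hypothetical vocabulary built at indexing time.
const token2Index = { ramen: 0, noodles: 1 };

// Keep only the query tokens that exist in the vocabulary.
function lookupTokens(query, prepTasks) {
  return prepareInput(query, prepTasks).filter(
    (t) => token2Index[t] !== undefined
  );
}

// No prep tasks: the "prepared" query is still a string, so .filter throws.
let message = null;
try {
  lookupTokens('ramen', []);
} catch (err) {
  message = err.message;
}
console.log(message);

// With a tokenizing task the pipeline yields an array and the lookup works.
const known = lookupTokens('Spicy ramen', [tokenize]);
console.log(known); // [ 'ramen' ]
```

The exact wording of the TypeError varies between JavaScript engines, which is why the sketch prints the message instead of asserting a fixed string.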

sanjayaksaxena · 2023-03-25T08:04:08Z

Maintainer

Hello @Chaddeus

It seems the code has not followed the required workflow: the prep tasks have to be defined. Here is the revised code in JS for your reference:

// Load wink-bm25-text-search
var bm25 = require( 'wink-bm25-text-search' );
// Create search engine's instance
var engine = bm25();
// Load wink nlp and its model
const winkNLP = require( 'wink-nlp' );
// Use web model
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model );
const its = nlp.its;



function createChunks(text) {
    let contextDoc = nlp.readDoc(text)
    let sentences = []

    contextDoc.sentences().each((s) => {
        const sentence = []
        s.tokens().each((t) => sentence.push( t.out(its.precedingSpaces), t.out() ))
        sentences.push(sentence.join(''))
    })

	const chunks = []

	for (let i = 0; i < sentences.length; i += 3) {
		const chunk = sentences.slice(i, i + 3).join(' ')
		chunks.push(chunk)
	}
	return chunks
}

const prepTask = function ( text ) {
  const tokens = [];
  nlp.readDoc(text)
      .tokens()
      // Use only words ignoring punctuations etc and from them remove stop words
      .filter( (t) => ( t.out(its.type) === 'word' && !t.out(its.stopWordFlag) ) )
      // Handle negation and extract stem of the word
      .each( (t) => tokens.push( (t.out(its.negationFlag)) ? '!' + t.out(its.stem) : t.out(its.stem) ) );
 
  return tokens;
};

function createEmbeddings(chunks) {
    engine.defineConfig({ fldWeights: { text: 1 } })
    engine.definePrepTasks( [ prepTask ] );

    chunks.forEach((text, index) => {
        engine.addDoc({ text }, index)        
    })

    engine.consolidate() // consolidate the learnings
}

function semanticSearch(query) {
	// Return the top-ranked hit, or null when nothing matches
	const results = engine.search(query)
	if (results.length > 0) return results[0]
	else return null
}

const text = `Sen. Edward Kennedy (D., Mass.) said, "It's a bottom-line issue".
The Nasdaq 100 rose 7.08 to 445.23. (Are parenthesis part of a sentence?)
"This is a quoted... sentence." "(This is a quoted sentence within parenthesis.)"
('Like the previous one!') AI Inc. is focussing on AI. I work for AI Inc. 
My mail is r2d2@yahoo.com! U.S.A is my birth place.`
const theChunks = [ ... new Set(createChunks(text))];
createEmbeddings(theChunks)
const result = semanticSearch('quoted sentence');
if (result !== null) console.log(theChunks[result[0]])
// --> " AI Inc. is focussing on AI.  I work for AI Inc.  \nMy mail is r2d2@yahoo.com!"
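
The revised code reads the chunk index as result[0], which matches how the wink-bm25-text-search README describes search hits: an array of [id, score] pairs sorted by score. That would also mean the results[0].document.id access in the original question does not fit the result shape. The sketch below shows one way to unpack a hit. It runs on made-up data: the results array, its scores, and the bestChunk helper are hypothetical, and the ids are written as strings only as an assumption about how they might come back.

```javascript
// Hypothetical chunks and search output; the scores are invented.
const chunks = ['first chunk', 'second chunk', 'third chunk'];
const results = [
  ['1', 0.82], // [id, score], best match first
  ['0', 0.31],
];

// Unpack the top hit into a small object, or return null when there are no hits.
function bestChunk(hits, allChunks) {
  if (hits.length === 0) return null;
  const [id, score] = hits[0];
  // Number(id) normalises the id in case it is returned as a string.
  const index = Number(id);
  return { index, score, text: allChunks[index] };
}

const best = bestChunk(results, chunks);
console.log(best.text); // second chunk
```

Returning the index together with the score keeps the caller free to apply a relevance threshold before using the chunk.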

1 reply

Chaddeus Mar 25, 2023
Author

Thank you for the incredibly fast reply. I will test as soon as possible and reply. 🙏
