
Add role splitting in markdown processor, remove old guidance regions #34

Merged
merged 9 commits into main from customroles
Apr 23, 2024

Conversation


@extremeheat extremeheat commented Apr 21, 2024

Breaking

  • Remove old "guidance regions"
  • Flow arguments for prompt handling have been updated.
    • The prompt: string argument has been replaced with prompt: { system?: string, user?: string } or prompt: { text: string, roles: Record<string, Role> } to support the new role handling.
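For illustration, the two new prompt shapes might look like the objects below. The role-token names (`<|SYSTEM|>`, `<|USER|>`) are hypothetical examples for this sketch, not the library's defaults:

```javascript
// Old: prompt was a plain string.
// New (shape 1): separate system/user fields.
const promptA = { system: 'You are helpful.', user: 'Summarize this file.' }

// New (shape 2): raw text plus a map of role tokens to roles,
// to be split apart by the markdown processor's role handling.
const promptB = {
  text: '<|SYSTEM|>\nYou are helpful.\n<|USER|>\nSummarize this file.',
  roles: { '<|SYSTEM|>': 'system', '<|USER|>': 'user' }
}
```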

Features

  • Support "role" splitting in the markdown processor. See the role section of docs/MarkdownProcessor.md for documentation.
    • Calling loadPrompt() now accepts a new options argument, which can take a "roles" field set to a Record<string, string> map. The keys are the tokens to split by and the values are the roles.
  • Add "requestChatCompletion" API method on CompletionService for sending messages to models
    • Supports caching
  • Old "guidance regions" have been removed from requestCompletion() in favor of a new "guidance" role in messages, which can now be used via "requestChatCompletion"
  • Markdown processor now strips markdown comments out
  • content and text are returned on all completion requests, whether chat or not
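As an illustration of the role-splitting feature above, here is a minimal, self-contained sketch of splitting a prompt into chat messages by role tokens. The token names and line-based splitting rule are assumptions for this sketch, not the library's actual implementation:

```javascript
// Illustrative sketch: split prompt text into { role, content } messages.
// Lines consisting solely of a role token start a new message.
// The <|SYSTEM|>/<|USER|> token names are hypothetical, not LXL's defaults.
function splitByRoles (text, roles) {
  const messages = []
  let current = null
  for (const line of text.split('\n')) {
    const token = Object.keys(roles).find(t => line.trim() === t)
    if (token) {
      if (current) messages.push(current)
      current = { role: roles[token], content: '' }
    } else if (current) {
      current.content += (current.content ? '\n' : '') + line
    }
  }
  if (current) messages.push(current)
  return messages
}

const messages = splitByRoles(
  '<|SYSTEM|>\nYou are helpful.\n<|USER|>\nHello!',
  { '<|SYSTEM|>': 'system', '<|USER|>': 'user' }
)
// messages[0] → { role: 'system', content: 'You are helpful.' }
```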

Fixes

  • Fix caching issues in CompletionService for requestCompletion
  • Fixes to markdown tokenization (handle comments, preformatted blocks)
  • Fix Flow transformResponse
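The comment-stripping feature noted above can be illustrated with a one-line regex. This is a hedged sketch of the general technique only; per the tokenization fixes above, the library's real tokenizer also has to leave comments inside code blocks and preformatted blocks alone, which a naive regex does not:

```javascript
// Sketch: remove HTML-style comments (<!-- ... -->) from markdown text.
// Non-greedy [\s\S]*? also matches comments spanning multiple lines.
// A real tokenizer must skip fenced code blocks; this regex does not.
function stripMarkdownComments (text) {
  return text.replace(/<!--[\s\S]*?-->/g, '')
}

stripMarkdownComments('Hello <!-- hidden note --> world')
// → 'Hello  world'
```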

@extremeheat

/review

@import-bot import-bot left a comment


Hey extremeheat, thanks for the pull request! This looks like a great update. I have a few comments and suggestions:

  1. In docs/MarkdownProcessing.md, I think it would be helpful to explicitly state that the guidance role should be the last message in the array. This is mentioned in the note at the end of the section, but it might be easy to miss.

    Here's the code I'm referring to:

    Note: there can only be one guidance message and it must be the last one. You should remove
    it from the messages array the next call you do to requestChatCompletion. This feature works
    best when used with the role parsing system above.

    I suggest adding a sentence like this to the main paragraph:

    "The guidance role should always be the last message in the messages array."

  2. In src/Flow.js, the _hash function now serializes all of the arguments with JSON.stringify before hashing them. While this works, it might be more efficient to update the crypto hash with each argument individually, avoiding the overhead of stringifying one large combined structure, especially if the arguments are large objects.

    Here's the code I'm referring to:

    _hash (...args) {
      const hash = crypto.createHash('sha1')
      hash.update(JSON.stringify(args))
      return hash.digest('hex')
    }
  3. In src/tools/mdp.js, the segmentByRoles function uses a loop to iterate over the text and find the role tokens. This could potentially be optimized by using a regular expression to find all the role tokens at once.

    Here's the code I'm referring to:

    function segmentByRoles (text, roles) {
      // split the text into segments based on the roles
      const segments = []
      for (let i = 0; i < text.length; i++) {
        for (const role in roles) {
          if (text.slice(i, i + role.length) === role) {
            segments.push({ role, start: i, end: i + role.length })
            i += role.length
            break
          }
        }
      }
      return segments
    }
  4. In src/tools/stripping.js, the tokenizeMarkdown function contains complex logic for handling code blocks and preformatted blocks. It might be worth considering a dedicated markdown parsing library for this, which could simplify the code and make it more robust.

    Here's the code I'm referring to:

    function tokenizeMarkdown (comment, options) {
      const tokens = []
      let tokenSoFar = ''
      let inCodeBlock = false
      let inCodeLang
      let inPreTag = false
      let linePadding = 0
      // ...

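The regex approach suggested in point 3 could look something like the following sketch. It is illustrative only; token escaping, overlap, and match ordering would need care in the real function:

```javascript
// Sketch of a regex-based segmentByRoles, per the suggestion in point 3.
// Escapes each role token, then finds every occurrence in a single pass,
// producing the same { role, start, end } segment shape as the loop version.
function segmentByRolesRegex (text, roles) {
  const escaped = Object.keys(roles).map(t => t.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
  const pattern = new RegExp(escaped.join('|'), 'g')
  const segments = []
  for (const match of text.matchAll(pattern)) {
    segments.push({ role: match[0], start: match.index, end: match.index + match[0].length })
  }
  return segments
}
```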
Overall, I think this pull request is well-written and introduces some valuable features. I would be happy to accept it once the above comments are addressed.

@extremeheat extremeheat merged commit f4840f6 into main Apr 23, 2024
1 check passed
@extremeheat extremeheat deleted the customroles branch April 23, 2024 05:09