Improve keyword-based prompting #10

Closed
7 tasks
0art0 opened this issue Oct 11, 2022 · 4 comments

Comments

@0art0
Collaborator

0art0 commented Oct 11, 2022

  • Collect an extensive list of mathematical keywords
  • Handle synonymous keywords (e.g. "Isomorphism", "isomorphism", "isomorphisms" and "isomorphic" should all be treated as the same keyword)
  • Extract keywords from mathlib docstrings and backtranslated statements and group mathlib statements by keywords
  • Organise and store the mathlib keyword data for efficient search
  • Allow for a custom database of statements and keywords in addition to mathlib
  • Design heuristics for selecting statements with a given set of keywords
  • Allow for this to be used as a standalone tool
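A minimal sketch of how the normalisation, grouping and selection steps in the list above could fit together. Everything here is an assumption made for illustration: the suffix rules are a crude stand-in for real lemmatisation, the statement names (`thm_a` and so on) are hypothetical, and ranking by the number of shared keywords is only one possible heuristic.

```python
from collections import defaultdict

# Crude, hypothetical suffix rules; a real system would use a proper lemmatizer.
SUFFIXES = ("isms", "ism", "ic", "s")


def normalize(keyword: str) -> str:
    """Map surface variants of a keyword to one canonical key."""
    word = keyword.lower()
    for suffix in SUFFIXES:
        # Only strip when a stem of at least four letters remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def build_index(statements):
    """Group statement names by the normalized keywords attached to them."""
    index = defaultdict(set)
    for name, keywords in statements.items():
        for keyword in keywords:
            index[normalize(keyword)].add(name)
    return index


def select(index, query_keywords, limit=3):
    """Rank statements by how many query keywords they share (ties by name)."""
    scores = defaultdict(int)
    for key in {normalize(k) for k in query_keywords}:
        for name in index.get(key, ()):
            scores[name] += 1
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:limit]


# Hypothetical statement names with hand-assigned keywords.
statements = {
    "thm_a": ["Isomorphism", "group"],
    "thm_b": ["isomorphic", "rings"],
    "thm_c": ["compact", "groups"],
}
index = build_index(statements)
print(select(index, ["isomorphisms", "group"]))
```

Keeping the index as a plain mapping from keyword to statement names means a custom database can be merged in by calling `build_index` on it and taking the union of the sets per key.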
@0art0 0art0 self-assigned this Oct 11, 2022
@0art0 0art0 added the enhancement New feature or request label Oct 11, 2022
@siddhartha-gadgil
Owner

@0art0 Perhaps lemmatization (https://en.wikipedia.org/wiki/Lemmatisation) may be useful, using something like WordNet (I don't know if there are modern versions of this based on LLMs).

@0art0
Collaborator Author

0art0 commented Dec 14, 2022

  • Compute the embeddings of all mathematical keywords from Wiktionary and store them.
  • Following https://towardsdatascience.com/keyword-extraction-with-bert-724efca412ea (as suggested by @siddhartha-gadgil), attempt keyword extraction by computing similarity between the input sentence and potential keywords.
  • If the results are reasonable, experiment again with a larger keyword set comprising all mathematics-related Wikipedia titles (using this script)
  • Retain only those keywords that occur in the database of sentences for input-dependent prompting (prompts.json)
  • Compute and store the keyword data for all sentences in the database, using the restricted set of keywords.
  • Create a pipeline for keyword-based prompting.
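The similarity-based extraction step in this plan can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `embed` is a stand-in that counts character trigrams, where the plan above calls for embeddings from a BERT-style model, and the candidate list is a tiny hypothetical keyword set. Only the ranking logic (cosine similarity between the sentence and each candidate) is meant to carry over.

```python
import math
from collections import Counter


def embed(text: str, n: int = 3) -> Counter:
    """Stand-in embedding: counts of character n-grams (NOT a language model)."""
    padded = f" {text.lower()} "
    return Counter(padded[i : i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_keywords(sentence, candidates, top_k=2):
    """Return the candidates most similar to the sentence, best first."""
    sentence_vec = embed(sentence)
    scored = [(cosine(sentence_vec, embed(c)), c) for c in candidates]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Drop candidates with no overlap at all.
    return [c for score, c in scored[:top_k] if score > 0]


# Hypothetical candidate keywords; the plan above would draw these from Wiktionary.
candidates = ["abelian group", "manifold", "cyclic group", "eigenvalue"]
sentence = "Every finite abelian group is a product of cyclic groups"
print(extract_keywords(sentence, candidates))
```

Since the keyword embeddings do not depend on the input sentence, they can be computed once and stored, as the first item of the plan suggests; only the sentence embedding needs computing per query.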

@0art0
Collaborator Author

0art0 commented Dec 14, 2022

As identifier-based prompting may be useful in other contexts, the infrastructure for keyword-based prompting should ideally be abstract enough to handle prompting of this general nature.
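One way to get this generality, sketched under assumptions: the index takes the feature extractor as a parameter, so keywords and identifiers share the same storage and lookup code. The class name `FeatureIndex`, both extractor functions and the tiny vocabulary are hypothetical and only illustrate the shape of such an interface.

```python
import re
from collections import defaultdict
from typing import Callable, Iterable


class FeatureIndex:
    """Index statements by whatever features the given extractor returns."""

    def __init__(self, extract: Callable[[str], Iterable[str]]):
        self.extract = extract
        self.by_feature = defaultdict(set)

    def add(self, statement: str) -> None:
        for feature in self.extract(statement):
            self.by_feature[feature].add(statement)

    def lookup(self, query: str) -> set:
        """Return every stored statement sharing at least one feature with the query."""
        hits = set()
        for feature in self.extract(query):
            hits |= self.by_feature.get(feature, set())
        return hits


def identifiers(text: str) -> list:
    """Hypothetical extractor: dotted names such as `Nat.add_comm`."""
    return re.findall(r"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+", text)


def keywords(text: str) -> list:
    """Hypothetical extractor: words from a tiny fixed vocabulary."""
    vocabulary = {"group", "ring", "prime"}
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w in vocabulary]


by_identifier = FeatureIndex(identifiers)
by_identifier.add("theorem uses Nat.add_comm and Nat.succ_le")
by_keyword = FeatureIndex(keywords)
by_keyword.add("Every prime ideal of a ring is proper")
print(by_identifier.lookup("rewrite with Nat.add_comm"))
```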

@siddhartha-gadgil
Owner

Models are changing, so this part has lower priority. Reopen if work focuses on this again.
