Background literature review / discussion of future options for natural language search - current tools for LLM / Graph / LangChain #403

Open
Don-Isdale opened this issue Jul 10, 2024 · 0 comments
Labels
phase:discussion prospective-feature Discussion of a feature which might be added - is not yet confirmed as a requirement

Comments

@Don-Isdale
Collaborator

Don-Isdale commented Jul 10, 2024

Background literature review

This field continues to advance rapidly; this issue notes some of the tools and publications which have appeared since the prototype / proof-of-concept was written.

LangGraph has since been added to the LangChain offering, and may be more suitable than LangChain, which the prototype used.
Many discussions of this topic describe LangChain as an over-engineered abstraction which adds little value for many uses.


2024Jul10

Use of LLMs and vector database for more user-friendly GUI

The software tools supporting LLM functionality have evolved rapidly since the options=naturalSearch experimental sketch of functionality was added ([feature/naturalSearch a4ad67c]).
This issue collates some of the currently available tools which may be useful in updating and extending that work.

For the dataset graph, graph display libraries are available which give a better result than the simple graph display which was implemented directly in D3.

The natural-language dialog, which translates user sentences into function calls which display the requested data, was written using LangChain. Many other options have since become available.
LangChain itself now offers LangGraph, and there are numerous alternatives. The selection listed below focuses on RAG / agentic tools and on graph data. The multi-step dialog with the LLM can be modeled using RAG / agentic systems. The dataset metadata is a combination of text and labels; the labels establish some graph relationships between the datasets, and allowing the user to browse easily to related datasets will assist dataset finding and discovery.
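
As a rough illustration of the dialog step, the sketch below dispatches one structured function-call request, of the kind an LLM with tool-calling support can be asked to emit, to a display function. The function names `show_dataset` and `show_related`, the JSON shape, the dataset name and the hard-coded request are all hypothetical; no LLM is called, and this is not the prototype's code.

```python
import json

# Hypothetical display functions; in the application these would drive the GUI.
def show_dataset(name: str) -> str:
    return f"displaying dataset {name}"

def show_related(name: str, limit: int = 5) -> str:
    return f"displaying up to {limit} datasets related to {name}"

# Registry mapping the tool names offered to the LLM onto Python callables.
TOOLS = {"show_dataset": show_dataset, "show_related": show_related}

def dispatch(request_json: str) -> str:
    """Run one request of the assumed form {"name": ..., "arguments": {...}}."""
    request = json.loads(request_json)
    tool = TOOLS.get(request.get("name"))
    if tool is None:
        # The message can be passed back to the LLM so it can retry.
        return f"unknown tool: {request.get('name')}"
    try:
        return tool(**request.get("arguments", {}))
    except TypeError as err:
        # Missing or unexpected arguments in the request.
        return f"bad arguments: {err}"

# Stand-in for what the LLM might return for "show me datasets like Wheat_90k".
llm_output = '{"name": "show_related", "arguments": {"name": "Wheat_90k", "limit": 3}}'
result = dispatch(llm_output)
print(result)
```

In a multi-step dialog the string returned by `dispatch` would be appended to the conversation, so that the LLM can decide on the next call or correct a failed one.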

The other reason for focusing on graph-related tools is that, looking beyond the two features implemented so far, the relationships between features (markers, genes, SNPs) in genomic datasets form graphs, and displaying and navigating the graph relationships is likely to assist the users in finding features with the characteristics they are looking for. For example, the LLMs can be used to parse published text about features and embed this as vectors, enabling related features to be found via cosine similarity of vectors associated with them.
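
A minimal sketch of finding related features by cosine similarity. The feature names and the 3-dimensional vectors are made up for illustration; real embeddings would come from an embedding model and have hundreds of dimensions or more.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as unrelated
    return dot / (norm_a * norm_b)

# Hypothetical embeddings of text published about each feature.
embeddings = {
    "markerA": [0.9, 0.1, 0.0],
    "markerB": [0.8, 0.2, 0.1],
    "geneC": [0.0, 0.1, 0.9],
}

def most_similar(query, vectors, top_n=2):
    """Rank the other features by similarity to the query feature."""
    scores = [
        (name, cosine_similarity(vectors[query], vec))
        for name, vec in vectors.items()
        if name != query
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_n]

ranked = most_similar("markerA", embeddings)
print(ranked)
```

For more than a few thousand vectors, a vector database or index would replace the linear scan in `most_similar`.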

Input of user instructions as text may suit many users, but some will prefer verbal input and this is now well supported, e.g. demonstrated in https://openai.com/index/hello-gpt-4o/.


Determining relationships between datasets, to support user browsing and finding datasets

  • dataset has name, DOI or title / abstract, metadata
  • LLM can request abstract from DOI and read it, reducing it to vector embeddings
  • metadata is structured; inconsistent field names can be cleaned up using LLM; for the structure and values which are regular, an algorithmic approach such as Jaccard similarity can be used to establish how related datasets are.
  • dataset name consists of names (species, variety, institution, project, date, ...)
    LLM can parse this; split into pieces; regularise variations in capitalisation, date format;
    establish relationships within the sub-fields e.g. related species, variety implies species.
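
The algorithmic part of the points above could be sketched as follows: split dataset names into sub-fields, normalise capitalisation, and score relatedness with Jaccard similarity. The dataset names, metadata fields and separator characters are assumptions for illustration; the LLM-based clean-up of inconsistent field names is not shown.

```python
import re

def name_tokens(dataset_name):
    """Split a dataset name into lower-cased sub-fields on common separators."""
    return {t.lower() for t in re.split(r"[_\-\s.]+", dataset_name) if t}

def metadata_items(metadata):
    """Treat each (field, value) pair as one set element, ignoring case."""
    return {(k.lower(), str(v).lower()) for k, v in metadata.items()}

def jaccard(a, b):
    """|A intersect B| / |A union B|; defined here as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two hypothetical datasets.
d1 = {"name": "Wheat_VarietyA_InstX_2018",
      "metadata": {"species": "Triticum aestivum", "type": "Genome"}}
d2 = {"name": "wheat-varietyB-InstX-2020",
      "metadata": {"Species": "triticum aestivum", "type": "SNP array"}}

name_score = jaccard(name_tokens(d1["name"]), name_tokens(d2["name"]))
meta_score = jaccard(metadata_items(d1["metadata"]), metadata_items(d2["metadata"]))
print(name_score, meta_score)
```

Pairs scoring above a chosen threshold could become edges in the dataset graph; the threshold, and how to weight name similarity against metadata similarity, would need tuning on real datasets.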

Finding near-duplicates with Jaccard similarity and MinHash
https://blog.nelhage.com/post/fuzzy-dedup/
https://news.ycombinator.com/item?id=40872438
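
A small pure-Python sketch of the MinHash idea, for reference: the fraction of positions at which two signatures agree estimates the Jaccard similarity of the underlying sets. The use of seeded SHA-1 as the hash family, the signature length of 128 and the word-shingle size are choices made for this illustration, not taken from the linked post.

```python
import hashlib

def _hash(token: str, seed: int) -> int:
    """One member of a family of hash functions, selected by seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")

def shingles(text, k=3):
    """Set of k-word shingles; a text shorter than k words yields one shingle."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def minhash_signature(tokens, num_hashes=128):
    """Minimum hash value per hash function; tokens must be a non-empty set."""
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions at which the two minima coincide."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)

# Two made-up abstracts which differ only in their last word.
a = shingles("marker positions for the wheat genome assembly from project one")
b = shingles("marker positions for the wheat genome assembly from project two")
estimate = estimated_jaccard(minhash_signature(a), minhash_signature(b))
print(estimate)
```

The signatures are fixed-length, so they can be stored per dataset and compared cheaply, without keeping the full shingle sets.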


https://news.ycombinator.com/item?id=37315866
https://docs.openinterpreter.com/guides/basic-usage

https://news.ycombinator.com/item?id=37669976
Superagent.sh on Replit: An open-source framework for creating AI-assistants (replit.com)
https://blog.replit.com/superagent

https://news.ycombinator.com/item?id=37291753
The Founder of Superagent Discusses Agents Tracing, Observability, and Debugging (framer.website)
https://e2b-blog.framer.website/blog/discussing-agents-challenges-with-ismail-pelaseyed-the-founder-of-superagent
Interview
Aug 22, 2023
Tereza Tizkova
Discussing agents' tracing, observability, and debugging with Ismail Pelaseyed, the founder of Superagent

https://twitter.com/pelaseyed

https://news.ycombinator.com/item?id=38076193
Superagent: The AI-assistant framework for LLMs (superagent.sh)

May 19, 2024
https://neurosnap.ai/services
Neurosnap - Bioinformatic tools available.
bioinformatics

May 18, 2024
GitHub - metaskills/experts: Experts.js is the easiest way to create and deploy OpenAI's Assistants and link them together as Tools to create advanced Multi AI Agent Systems with expanded memory and attention to detail.
https://github.com/metaskills/experts
JavaScript
OpenAI
agent
virtual-assistant

www.unremarkable.ai/experts/

neo4j.com
https://neo4j.com/developer-blog/graphrag-llm-knowledge-graph-builder/
LLM
graph
RAG
JavaScript
Neo4j

Next-generation data filtering in the genomics era | Nature Reviews Genetics
https://www.nature.com/articles/s41576-024-00738-6


https://news.ycombinator.com/item?id=37269376
https://arxiv.org/abs/2308.08945
Interpretable Graph Neural Networks for Tabular Data

https://vimeo.com/369977717
InfoVis 2019: Persistent Homology Guided Force-Directed Graph Layouts
4 years ago
Authors: Ashley Suh, Mustafa Hajij, Bei Wang, Carlos Scheidegger, Paul Rosen

...
https://ieeexplore.ieee.org/document/8807379
Persistent Homology Guided Force-Directed Graph Layouts
Publisher: IEEE
Ashley Suh; Mustafa Hajij; Bei Wang; Carlos Scheidegger; Paul Rosen

...
Published in: IEEE Transactions on Visualization and Computer Graphics ( Volume: 26, Issue: 1, January 2020)
Page(s): 697 - 707
Date of Publication: 20 August 2019


https://untanglefdl.github.io/

https://www.cg.tuwien.ac.at/courses/Visualisierung2/HallOfFame/2020/projects/suh2019-Oliver%20Pilizar_44328670_assignsubmission_file_/program_submission/html/index.html
This application is an implementation of the paper Implementation of Persistent Homology Guided Force-Directed Graph Layouts by Suh et al. Figure 1 shows the application.

https://www.cg.tuwien.ac.at/courses/Visualisierung2/HallOfFame/2020/projects/suh2019-Oliver%20Pilizar_44328670_assignsubmission_file_/program_submission/bin/

https://www.nature.com/articles/s41598-020-66710-6
Weighted persistent homology for osmolyte molecular aggregation and hydrogen-bonding network analysis

https://www.nature.com/articles/s41598-020-66710-6/figures/1
The illustration of the basic components in persistent homology. ...

https://en.wikipedia.org/wiki/Persistent_homology

https://www.nature.com/articles/s41598-019-55660-3
Weighted persistent homology for biomolecular data analysis
fig 5 : LPH and LWPH based DNA featurization. ...

https://www.researchgate.net/figure/Persistent-homology-quantifies-the-topological-configuration-of-biological_fig3_327236276
Persistent homology quantifies the topological configuration of biological nano-structures. ...

https://www.researchgate.net/publication/327236276_Topological_data_analysis_quantifies_biological_nano-structure_from_single_molecule_localization_microscopy
Topological data analysis quantifies biological nano-structure from single molecule localization microscopy
... we employ persistence homology to move beyond clustering, and quantify the topological structure within data.


Opinionated RAG wiki: https://github.com/zby/answerbot/wiki

...

https://outerbounds.com/blog/retrieval-augmented-generation/
Introduction to RAG and how to use Metaflow for RAG

[0] https://github.com/jackmpcollins/magentic
Seamlessly integrate LLMs as Python functions

https://news.ycombinator.com/item?id=38096958
New: LangChain templates – fastest way to build a production-ready LLM app (github.com/langchain-ai)
https://github.com/langchain-ai/langchain/tree/master/templates

https://news.ycombinator.com/item?id=38256645
Infinite Context LLMs: Going Beyond RAG with Extended Minds (normalcomputing.ai)
https://blog.normalcomputing.ai/posts/2023-09-12-supersizing-transformers/supersizing-transformers.html

https://github.com/llmware-ai/llmware/tree/main/examples/SLIM-Agents
(from https://news.ycombinator.com/item?id=39955725)

https://www.marktechpost.com/2024/06/01/gnn-rag-a-novel-ai-method-for-combining-language-understanding-abilities-of-llms-with-the-reasoning-abilities-of-gnns-in-a-retrieval-augmented-generation-rag-style/

https://www.marktechpost.com/2024/06/01/how-rag-helps-transformers-to-build-customizable-large-language-models-a-comprehensive-guide/

https://news.ycombinator.com/item?id=40711447
LLM that can call multiple tool APIs with one request (cohere.com)
https://cohere.com/blog/multi-step-tool-use

https://news.ycombinator.com/item?id=40739982
Why we no longer use LangChain for building our AI agents
https://www.octomind.dev/blog/why-we-no-longer-use-langchain-for-building-our-ai-agents

https://langchain-ai.github.io/langgraph/
LangGraph
https://www.langchain.com/langgraph

https://networkx.org/
NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.

https://www.lycee.ai/blog/rag-fastapi-postgresql-pgvector
Building a Retrieval Augmented Generation System Using FastAPI

https://news.ycombinator.com/item?id=40395107
Multi AI Agent Systems Using OpenAI's New GPT-4o Model (github.com/metaskills)
https://github.com/metaskills/experts


https://news.ycombinator.com/item?id=40345775
GPT-4o (openai.com)
https://openai.com/index/hello-gpt-4o/


https://analyticsindiamag.com/microsoft-unveils-graphrag-outperforms-traditional-rag-in-data-discovery/
Microsoft Unveils GraphRAG, Outperforms Traditional RAG in Data Discovery

https://www.nature.com/articles/s42256-024-00847-1
Physicochemical graph neural network for learning protein–ligand interaction fingerprints from sequence data

https://towardsdatascience.com/combine-text-embeddings-and-knowledge-graph-embeddings-in-rag-systems-5e6d7e493925
Combine Text Embeddings and Knowledge (Graph) Embeddings in RAG systems

https://blog.langchain.dev/graph-based-metadata-filtering-for-improving-vector-search-in-rag-applications/
Graph-based metadata filtering for improving vector search in RAG applications

https://www.nature.com/articles/s41467-024-44980-2
Pangenome graphs improve the analysis of structural variants in rare genetic diseases

https://www.substrate.run/
Substrate is designed to describe and run multi-inference workloads as fast as possible in a system that maximizes parallelism, throughput, and data locality.

https://thenewstack.io/lets-get-agentic-langchain-and-llamaindex-talk-ai-agents/
Let’s Get Agentic: LangChain and LlamaIndex Talk AI Agents

https://github.com/langflow-ai/langflow
Langflow is a visual framework for building multi-agent and RAG applications. It's open-source, Python-powered, fully customizable, model and vector store agnostic.


2024Jul16

Shapeshift: Semantically map JSON objects using key-level vector embeddings
https://news.ycombinator.com/item?id=40972130
https://github.com/rectanglehq/Shapeshift


2024Jul22

txtai: Open-source vector search and RAG for minimalists (neuml.github.io)
https://news.ycombinator.com/item?id=41024362
https://neuml.github.io/txtai/


2024Jul26

https://news.ycombinator.com/item?id=41069909
Launch HN: Undermind (YC S24) – AI agent for discovering scientific papers


2024Aug06

https://www.nature.com/articles/s41467-024-50955-0
Accurate prediction of protein function using statistics-informed graph networks | Nature Communications


2024Aug08

https://www.nature.com/articles/s42256-024-00873-z
Published: 05 August 2024

Integrated structure prediction of protein–protein docking with experimental restraints using ColabDock

Shihao Feng, Zhenyu Chen, Chengwei Zhang, Yuhao Xie, Sergey Ovchinnikov, Yi Qin Gao & Sirui Liu 

Nature Machine Intelligence (2024)
A preprint version of the article is available at bioRxiv.
https://doi.org/10.1101/2023.07.04.547599 ->
https://www.biorxiv.org/content/10.1101/2023.07.04.547599v1
https://www.biorxiv.org/content/10.1101/2023.07.04.547599v1.full.pdf

https://neurosciencenews.com/ai-genetics-dna-decosing-27521/
AI Helps Decode the Language of DNA - Neuroscience News
->
https://www.nature.com/articles/s42256-024-00872-0
Published: 23 July 2024
DNA language model GROVER learns sequence context in the human genome
Melissa Sanabria, Jonas Hirsch, Pierre M. Joubert & Anna R. Poetsch
Nature Machine Intelligence (2024)

https://www.nature.com/articles/s42256-024-00872-0.pdf

https://www.nature.com/articles/s41467-024-50955-0
Open access
Published: 04 August 2024
Accurate prediction of protein function using statistics-informed graph networks
Yaan J. Jang, Qi-Qi Qin, Si-Yu Huang, Arun T. John Peter, Xue-Ming Ding & Benoît Kornmann
Nature Communications volume 15, Article number: 6601 (2024)
https://www.nature.com/articles/s41467-024-50955-0.pdf

Similar content being viewed by others
LM-GVP: an extensible sequence and structure informed deep learning framework for protein property prediction
Article Open access 27 April 2022
Structure-based protein function prediction using graph convolutional networks
Article Open access 26 May 2021
Physicochemical graph neural network for learning protein–ligand interaction fingerprints from sequence data
Article 17 June 2024

2024Aug17
https://news.ycombinator.com/item?id=41268929
What Is a Knowledge Graph? (neo4j.com)
https://neo4j.com/blog/what-is-knowledge-graph/
Book (local PDF copy, 2024 Aug 17): Building Knowledge Graphs: A Practitioner's Guide, ISBN 9781098127107

2024Aug24
https://towardsdatascience.com/graph-rag-a-conceptual-introduction-41cd0d431375
Graph RAG — A Conceptual Introduction
Graph RAG answers the big questions where text embeddings won’t help you.
Jakob Pörschmann
Towards Data Science


From 2024Aug13 (diigo)

How to Use Hybrid Search for Better LLM RAG Retrieval | by Dr. Leon Eversberg | Aug, 2024 | Towards Data Science
LLM
RAG
https://towardsdatascience.com/how-to-use-hybrid-search-for-better-llm-rag-retrieval-032f66810ebe

https://towardsdatascience.com/ai-agents-from-concepts-to-practical-implementation-in-python-fb26789b1560
LLM
agent
Python

Crab Framework Released: An AI Framework for Building LLM Agent Benchmark Environments in a Python-Centric Way - MarkTechPost
LLM
agent
https://www.marktechpost.com/2024/08/10/crab-framework-released-an-ai-framework-for-building-llm-agent-benchmark-environments-in-a-python-centric-way/

...
https://github.com/camel-ai/crab
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/

Knowledge Graph-based Agent With Llama 3.1, NVIDIA NIM & LangChain
LLM
knowledge
graph
LangChain
https://neo4j.com/developer-blog/knowledge-graph-llama-nvidia-langchain/
Build a Knowledge Graph-based Agent With Llama 3.1, NVIDIA NIM, and LangChain
Tomaž Bratanič Aug 08
...
https://github.com/tomasonjo/blogs/blob/master/llm/nvidia_neo4j_langchain.ipynb

https://news.ycombinator.com/item?id=41224557

Creating the largest protein-protein interaction dataset in the world (owlposting.com)

https://www.owlposting.com/p/creating-the-largest-protein-protein


2024Oct15

https://news.ycombinator.com/item?id=39348902
GeneGPT, a tool-augmented LLM for bioinformatics (github.com/ncbi)
https://github.com/ncbi/GeneGPT

(HN comment ... it's a prompt and an OpenAI API call.)


The following blocks are added on 2024Dec11 from postings over the past month/s


https://news.ycombinator.com/item?id=41985176
Vector databases are the wrong abstraction (timescale.com)
https://www.timescale.com/blog/vector-databases-are-the-wrong-abstraction/

in comments :
...
avthar
(Post co-author)
...
[1]: https://github.com/timescale/pgvectorscale
[2]: https://www.timescale.com/blog/how-we-made-postgresql-as-fast-as-pinecone-for-vector-data/

...
https://www.elastic.co/search-labs/blog/semantic-search-simplified-semantic-text


https://news.ycombinator.com/item?id=42100819
Show HN: Chonkie – A Fast, Lightweight Text Chunking Library for RAG (github.com/bhavnicksm)
by bhavnicksm
I built Chonkie because I was tired of rewriting chunking code for RAG applications. Existing libraries were either too bloated (80MB+) or too basic, with no middle ground.


https://news.ycombinator.com/item?id=42174829
Show HN: FastGraphRAG – Better RAG using good old PageRank (github.com/circlemind-ai)

... Fast GraphRAG, an open-source RAG approach that leverages knowledge graphs and the 25 years old PageRank for better information retrieval and reasoning.
...
... https://circlemind.co and see our code at https://github.com/circlemind-ai/fast-graphrag

[1] https://blog.google/products/search/introducing-knowledge-graph-things-not/
[3] Similarly to Microsoft’s GraphRAG: https://github.com/microsoft/graphrag
[4] Similarly to OSU’s HippoRAG: https://github.com/OSU-NLP-Group/HippoRAG
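
For reference, plain PageRank by power iteration over a small directed graph can be sketched as below. The entity graph is made up, and this does not show how Fast GraphRAG, GraphRAG or HippoRAG apply PageRank internally; it only illustrates the underlying ranking step.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """graph maps each node to the list of nodes it links to."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical knowledge graph: two species link to a gene, which links to a trait.
graph = {
    "wheat": ["gene1"],
    "barley": ["gene1"],
    "gene1": ["trait_x"],
    "trait_x": [],
}
rank = pagerank(graph)
for node, score in sorted(rank.items(), key=lambda item: item[1], reverse=True):
    print(f"{node}: {score:.3f}")
```

In a retrieval setting the ranks would be used to choose which entities, and the text attached to them, are passed to the LLM as context.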


https://news.ycombinator.com/item?id=42314212
Show HN: Open-Source Colab Notebooks to Implement Advanced RAG Techniques (github.com/athina-ai)
by hbamoria
... our team (Athina AI) has released Open-Source Advanced RAG Cookbooks.
This is a collection of ready-to-run Google Colab notebooks featuring the most commonly implemented techniques.

https://github.com/athina-ai/rag-cookbooks
This repository contains various advanced techniques for Retrieval-Augmented Generation (RAG) systems.

https://github.com/athina-ai/rag-cookbooks/blob/main/hybrid_rag.ipynb


https://news.ycombinator.com/item?id=42210689
Autoflow, a Graph RAG based and conversational knowledge base tool (github.com/pingcap)

https://github.com/pingcap/autoflow
pingcap/autoflow is a Graph RAG based and conversational knowledge base tool built with TiDB Serverless Vector Storage. Demo: https://tidb.ai

search : crewai alternatives

https://topai.tools/alternatives/crewai

CrewAI is an innovative AI agent framework that simplifies creation of complex automations using pre-built models and agents, or custom development with open source tools. Engaged community for sharing and support. Copyrighted by CrewAI Inc., 2024.

The best CrewAI alternative is BrainSoup. Other great alternatives are Open agent studio and AI Course Creator - AcademyOcean. On this list you will find a total of 16 free CrewAI alternatives and paid ones.

https://microsoft.github.io/autogen/0.2/

https://hn.algolia.com/?dateRange=all&page=1&prefix=true&query=crewai&sort=byPopularity&type=story

https://e2b.dev/blog/crewai-vs-autogen-for-code-execution-ai-agents


https://news.ycombinator.com/item?id=41202064
Show HN: Nous – Open-Source Agent Framework with Autonomous, SWE Agents, WebUI (github.com/trafficguard)
https://github.com/TrafficGuard/sophia
TypeScript AI platform with AI chat, Autonomous agents, Software developer agents, chatbots and more

@Don-Isdale Don-Isdale added phase:discussion prospective-feature Discussion of a feature which might be added - is not yet confirmed as a requirement labels Jul 10, 2024
@Don-Isdale Don-Isdale changed the title Discussion of future options for natural language search - current tools for LLM / Graph / LangChain Background literature review / discussion of future options for natural language search - current tools for LLM / Graph / LangChain Nov 18, 2024