async mongo document loader #4285

saginawj · 2023-05-07T13:03:07Z

This Mongo Document Loader:

- Requires a new package: motor, the official async driver for MongoDB
- Accepts connection_string, db_name, and collection_name as inputs
- Returns a list of Document objects asynchronously from MongoDB
- Sets each Document's metadata={'database': '[db_name]', 'collection': '[collection_name]'}
- Exposes the synchronous `load` method while using asyncio/motor under the hood to retrieve the MongoDB docs
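To make the "sync `load`, async retrieval" design concrete, here is a minimal sketch. It is not the code from this PR. It assumes a collection object that behaves like motor's `AsyncIOMotorCollection` (its `find()` returns a cursor you can consume with `async for`). The class name `MongodbLoaderSketch`, the use of `str(raw)` as page content, the stand-in `Document` dataclass, and the in-memory `FakeCollection`/`FakeCursor` are all hypothetical, added only so the sketch runs without a MongoDB server.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Stand-in for langchain's Document: text plus a metadata dict."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class MongodbLoaderSketch:
    """Hypothetical loader: async retrieval behind a sync load() entry point."""

    def __init__(self, collection: Any, db_name: str, collection_name: str) -> None:
        # With motor installed, the collection would typically come from
        #   AsyncIOMotorClient(connection_string)[db_name][collection_name]
        self.collection = collection
        self.db_name = db_name
        self.collection_name = collection_name

    async def aload(self) -> List[Document]:
        """Iterate over every document in the collection and wrap each one."""
        metadata = {"database": self.db_name, "collection": self.collection_name}
        docs: List[Document] = []
        async for raw in self.collection.find({}):
            # Assumption: the raw document dict, stringified, is the page content.
            docs.append(Document(page_content=str(raw), metadata=dict(metadata)))
        return docs

    def load(self) -> List[Document]:
        """Sync wrapper: run the async retrieval on a fresh event loop.

        asyncio.run() raises RuntimeError if an event loop is already running
        in this thread (e.g. inside Jupyter); callers there should await
        aload() directly instead.
        """
        return asyncio.run(self.aload())


class FakeCursor:
    """Hypothetical in-memory async iterator mimicking a motor cursor."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = list(rows)  # copy, so the caller's list is not consumed

    def __aiter__(self) -> "FakeCursor":
        return self

    async def __anext__(self) -> Dict[str, Any]:
        if not self._rows:
            raise StopAsyncIteration
        return self._rows.pop(0)


class FakeCollection:
    """Hypothetical stand-in for a motor collection; ignores the query filter."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows

    def find(self, query: Dict[str, Any]) -> FakeCursor:
        return FakeCursor(self._rows)


if __name__ == "__main__":
    loader = MongodbLoaderSketch(FakeCollection([{"_id": 1, "text": "hello"}]), "db", "coll")
    for doc in loader.load():
        print(doc.metadata, doc.page_content)
```

The design choice here is that the async method does the real work and the sync method is only a thin `asyncio.run` wrapper, so async callers can use `aload()` without going through an extra event loop.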

saginawj · 2023-05-07T13:07:29Z

Note: the checks are failing because motor is a new package.

@skcoirz would love to have you review this PR given our previous convos on async loaders. Thanks!

skcoirz

shared some ideas! :)

tests/integration_tests/document_loaders/test_mongodb.py

langchain/document_loaders/mongodb.py

skcoirz

shared some ideas.

tests/integration_tests/document_loaders/test_mongodb.py

skcoirz

yeah overall lgtm. Maybe move the non-CI-supported test to an example doc or just remove it? What do you think?


saginawj · 2023-05-10T08:58:25Z

@skcoirz thanks for all the feedback! I can go ahead and resolve the convos above as it seems we're in alignment here.

One question I did have: the build fails because I added the motor package, which is not in the core langchain distro at this point. I had the option of adding another 'stub' package to get the mypy job to pass, but I didn't want to add more packages to the distro just for that purpose.

skcoirz · 2023-05-10T16:04:25Z

AsyncIOMotorClient is used to connect to MongoDB. It’s a part of the main workflow here.
Option 1: It makes sense for this PR to add motor to langchain as an optional dependency (most toolkits are optional dependencies). To make mypy happy, you can move the import inside the function and wrap it in a try/except for ImportError (e.g. langchain/llms/huggingface_endpoint.py).

saginawj · 2023-05-11T09:09:09Z

@skcoirz I actually copied this approach locally but still get the mypy issue. I didn't want to push another change because you had mentioned we should be good to go on this PR, but I'm happy to keep tinkering with this if we're waiting on mypy to pass.

saginawj · 2023-05-12T15:10:54Z

To resolve the mypy issue, I read the mypy docs, which suggest either installing stubs or ignoring the import line. I saw that many other modules in LangChain use `# type: ignore`, so I went that route.

I do see that there's an issue with poetry.lock. I figured the CI team would want to take a look at this rather than having me tinker with it (I use motor and the error mentions the need for motor, so it's likely just a versioning issue).

# Fix Homepage Typo ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested... not sure

@eyurtsev

# Docs and code review fixes for Docugami DataLoader 1. I noticed a couple of hyperlinks that are not loading in the langchain docs (I guess they need explicit anchor tags). Added those. 2. In code review @eyurtsev had a [suggestion](langchain-ai#4727 (comment)) to allow string paths. Turns out just updating the type works (I tested locally with string paths). # Pre-submission checks I ran `make lint` and `make tests` successfully. --------- Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>

# Update deployments doc with langcorn API server API server example ```python from fastapi import FastAPI from langcorn import create_service app: FastAPI = create_service( "examples.ex1:chain", "examples.ex2:chain", "examples.ex3:chain", "examples.ex4:sequential_chain", "examples.ex5:conversation", "examples.ex6:conversation_with_summary", ) ``` More examples: https://github.com/msoedov/langcorn/tree/main/examples Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>

…angchain-ai#2361) It's currently not possible to change the `TEMPLATE_TOOL_RESPONSE` prompt for ConversationalChatAgent, this PR changes that. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>

Co-authored-by: Ali Mirlou <alimirlou@gmail.com>

# Add generic document loader * This PR adds a generic document loader which can assemble a loader from a blob loader and a parser * Adds a registry for parsers * Populate registry with a default mimetype based parser ## Expected changes - Parsing involves loading content via IO so can be sped up via: * Threading in sync * Async - The actual parsing logic may be computationally involved: may need to figure out how to add multi-processing support - May want to add suffix based parser since suffixes are easier to specify in comparison to mime types ## Before submitting No notebooks yet, we first need to get a few of the basic parsers up (prior to advertising the interface)

# Add bs4 html parser * Some minor refactors * Extract the bs4 html parsing code from the bs html loader * Move some tests from integration tests to unit tests

Co-authored-by: BenSchZA <BenSchZA@users.noreply.github.com>

Co-authored-by: Daniel Chalef <daniel.chalef@private.org> Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>

@hwchase17

…langchain-ai#4389) # Documentation for Azure OpenAI embeddings model - OPENAI_API_VERSION environment variable is needed for the endpoint - The constructor does not work with model, it works with deployment. I fixed it in the notebook. (This is my first contribution) ## Who can review? @hwchase17 @agola Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>

# Added another helpful way for developers who want to set OpenAI API Key dynamically Previous methods like exporting environment variables are good for project-wide settings. But many recent use cases need to assign API keys dynamically. ```python from langchain.llms import OpenAI llm = OpenAI(openai_api_key="OPENAI_API_KEY") ``` ## Before submitting ```bash export OPENAI_API_KEY="..." ``` Or, ```python import os os.environ["OPENAI_API_KEY"] = "..." ``` Thank you. Cheers, Bongsang

@hwchase17

#docs: text splitters improvements Changes are only in the Jupyter notebooks. - added links to the source packages and a short description of these packages - removed " Text Splitters" suffixes from the TOC elements (they made the list of the text splitters messy) - moved text splitters, based on the length function into a separate list. They can be mixed with any classes from the "Text Splitters", so it is a different classification. ## Who can review? @hwchase17 - project lead @eyurtsev @vowelparrot NOTE: please, check out the results of the `Python code` text splitter example (text_splitters/examples/python.ipynb). It looks suboptimal.

Co-authored-by: Jerry Luan <xmaswillyou@gmail.com>

Co-authored-by: Jiaxin Shan <seedjeffwan@gmail.com>

Co-authored-by: Matthias Samwald <samwald@gmx.at>

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>

…loader (langchain-ai#5466) # Adds ability to specify credentials when using Google BigQuery as a data loader Fixes langchain-ai#5465. Adds the ability to set credentials, which must be of the `google.auth.credentials.Credentials` type. This argument is optional and defaults to `None`. Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>

…the class BooleanOutputParser (langchain-ai#5397) When the LLM outputs 'yes|no', BooleanOutputParser can now parse it to 'True|False'; this fixes the ValueError in parse(). Fixes langchain-ai#5396 --------- Co-authored-by: gaofeng27692 <gaofeng27692@hundsun.com>

# Added support for modifying the number of threads in the GPT4All model I have added the capability to modify the number of threads used by the GPT4All model. This allows users to adjust the model's parallel processing capabilities based on their specific requirements. ## Changes Made - Updated the `validate_environment` method to set the number of threads for the GPT4All model using the `values["n_threads"]` parameter from the `GPT4All` class constructor. ## Context Useful in scenarios where users want to optimize the model's performance by leveraging multi-threading capabilities. Please note that the `n_threads` parameter was included in the `GPT4All` class constructor but was previously unused. This change ensures that the specified number of threads is utilized by the model. ## Dependencies There are no new dependencies introduced by this change. It only utilizes existing functionality provided by the GPT4All package. ## Testing Since this is a minor change, testing is not required. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>

# Allow for async use of SelfAskWithSearchChain Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>

…bject (langchain-ai#5321) This PR adds a new method `from_es_connection` to the `ElasticsearchEmbeddings` class allowing users to use Elasticsearch clusters outside of Elastic Cloud. Users can create an Elasticsearch Client object and pass that to the new function. The returned object is identical to the one returned by calling `from_credentials` ``` # Create Elasticsearch connection es_connection = Elasticsearch( hosts=['https://es_cluster_url:port'], basic_auth=('user', 'password') ) # Instantiate ElasticsearchEmbeddings using es_connection embeddings = ElasticsearchEmbeddings.from_es_connection( model_id, es_connection, ) ``` I also added examples to the elasticsearch jupyter notebook Fixes langchain-ai#5239 --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>

# SQLite-backed Entity Memory Following the initiative of langchain-ai#2397 I think it would be helpful to be able to persist Entity Memory on disk by default Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>

Co-authored-by: David Revillas <26328973+r3v1@users.noreply.github.com>

@dev2049

# Support Qdrant filters Qdrant has an [extensive filtering system](https://qdrant.tech/documentation/concepts/filtering/) with rich type support. This PR makes it possible to use the filters in Langchain by passing an additional param to both the `similarity_search_with_score` and `similarity_search` methods. ## Who can review? @dev2049 @hwchase17 --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>

Co-authored-by: Tom Piaggio <tomaspiaggio@google.com> Co-authored-by: scafati98 <jupyter@matchingengine.us-central1-a.c.scafati-joonix.internal> Co-authored-by: scafati98 <scafatieugenio@gmail.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>

leo-gan · 2023-09-13T01:44:09Z

@saginawj Hi, could you please resolve the merge conflicts? After that, ping me and I'll push this PR for review. Thanks!

vercel · 2023-09-13T10:49:28Z

The latest updates on your projects.

Name	Status	Preview	Comments	Updated (UTC)
langchain	❌ Failed			Sep 14, 2023 0:20am

saginawj · 2023-09-13T10:55:48Z

Hi @leo-gan! I resolved the merge issue (related to installing motor, the async driver for MongoDB). Note that there are still lint/test failures (also related to the motor install) that I can take a look at if needed, but per your request I wanted to notify you after fixing the merge issue.

…angchain into mongo-document-loader

leo-gan · 2023-09-14T15:14:55Z

@saginawj Please check the merged files. There seem to be too many files in this PR. Did something go wrong?

saginawj · 2023-09-15T14:29:26Z

@leo-gan yes, since this PR is from over 4 months ago, a lot has obviously changed. I actually created a new PR with this functionality here. It might be easier to close this old one (though it's good to reference given the comments on async) and use the new one.

Note that I have it running locally and ran the lint/format/test, but seem to be running into a few issues with some of the new rules. Will work through that. Thanks!

leo-gan · 2023-09-15T15:34:43Z

Closing, because #10645 was created instead.
@saginawj Thank you for taking care!

@leo-gan

- **Description:** A Document Loader for MongoDB
- **Issue:** n/a
- **Dependencies:** Motor, the async driver for MongoDB
- **Tag maintainer:** n/a
- **Twitter handle:** pigpenblue

Note that an initial MongoDB document loader was created 4 months ago, but the [PR](#4285) was never pulled in. @leo-gan had commented on that PR, but given it is extremely far behind the master branch and a ton has changed in Langchain since then (including repo name and structure), I rewrote the branch and issued a new PR with the expectation that the old one can be closed. Please reference that old PR for comments/context, but it can be closed in favor of this one. Thanks!

--------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>

  ---------
  Co-authored-by: Erick Friis <erick@langchain.dev>
* Mark Vertex AI classes as serialisable (langchain-ai#10484)
  ---------
  Co-authored-by: Erick Friis <erick@langchain.dev>
* Adds Tavily Search API retriever (langchain-ai#11314)
  @baskaryan @efriis
* Update clarifai.mdx

---------
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
Co-authored-by: Apurv Agarwal <apoorvagarwal00@gmail.com>
Co-authored-by: Nan LI <linanenv@gmail.com>
Co-authored-by: Nuno Campos <nuno@boringbits.io>
Co-authored-by: Akio Nishimura <akionux@gmail.com>
Co-authored-by: mani2348 <itsmanikumar@gmail.com>
Co-authored-by: Mani Kumar Adari <maniadar@amazon.com>
Co-authored-by: Arthur Telders <72456061+ATelders@users.noreply.github.com>
Co-authored-by: Arthur Telders <arthur.telders@roquette.com>
Co-authored-by: Naveen Tatikonda <navtat@amazon.com>
Co-authored-by: Joseph McElroy <joseph.mcelroy@elastic.co>
Co-authored-by: Justin Plock <jplock@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Hugues <me@hugh.sh>
Co-authored-by: Noah Stapp <noah@noahstapp.com>
Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com>
Co-authored-by: Guy Korland <gkorland@gmail.com>
Co-authored-by: Piotr Mardziel <piotrm@gmail.com>
Co-authored-by: Piyush Jain <piyushjain@duck.com>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
Co-authored-by: Michael Kim <59414764+xcellentbird@users.noreply.github.com>
Co-authored-by: Michael Landis <michael@momentohq.com>
Co-authored-by: Jeff Kayne <43336277+jeffkayne@users.noreply.github.com>
Co-authored-by: Kenneth Choe <kenneth.choe@gmail.com>
Co-authored-by: Fynn Flügge <fynnfluegge@gmx.de>
Co-authored-by: Jacob Lee <jacoblee93@gmail.com>
Co-authored-by: Donatas Remeika <dremeika@users.noreply.github.com>
Co-authored-by: PaperMoose <rbrandt810@gmail.com>
Co-authored-by: Noah Czelusta <83324596+swimninja247@users.noreply.github.com>
Co-authored-by: Clément Sicard <33360172+ClementSicard@users.noreply.github.com>
Co-authored-by: Dr. Fabien Tarrade <tarrade@users.noreply.github.com>
Co-authored-by: jreinjr <jason.w.reinhardt@gmail.com>
Co-authored-by: jare0530 <7915+jare0530@users.noreply.ghe.oculus-rep.com>
Co-authored-by: James Braza <jamesbraza@gmail.com>
Co-authored-by: Cynthia Yang <zixinyang92@gmail.com>
Co-authored-by: Jon Saginaw <saginawj@users.noreply.github.com>
Co-authored-by: Attila Tőkés <62890262+attila-tokes@users.noreply.github.com>
Co-authored-by: Attila Tőkés <atokes@rws.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: Haozhe <17514803+hazzel-cn@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Leonid Ganeline <leo.gan.57@gmail.com>
Co-authored-by: Dayuan Jiang <34411969+DayuanJiang@users.noreply.github.com>
Co-authored-by: Kazuki Maeda <kzk.maeda0711@gmail.com>
Co-authored-by: Yeonji-Lim <57888020+Yeonji-Lim@users.noreply.github.com>
Co-authored-by: James Odeyale <jamesodeyale01@gmail.com>
Co-authored-by: zhengkai <994171686@qq.com>
Co-authored-by: Predrag Gruevski <2348618+obi1kenobi@users.noreply.github.com>
Co-authored-by: Oleg Sinavski <2086260+olegsinavski@users.noreply.github.com>
Co-authored-by: João Carabetta <joao.carabetta@gmail.com>
Co-authored-by: CG80499 <94075036+CG80499@users.noreply.github.com>
Co-authored-by: David Duong <david@duong.cz>
Co-authored-by: Erick Friis <erick@langchain.dev>

skcoirz reviewed May 7, 2023

tests/integration_tests/document_loaders/test_mongodb.py (outdated)

langchain/document_loaders/mongodb.py

langchain/document_loaders/mongodb.py

skcoirz reviewed May 9, 2023

tests/integration_tests/document_loaders/test_mongodb.py (outdated)

tests/integration_tests/document_loaders/test_mongodb.py (outdated)

tests/integration_tests/document_loaders/test_mongodb.py (outdated)

skcoirz reviewed May 9, 2023

hwchase17 changed the base branch from master to harrison/mongo-loader May 15, 2023 01:20

hwchase17 added the "needs work" label (PRs that need more work) May 15, 2023

cjcjameson and others added 18 commits May 17, 2023 15:30

fix homepage typo (langchain-ai#4883)

d6e0b9a

# Fix Homepage Typo ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested... not sure

Bold Crumbs (langchain-ai#4876)

1ff7c95

ConversationalChatAgent: Allow customizing TEMPLATE_TOOL_RESPONSE (langchain-ai#2361)

5c9205d

It's currently not possible to change the `TEMPLATE_TOOL_RESPONSE` prompt for ConversationalChatAgent; this PR changes that. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>

Faiss no avx2 (langchain-ai#4895)

df0c33a

Co-authored-by: Ali Mirlou <alimirlou@gmail.com>

Add html parsers (langchain-ai#4874)

0dc304c

# Add bs4 html parser * Some minor refactors * Extract the bs4 html parsing code from the bs html loader * Move some tests from integration tests to unit tests

Cadlabs/python tool sanitization (langchain-ai#4754)

e28bdf4

Co-authored-by: BenSchZA <BenSchZA@users.noreply.github.com>

Zep memory (langchain-ai#4898)

8966f61

Co-authored-by: Daniel Chalef <daniel.chalef@private.org> Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>

Update gallery (langchain-ai#4873)

a4ac006

Harrison/serper api bug (langchain-ai#4902)

9e2227b

Co-authored-by: Jerry Luan <xmaswillyou@gmail.com>

Harrison/faiss norm (langchain-ai#4903)

ba023d5

Co-authored-by: Jiaxin Shan <seedjeffwan@gmail.com>

Harrison/improved retry tool (langchain-ai#4842)

9165267

Harrison/unified objectives (langchain-ai#4905)

b8d4893

Co-authored-by: Matthias Samwald <samwald@gmx.at>

hwchase17 and others added 12 commits May 30, 2023 16:24

add simple test for imports (langchain-ai#5461)

eab4b4c

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>

Allow for async use of SelfAskWithSearchChain (langchain-ai#5394)

0a44bfd

# Allow for async use of SelfAskWithSearchChain Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>

SQLite-backed Entity Memory (langchain-ai#5129)

ce8b7a2

# SQLite-backed Entity Memory Following the initiative of langchain-ai#2397, I think it would be helpful to be able to persist Entity Memory on disk by default. Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>

py tracer fixes (langchain-ai#5377)

1671c2a

Harrison/html splitter (langchain-ai#5468)

f72bb96

Co-authored-by: David Revillas <26328973+r3v1@users.noreply.github.com>

Merge branch 'master' into mongo-document-loader

b39c069

baskaryan assigned eyurtsev and rlancemartin Aug 1, 2023

Merge branch 'harrison/mongo-loader' into mongo-document-loader

272c63c

vercel bot had a problem deploying to Preview September 13, 2023 10:49 Failure

leo-gan requested a review from baskaryan September 13, 2023 15:42

Merge branch 'mongo-document-loader' of https://github.com/saginawj/langchain into mongo-document-loader

922e147

vercel bot had a problem deploying to Preview September 14, 2023 12:20 Failure

saginawj mentioned this pull request Sep 15, 2023

mongodb doc loader init #10645

Merged

leo-gan closed this Sep 20, 2023


saginawj commented May 7, 2023

saginawj commented May 7, 2023

skcoirz left a comment

skcoirz left a comment

skcoirz left a comment

skcoirz left a comment

skcoirz left a comment

saginawj commented May 10, 2023

skcoirz commented May 10, 2023

saginawj commented May 11, 2023

saginawj commented May 12, 2023

leo-gan commented Sep 13, 2023

vercel bot commented Sep 13, 2023 (edited)

saginawj commented Sep 13, 2023

leo-gan commented Sep 14, 2023

saginawj commented Sep 15, 2023

leo-gan commented Sep 15, 2023

leo-gan commented Sep 20, 2023
