Skip to content
This repository has been archived by the owner on Jul 27, 2021. It is now read-only.

feat: add lmdb dbms indexer #24

Merged
merged 9 commits into from
Jun 18, 2021
Merged

feat: add lmdb dbms indexer #24

merged 9 commits into from
Jun 18, 2021

Conversation

cristianmtr
Copy link
Contributor

@cristianmtr cristianmtr commented Jun 17, 2021

part of jina-ai/serve#2560

lmdb (disk-based key-value storage) indexer

more info https://lmdb.readthedocs.io/en/release/

Performance metrics

    # running lmdb benchmark with 10000 docs ...	running lmdb benchmark with 10000 docs takes 5 seconds (5.50s)
    # running lmdb benchmark with 20000 docs ...	running lmdb benchmark with 20000 docs takes 10 seconds (10.86s)
    # running lmdb benchmark with 100000 docs ...	running lmdb benchmark with 100000 docs takes 1 minute and 3 seconds (63.11s)

compared with FileDBMSIndexer (the former BinaryPb):

    # running filedbms benchmark with 10000 docs takes 12 seconds (12.42s)
    # running filedbms benchmark with 20000 docs takes 41 seconds (41.21s)
    # 100k docs never finished (stopped after 13 minutes :-1:  )

Includes:

  • integration tests
  • bumping jina version to latest master on all reqs.txt
  • renaming Indexers

DBMSIndexer --> Indexer
QueryIndexer --> Searcher

jinahub/indexers/dbms/FileDBMSIndexer/__init__.py Outdated Show resolved Hide resolved
jinahub/indexers/dbms/FileDBMSIndexer/__init__.py Outdated Show resolved Hide resolved
jinahub/indexers/dbms/LMDBDBMSIndexer/Dockerfile Outdated Show resolved Hide resolved
jinahub/indexers/dbms/LMDBDBMSIndexer/README.md Outdated Show resolved Hide resolved
jinahub/indexers/dbms/LMDBDBMSIndexer/__init__.py Outdated Show resolved Hide resolved
jinahub/indexers/dbms/LMDBDBMSIndexer/__init__.py Outdated Show resolved Hide resolved
@cristianmtr cristianmtr requested a review from JoanFM June 17, 2021 16:40
jinahub/indexers/dbms/FileDBMSIndexer/__init__.py Outdated Show resolved Hide resolved
jinahub/indexers/dbms/LMDBDBMSIndexer/Dockerfile Outdated Show resolved Hide resolved
Copy link
Member

@maximilianwerk maximilianwerk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice. Anyhow, dump is required and vectors are currently not stored separately.

jinahub/indexers/dbms/LMDBDBMSIndexer/README.md Outdated Show resolved Hide resolved
jinahub/indexers/dbms/LMDBDBMSIndexer/__init__.py Outdated Show resolved Hide resolved
Copy link
Member

@JoanFM JoanFM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to have both unit and integration tests?

t.replace(d.id.encode(), d.SerializeToString())

@requests(on='/delete')
def delete(self, docs: DocumentArray, parameters: Dict = None, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typehint is not accurate, but actually it will never be None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@cristianmtr
Copy link
Contributor Author

Is it possible to have both unit and integration tests?

They are all under tests/ for each Indexer.

Integration tests are separated, since they need to import from across the repo (which you cannot do in the Dockerfile where we run the unit tests). Perhaps the problem here is that they are not just "unit" tests, since they are also tested with the Flow/ with dumping etc.

@JoanFM
Copy link
Member

JoanFM commented Jun 18, 2021

Is it possible to have both unit and integration tests?

They are all under tests/ for each Indexer.

Integration tests are separated, since they need to import from across the repo (which you cannot do in the Dockerfile where we run the unit tests). Perhaps the problem here is that they are not just "unit" tests, since they are also tested with the Flow/ with dumping etc.

Yes, there need to be unit tests without Flow

@cristianmtr
Copy link
Contributor Author

Is it possible to have both unit and integration tests?

They are all under tests/ for each Indexer.
Integration tests are separated, since they need to import from across the repo (which you cannot do in the Dockerfile where we run the unit tests). Perhaps the problem here is that they are not just "unit" tests, since they are also tested with the Flow/ with dumping etc.

Yes, there need to be unit tests without Flow

There are

@cristianmtr cristianmtr requested a review from JoanFM June 18, 2021 15:32
@cristianmtr cristianmtr merged commit ef3731b into main Jun 18, 2021
@cristianmtr cristianmtr deleted the feat-lmdb-dbms branch June 18, 2021 16:12
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants