HNSWlib wrapper, very slow due to a simple recomputation bug #1703

@TheSeriousProgrammer

Initial Checks

  • I have read and followed the docs and still think this is a bug

Description

I have a dataset of over 2.5 million records with 768-dimensional vectors indexed. In this case, the wrapper takes a whopping 11 seconds to query the top 20.

With further digging, I saw that the sqlite3 commands being performed were something of a bottleneck:
https://github.com/docarray/docarray/blob/0ea6846783a1450dc92e4ce181b430f02e32df10/docarray/index/backends/hnswlib.py#L297C21-L297C26

The above line calculates the total number of records in the system before each and every batched search, adding a 7-second delay. Instead, the result of the self.num_docs() call can be cached and updated only when add, delete, or update record calls are made.
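
A minimal sketch of the caching idea, not the actual docarray code: `CachedCountStore`, its `docs` table, and the `count_queries` counter are hypothetical names made up for illustration. It assumes the count only changes through the store's own add and delete methods, so the cached value can be invalidated there and reused by every search in between.

```python
import sqlite3


class CachedCountStore:
    """Hypothetical stand-in for an index whose documents live in SQLite."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, data TEXT)"
        )
        self._num_docs = None   # None marks the cached count as stale
        self.count_queries = 0  # instrumentation only: COUNT(*) calls issued

    def _count_from_db(self) -> int:
        # The expensive call that should not run on every search.
        self.count_queries += 1
        return self._conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]

    def num_docs(self) -> int:
        # Hit the database only when the cache was invalidated by a write.
        if self._num_docs is None:
            self._num_docs = self._count_from_db()
        return self._num_docs

    def add(self, rows) -> None:
        self._conn.executemany(
            "INSERT INTO docs (data) VALUES (?)", [(r,) for r in rows]
        )
        self._conn.commit()
        self._num_docs = None  # invalidate on write

    def delete(self, doc_ids) -> None:
        self._conn.executemany(
            "DELETE FROM docs WHERE doc_id = ?", [(i,) for i in doc_ids]
        )
        self._conn.commit()
        self._num_docs = None  # invalidate on write

    def search(self, limit: int):
        # Clamp the limit to the number of stored docs, using the cached count.
        limit = min(limit, self.num_docs())
        cur = self._conn.execute(
            "SELECT doc_id FROM docs ORDER BY doc_id LIMIT ?", (limit,)
        )
        return [row[0] for row in cur]


store = CachedCountStore()
store.add(["a", "b", "c"])
for _ in range(5):
    store.search(20)
print(store.count_queries)  # COUNT(*) calls issued across the five searches
```

Invalidating the cache on writes, rather than incrementing a counter, keeps the sketch correct even if an insert or delete affects a different number of rows than requested; the cost is one extra count query after each write.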

Example Code

No response

Python, Pydantic & OS Version

docarray==0.35.0
python==3.10.6

Affected Components
