Closed
Initial Checks
- I have read and followed the docs and still think this is a bug
Description
I have a dataset of over 2.5 million records with 768-dimensional vectors indexed. With it, the wrapper takes a whopping 11 seconds to query the top 20 results.
On further digging, I found that the sqlite3 commands being performed were something of a bottleneck, in particular this line:
https://github.com/docarray/docarray/blob/0ea6846783a1450dc92e4ce181b430f02e32df10/docarray/index/backends/hnswlib.py#L297C21-L297C26
The line above calculates the total number of records in the system before each and every batched search, which adds a 7-second delay. Instead, the result of the self.num_docs() call could be cached and updated only when add, delete or update calls are made.
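Below is a minimal Python sketch of the proposed caching. Everything in it is hypothetical: CachedCountStore, search_batched, count_queries and the docs table are illustrative stand-ins, not docarray's actual hnswlib backend internals or API. The sketch assumes two things. First, the document count is read from a sqlite3 table. Second, a search only needs that count, for example to clamp the requested limit. Under those assumptions, the COUNT(*) query runs once after each write instead of once per batched search.

```python
import sqlite3
from typing import List, Optional, Sequence, Tuple


class CachedCountStore:
    """Hypothetical stand-in for an index whose documents live in sqlite3.

    num_docs() is cached and only recomputed after a write (add/update/delete).
    """

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE docs (doc_id TEXT PRIMARY KEY, data BLOB)")
        self._num_docs_cache: Optional[int] = None
        self.count_queries = 0  # how many times COUNT(*) actually hit sqlite

    def num_docs(self) -> int:
        # Only run COUNT(*) when the cache has been invalidated by a write.
        if self._num_docs_cache is None:
            self.count_queries += 1
            (count,) = self._conn.execute("SELECT COUNT(*) FROM docs").fetchone()
            self._num_docs_cache = count
        return self._num_docs_cache

    def _invalidate(self) -> None:
        self._num_docs_cache = None

    def add(self, docs: Sequence[Tuple[str, bytes]]) -> None:
        # INSERT OR REPLACE covers both adds and updates of existing ids.
        with self._conn:
            self._conn.executemany("INSERT OR REPLACE INTO docs VALUES (?, ?)", docs)
        self._invalidate()

    def delete(self, doc_ids: Sequence[str]) -> None:
        with self._conn:
            self._conn.executemany(
                "DELETE FROM docs WHERE doc_id = ?", [(d,) for d in doc_ids]
            )
        self._invalidate()

    def search_batched(self, queries: Sequence[object], limit: int) -> List[int]:
        # Placeholder search: returns the effective k per query. The point is
        # that it reads the cached count rather than re-counting every call.
        k = min(limit, self.num_docs())
        return [k for _ in queries]


store = CachedCountStore()
store.add([("a", b"1"), ("b", b"2"), ("c", b"3")])
for _ in range(100):
    store.search_batched(["q1", "q2"], limit=20)
print(store.count_queries)  # 1
store.delete(["a"])
print(store.num_docs(), store.count_queries)  # 2 2
```

Invalidating the cache on write keeps the count correct without requiring each write path to compute an exact delta. For example, a replace on an existing id leaves the count unchanged, while an insert increases it. If the real index is written from several threads or processes, the cache would also need a lock or another invalidation signal. That requirement is not covered here.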
Example Code
No response
Python, Pydantic & OS Version
docarray==0.35.0
python==3.10.6
Affected Components