-
Notifications
You must be signed in to change notification settings - Fork 234
feat: hnswlib document index #1124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Co-authored-by: Anne Yang <[email protected]> Signed-off-by: Johannes Messner <[email protected]>
|
This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size. |
| default_column_config: Dict[Type, Dict[str, Any]] = field( | ||
| default_factory=lambda: { | ||
| np.ndarray: { | ||
| 'dim': 128, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to keep this dim here when class _Column has n_dim?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, let me double check this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok to clarify: The n_dim in the _Column is taken from the type parameter, e.g. NdArray[512], then n_dim will be 512. The dim here is just a parameter that people can pass to Field(). So in the _Column it could be that n_dim is empty while dim isn't, or vice versa, or both are empty, etc.
We cannot combine these automatically, because what is called dim here could have other names for other backends.
I will clarify the guidance on this in the doc, thanks for pointing out!
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
Signed-off-by: Johannes Messner <[email protected]>
|
This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size. |
samsja
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we mark all of the test as "slow" and "docstore" related ?
yep makes sense |
Signed-off-by: Johannes Messner <[email protected]>
|
This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size. |
|
📝 Docs are deployed on https://ft-feat-doc-store--jina-docs.netlify.app 🎉 |
| from docarray.doc_index.backends.hnswlib_doc_index import HnswDocumentIndex | ||
| from docarray.typing import NdArray | ||
|
|
||
| pytestmark = [pytest.mark.slow, pytest.mark.doc_index] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh did not know you could do this interesting
Goals:
First implementation of a Document Store.
Design doc: https://lightning-scent-57a.notion.site/Document-Stores-v2-design-doc-f11d6fe6ecee43f49ef88e0f1bf80b7f
Usage Example:
ToDo:
num_docs()etc- [ ] nested access syntax: change__to.Note: unchecked boxes above have been moved to separate PR