Member

@JohannesMessner JohannesMessner commented Dec 12, 2022

Goals:

Create find() function:

import torch

from docarray import DocumentArray, Document
from docarray.utility import find, find_batched
from docarray.typing import TorchTensor

class MyDoc(Document):
    tensor: TorchTensor

# index of 10 documents, each holding a random 128-dimensional tensor
da = DocumentArray[MyDoc](MyDoc(tensor=torch.rand(128)) for _ in range(10))

# single query: returns the matching documents and their scores
matches, scores = find(da, MyDoc(tensor=torch.rand(128)), embedding_field='tensor', metric='cosine_sim')

# batched query: one result per query document
batched_query = DocumentArray[MyDoc](
    [MyDoc(tensor=torch.rand(128)) for _ in range(3)]
)
results = find_batched(da, batched_query, embedding_field='tensor', metric='cosine_sim')
assert len(results) == 3
for matches_i, scores_i in results:
    ...
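
For readers new to the idea, the kind of search that find() is meant to perform can be sketched in plain NumPy. This is only an illustration under stated assumptions: the helper name cosine_sim_top_k is hypothetical and not part of docarray, and the implementation in this PR may well differ.

```python
import numpy as np


def cosine_sim_top_k(index: np.ndarray, query: np.ndarray, limit: int = 10):
    """Return (indices, scores) of the `limit` rows of `index` most similar to `query`.

    Hypothetical helper for illustration only; not part of docarray.
    """
    eps = 1e-12  # guards against division by zero for all-zero vectors
    index_norm = index / np.maximum(np.linalg.norm(index, axis=1, keepdims=True), eps)
    query_norm = query / max(np.linalg.norm(query), eps)
    scores = index_norm @ query_norm  # cosine similarity per row, shape (n_docs,)
    limit = min(limit, len(scores))
    top = np.argsort(-scores)[:limit]  # highest similarity first
    return top, scores[top]


rng = np.random.default_rng(0)
index = rng.random((10, 128))  # 10 "documents" with 128-dimensional embeddings
query = index[3].copy()  # query identical to document 3
top_idx, top_scores = cosine_sim_top_k(index, query, limit=3)
```

Because the query is a copy of row 3, that row comes back first with a similarity of 1 (up to floating-point error).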

TODO

  • implement find
    • for torch
    • for numpy
    • allow docarray as query
  • refactor: Create classes for backend operations. Edit: Separate PR
  • refactor: let types define the backend they belong to. Edit: Separate PR
  • consider having find and find_batched explicitly
  • user defined callable as distance function. Edit: not doing in first iteration
  • tests
  • Documentation (docstrings)
  • Optional: batching. Edit: not in this PR
  • Optional: nested find. Edit: not doing in first iteration
  • Optional: other features from current docarray. Edit: not doing in first iteration
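
On the item about having find and find_batched explicitly: one reason a separate batched entry point can be attractive is that all queries can be scored against the index in a single matrix product. A minimal NumPy sketch of that idea follows; the function name cosine_sim_top_k_batched is hypothetical, and nothing here is taken from the PR's code.

```python
import numpy as np


def cosine_sim_top_k_batched(index: np.ndarray, queries: np.ndarray, limit: int = 10):
    """Return one (indices, scores) pair per query row.

    Hypothetical helper for illustration only; not part of docarray.
    """
    eps = 1e-12  # guards against division by zero for all-zero vectors
    index_norm = index / np.maximum(np.linalg.norm(index, axis=1, keepdims=True), eps)
    queries_norm = queries / np.maximum(
        np.linalg.norm(queries, axis=1, keepdims=True), eps
    )
    scores = queries_norm @ index_norm.T  # shape (n_queries, n_docs)
    limit = min(limit, scores.shape[1])
    top = np.argsort(-scores, axis=1)[:, :limit]  # best matches first, per query
    top_scores = np.take_along_axis(scores, top, axis=1)
    return [(top[i], top_scores[i]) for i in range(len(queries))]


rng = np.random.default_rng(0)
index = rng.random((10, 128))
queries = index[[1, 4, 7]].copy()  # three queries copied from the index
batched_results = cosine_sim_top_k_batched(index, queries, limit=3)
```

The result is a list with one entry per query, which mirrors the `for matches_i, scores_i in results` loop in the goals above.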

Signed-off-by: Johannes Messner <[email protected]>
…te-v2

# Conflicts:
#	docarray/typing/__init__.py
#	docarray/typing/tensor/__init__.py
#	docarray/typing/tensor/abstract_tensor.py
#	docarray/typing/tensor/embedding.py
#	tests/integrations/typing/test_torch_tensor.py
Signed-off-by: Johannes Messner <[email protected]>
@JohannesMessner JohannesMessner self-assigned this Dec 12, 2022
@JohannesMessner JohannesMessner added the DocArray v2 label (This issue is part of the rewrite; not to be merged into main) Dec 12, 2022
@JohannesMessner JohannesMessner linked an issue Dec 12, 2022 that may be closed by this pull request
Signed-off-by: Johannes Messner <[email protected]>
@github-actions github-actions bot added size/xl and removed size/l labels Dec 13, 2022
@github-actions

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Signed-off-by: Johannes Messner <[email protected]>
@JohannesMessner JohannesMessner mentioned this pull request Dec 13, 2022
47 tasks
Signed-off-by: Johannes Messner <[email protected]>
@JohannesMessner JohannesMessner marked this pull request as ready for review December 14, 2022 09:24

limit: int = 10,
device: Optional[str] = None,
descending: Optional[bool] = None,
) -> List[FindResult]:
Member Author

Is this a good return format?
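
To make the question concrete, here is one shape the return type could take, assuming (as the `for matches_i, scores_i in results` loop in the description suggests) that each FindResult unpacks into matches and scores. The field names and types below are assumptions for illustration, not the definition used in this PR.

```python
from typing import Any, List, NamedTuple


class FindResult(NamedTuple):
    """Illustrative sketch only; field names are assumed, not taken from the PR."""

    documents: List[Any]  # matched documents, best match first
    scores: List[float]  # one score per matched document


result = FindResult(documents=["doc_a", "doc_b"], scores=[0.9, 0.4])

# A NamedTuple supports both tuple unpacking and attribute access.
matches, scores = result
best_score = result.scores[0]
```

A named tuple keeps the `matches, scores = ...` unpacking from the description working while also giving callers readable attribute names.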

@JohannesMessner
Member Author

Note that the file structure is not great at the moment; it will change with the PR about computational backends.

Signed-off-by: Johannes Messner <[email protected]>
@github-actions

📝 Docs are deployed on https://ft-feat-find--jina-docs.netlify.app 🎉

@JohannesMessner JohannesMessner merged commit c7dfb90 into feat-rewrite-v2 Dec 14, 2022
@JohannesMessner JohannesMessner deleted the feat-find branch December 14, 2022 13:26

Labels

area/core, area/testing, area/typing, DocArray v2 (This issue is part of the rewrite; not to be merged into main), size/xl

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

create a find function that operate on DocumentArray

3 participants