docs: improve "Creating Custom Document Stores" documentation#10581

Open
davidsbatista wants to merge 3 commits into main from
docs/improve-custome-doc-stores-documentation

@davidsbatista
Contributor

Proposed Changes:

  • Improves the "Creating Custom Document Stores" section: mentions the new operations (not part of the Protocol) and explains the importance of adding async methods.

Checklist

  • I have read the contributors guidelines and the code of conduct.
  • I have updated the related issue with new insights and changes.
  • I have added unit tests and updated the docstrings.
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I have documented my code.
  • I have added a release note file, following the contributors guidelines.
  • I have run pre-commit hooks and fixed any issue.

@vercel

vercel bot commented Feb 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| haystack-docs | Ready | Preview, Comment | Feb 13, 2026 8:18pm |

@davidsbatista davidsbatista changed the title from Docs/improve custome doc stores documentation to docs: improve "Creating Custom Document Stores" documentation Feb 12, 2026
@davidsbatista davidsbatista marked this pull request as ready for review February 12, 2026 16:40
@davidsbatista davidsbatista requested a review from a team as a code owner February 12, 2026 16:40
@davidsbatista davidsbatista requested review from anakin87 and removed request for a team February 12, 2026 16:40

Usually, a Document Store comes with additional methods that can provide advanced search functionalities. These methods are not part of the `DocumentStore` protocol and don’t follow any particular convention. We designed it like this to provide maximum flexibility to the Document Store when using any specific features of the underlying database.

Some additional methods that are not part of the `DocumentStore` protocol, but most of the Document Stores in Haystack have them implemented, are:
Contributor

📝 [vale] reported by reviewdog 🐶
[Google.Contractions] Feel free to use 'aren't' instead of 'are not'.

```python
from typing import Any

def update_by_filter(filters: dict[str, Any], meta: dict[str, Any], refresh: bool = False) -> int: ...
def delete_by_filter(filters: dict[str, Any]) -> int: ...
```
These methods are not part of the Protocol, but implementing them in your custom Document Store is highly recommended, as users often expect them to be available.
Contributor

📝 [vale] reported by reviewdog 🐶
[Google.Contractions] Feel free to use 'aren't' instead of 'are not'.
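To illustrate what `update_by_filter` and `delete_by_filter` might do, here is a minimal sketch on a hypothetical dict-backed store. `DictDocumentStore`, its `write` helper, and the `_matches` function are invented for this example. It assumes the returned `int` is the number of affected documents, and it supports only a single `==` comparison on a `meta.*` field in Haystack's `{"field": ..., "operator": ..., "value": ...}` filter format; a real implementation should translate the full filter syntax, including logical operators, into the backend's native query language.

```python
from typing import Any


def _matches(filters: dict[str, Any], doc: dict[str, Any]) -> bool:
    # Hypothetical, simplified matcher: only one "==" comparison on a meta field.
    # Real Haystack filters also support other operators and AND/OR/NOT nesting.
    field = filters["field"]
    if filters["operator"] != "==" or not field.startswith("meta."):
        raise ValueError("This sketch only supports '==' on meta fields")
    key = field[len("meta."):]
    return doc["meta"].get(key) == filters["value"]


class DictDocumentStore:
    """Hypothetical in-memory store keyed by document id."""

    def __init__(self) -> None:
        self._docs: dict[str, dict[str, Any]] = {}

    def write(self, doc_id: str, content: str, meta: dict[str, Any]) -> None:
        # Copy meta so later updates don't mutate the caller's dict.
        self._docs[doc_id] = {"id": doc_id, "content": content, "meta": dict(meta)}

    def count_documents(self) -> int:
        return len(self._docs)

    def update_by_filter(self, filters: dict[str, Any], meta: dict[str, Any], refresh: bool = False) -> int:
        # Merge `meta` into every matching document and return how many were updated.
        # `refresh` is accepted for signature compatibility; memory has nothing to refresh.
        updated = 0
        for doc in self._docs.values():
            if _matches(filters, doc):
                doc["meta"].update(meta)
                updated += 1
        return updated

    def delete_by_filter(self, filters: dict[str, Any]) -> int:
        # Collect ids first so the dict isn't mutated while iterating over it.
        to_delete = [doc_id for doc_id, doc in self._docs.items() if _matches(filters, doc)]
        for doc_id in to_delete:
            del self._docs[doc_id]
        return len(to_delete)


store = DictDocumentStore()
store.write("1", "Berlin is in Germany.", {"lang": "en"})
store.write("2", "Berlin liegt in Deutschland.", {"lang": "de"})
store.write("3", "Paris is in France.", {"lang": "en"})

english = {"field": "meta.lang", "operator": "==", "value": "en"}
print(store.update_by_filter(english, {"reviewed": True}))  # prints 2
print(store.delete_by_filter({"field": "meta.lang", "operator": "==", "value": "de"}))  # prints 1
print(store.count_documents())  # prints 2
```

In a database-backed store, pushing the filter down to the backend (a single update-by-query or delete-by-query call) is usually preferable to loading and iterating over documents in Python as this sketch does.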

1. Implement the logic for `count_documents`.
2. In your `test_document_store.py` module, define the test class `TestDocumentStore(CountDocumentsTest)`. Note how we only inherit from the specific testing mix-in `CountDocumentsTest`.
Contributor

⚠️ [vale] reported by reviewdog 🐶
[Google.We] Try to avoid using first-person plural like 'we'.

6. Keep iterating with the remaining methods.
- Having a notebook where users can try out your Document Store in a full pipeline can really help adoption, and it’s a great source of documentation. Our [haystack-cookbook](https://github.com/deepset-ai/haystack-cookbook) repository has good visibility, and we encourage contributors to create a PR and add their own.

The [tests](https://github.com/deepset-ai/haystack/blob/main/haystack/testing/document_store.py) in `DocumentStoreBaseTests` give you a good idea of the overall expected behavior of a Document Store and the operations it should support. Following it is a good way to make sure your implementation is consistent with the rest of the Haystack ecosystem.
Contributor

📝 [vale] reported by reviewdog 🐶
[Google.Contractions] Feel free to use 'it's' instead of 'it is'.


If the technology you are using for your Document Store supports asynchronous operations, we recommend implementing `async` versions of the methods in the `DocumentStore` protocol as well. This will allow users to call your Document Store from async applications and pipelines without blocking the event loop, which can improve performance and scalability when many I/O-bound operations run concurrently.
Contributor

⚠️ [vale] reported by reviewdog 🐶
[Google.We] Try to avoid using first-person plural like 'we'.


Contributor

⚠️ [vale] reported by reviewdog 🐶
[Google.Will] Avoid using 'will'.
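A minimal sketch of the async recommendation, assuming a backend that ships a native async driver. `FakeAsyncClient` and its `insert_many`/`count` methods are hypothetical, the signatures are simplified (plain `id -> content` dicts instead of `Document` objects and duplicate policies), and the `_async` suffix is the naming assumed here for the async counterparts of `write_documents` and `count_documents`.

```python
import asyncio


class FakeAsyncClient:
    """Hypothetical async driver; a real one would perform network I/O."""

    def __init__(self, rows: dict[str, str]) -> None:
        self._rows = rows

    async def insert_many(self, rows: dict[str, str]) -> int:
        await asyncio.sleep(0.01)  # simulated round-trip; yields to the event loop
        self._rows.update(rows)
        return len(rows)

    async def count(self) -> int:
        await asyncio.sleep(0.01)
        return len(self._rows)


class MyDocumentStore:
    def __init__(self) -> None:
        self._rows: dict[str, str] = {}  # id -> content, shared with the async client
        self._async_client = FakeAsyncClient(self._rows)

    # Synchronous protocol methods (simplified signatures).
    def write_documents(self, docs: dict[str, str]) -> int:
        self._rows.update(docs)
        return len(docs)

    def count_documents(self) -> int:
        return len(self._rows)

    # Async counterparts, delegating to the async driver instead of blocking.
    async def write_documents_async(self, docs: dict[str, str]) -> int:
        return await self._async_client.insert_many(docs)

    async def count_documents_async(self) -> int:
        return await self._async_client.count()


async def main() -> tuple[int, int]:
    store = MyDocumentStore()
    batches = [{f"doc-{i}-{j}": "text" for j in range(2)} for i in range(3)]
    # The three writes overlap on one event loop instead of running back to back.
    written = await asyncio.gather(*(store.write_documents_async(b) for b in batches))
    return sum(written), await store.count_documents_async()


print(asyncio.run(main()))  # prints (6, 6)
```

When the backend has no async driver, wrapping the synchronous methods with `asyncio.to_thread` is a possible fallback, but it only moves the blocking calls onto a worker thread rather than making them truly non-blocking.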

Member

@anakin87 anakin87 left a comment

I left some minor comments.

Please also apply these changes to `docs-website/versioned_docs/version-2.24/concepts/document-store/creating-custom-document-stores.mdx`, so that they will be published in the stable docs.

The [tests](https://github.com/deepset-ai/haystack/blob/main/haystack/testing/document_store.py) in `DocumentStoreBaseTests` give you a good idea of the overall expected behavior of a Document Store and the operations it should support. Following it is a good way to make sure your implementation is consistent with the rest of the Haystack ecosystem.
Member

Here, I'd say that DocumentStoreBaseTests is the minimum requirement but using DocumentStoreBaseExtendedTests would be desirable. WDYT?


Some additional methods that are not part of the `DocumentStore` protocol, but are implemented by most Document Stores in Haystack, include:
Contributor

📝 [vale] reported by reviewdog 🐶
[Google.Contractions] Feel free to use 'aren't' instead of 'are not'.

Labels

None yet

Projects

None yet

2 participants