-
Notifications
You must be signed in to change notification settings - Fork 234
feat(v2): add from and to pandas df for documentarray #1161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Co-authored-by: Johannes Messner <[email protected]> Signed-off-by: Charlotte Gerhaher <[email protected]>
Co-authored-by: Johannes Messner <[email protected]> Signed-off-by: Charlotte Gerhaher <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
…to-pandas Signed-off-by: anna-charlotte <[email protected]>
pyproject.toml
Outdated
| rich = ">=13.1.0" | ||
| lz4 = {version= ">=1.0.0", optional = true} | ||
| pydub = {version = "^0.25.1", optional = true } | ||
| pandas = ">=1.1.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be an optional dependecy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add it to common or a as a separate extra pandas ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separate. we will reorginaze at some point anyway
samsja
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
look goods, I added some comments
docarray/array/array/io.py
Outdated
| ) | ||
|
|
||
| @staticmethod | ||
| def access_path_dict_to_nested_dict( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't you already implement something like this in your previous PR? or am i mixing things up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the function _access_path_to_dict that is used within _access_path_dict_to_nested_dict that I implemented in the last PR.
_access_path_to_dict: transforms one access path to nested dict_ access_path_dict_to_nested_dict: transforms dict with (multiple) access path keys to a joint nested dict by calling the former func
I will move the latter to that other helper file, to keep those functions together yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok sounds good, yes keeping them in the same helper file would be ideal
docarray/array/array/io.py
Outdated
| return json.dumps([doc.json() for doc in self]) | ||
|
|
||
| @classmethod | ||
| def _check_for_valid_document_type(cls) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer returning a bool and rasing at the call site. That way it is clearer where the error actually occurs, and this method can be re-used in contexts where only the information is needed, without wanting to raise an exception
docarray/array/array/io.py
Outdated
| ) | ||
|
|
||
| @classmethod | ||
| def _check_for_valid_access_paths(cls, field_names: Optional[List[str]]) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
docarray/array/array/io.py
Outdated
|
|
||
| @classmethod | ||
| def _check_for_valid_document_type(cls) -> None: | ||
| if cls.document_type == AnyDocument: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there are plan to make this check more sophisticated as part of this PR? otherwise this doesn't need to be a method imo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, just tried to extract all the duplicate code but maybe I went a bit overboard hehe, moved it back to from_csv and from_pandas again
Signed-off-by: anna-charlotte <[email protected]>
Co-authored-by: Johannes Messner <[email protected]> Signed-off-by: Charlotte Gerhaher <[email protected]>
Co-authored-by: Johannes Messner <[email protected]> Signed-off-by: Charlotte Gerhaher <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
samsja
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should make pandas install optional. Otherwise looks good
Signed-off-by: anna-charlotte <[email protected]>
|
📝 Docs are deployed on https://ft-feat-from-to-pandas--jina-docs.netlify.app 🎉 |
* feat: load from and to csv Signed-off-by: anna-charlotte <[email protected]> * fix: from to csv Signed-off-by: anna-charlotte <[email protected]> * feat: add access path to dict Signed-off-by: anna-charlotte <[email protected]> * fix: from to csv Signed-off-by: anna-charlotte <[email protected]> * fix: clean up Signed-off-by: anna-charlotte <[email protected]> * docs: add docstring and update tmpdir in test Signed-off-by: anna-charlotte <[email protected]> * fix: merge nested dicts Signed-off-by: anna-charlotte <[email protected]> * fix: clean up Signed-off-by: anna-charlotte <[email protected]> * fix: clean up Signed-off-by: anna-charlotte <[email protected]> * test: update test Signed-off-by: anna-charlotte <[email protected]> * fix: apply samis suggestion from code review Signed-off-by: anna-charlotte <[email protected]> * fix: apply suggestions from code review wrt access paths Signed-off-by: anna-charlotte <[email protected]> * fix: apply johannes suggestion Co-authored-by: Johannes Messner <[email protected]> Signed-off-by: Charlotte Gerhaher <[email protected]> * fix: apply johannes suggestion Co-authored-by: Johannes Messner <[email protected]> Signed-off-by: Charlotte Gerhaher <[email protected]> * fix: apply suggestions from code review Signed-off-by: anna-charlotte <[email protected]> * fix: apply suggestions from code review Signed-off-by: anna-charlotte <[email protected]> * fix: typos Signed-off-by: anna-charlotte <[email protected]> * refactor: move helper functions to helper file Signed-off-by: anna-charlotte <[email protected]> * test: fix fixture Signed-off-by: anna-charlotte <[email protected]> * feat: add to and from pandas df for documentarray Signed-off-by: anna-charlotte <[email protected]> * chore: add pandas to pyproject.toml Signed-off-by: anna-charlotte <[email protected]> * docs: update docstring Signed-off-by: anna-charlotte <[email protected]> * fix: mypy Signed-off-by: anna-charlotte <[email protected]> * fix: clean up Signed-off-by: anna-charlotte <[email protected]> * fix: apply suggestions from code review Signed-off-by: anna-charlotte <[email protected]> * fix: apply suggestion from johannes Co-authored-by: Johannes Messner <[email protected]> Signed-off-by: Charlotte Gerhaher <[email protected]> * fix: apply suggestion from johannes Co-authored-by: Johannes Messner <[email protected]> Signed-off-by: Charlotte Gerhaher <[email protected]> * fix: apply suggestions from code review Signed-off-by: anna-charlotte <[email protected]> * fix: apply suggestion Signed-off-by: anna-charlotte <[email protected]> * fix: apply suggestions Signed-off-by: anna-charlotte <[email protected]> --------- Signed-off-by: anna-charlotte <[email protected]> Signed-off-by: Charlotte Gerhaher <[email protected]> Co-authored-by: Johannes Messner <[email protected]> Signed-off-by: Arnav Zutshi <[email protected]>
Add from and to pandas to DocumentArray IOMixin
DocumentArray[Doc].from_pandas(df)da.to_pandas()