Skip to content

Conversation

@anna-charlotte
Copy link
Contributor

@anna-charlotte anna-charlotte commented Feb 22, 2023

Add from and to pandas to DocumentArray IOMixin

  • add DocumentArray[Doc].from_pandas(df)
  • add da.to_pandas()
  • check and update documentation, if required. See guide

anna-charlotte and others added 21 commits February 20, 2023 15:06
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Co-authored-by: Johannes Messner <[email protected]>
Signed-off-by: Charlotte Gerhaher <[email protected]>
Co-authored-by: Johannes Messner <[email protected]>
Signed-off-by: Charlotte Gerhaher <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
anna-charlotte added 3 commits February 22, 2023 11:50
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
@github-actions github-actions bot added size/m and removed size/l labels Feb 22, 2023
@anna-charlotte anna-charlotte marked this pull request as ready for review February 22, 2023 13:38
@anna-charlotte anna-charlotte mentioned this pull request Feb 23, 2023
47 tasks
pyproject.toml Outdated
rich = ">=13.1.0"
lz4 = {version= ">=1.0.0", optional = true}
pydub = {version = "^0.25.1", optional = true }
pandas = ">=1.1.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be an optional dependecy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add it to common or a as a separate extra pandas ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separate. we will reorginaze at some point anyway

Copy link
Member

@samsja samsja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

look goods, I added some comments

)

@staticmethod
def access_path_dict_to_nested_dict(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't you already implement something like this in your previous PR? or am i mixing things up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the function _access_path_to_dict that is used within _access_path_dict_to_nested_dict that I implemented in the last PR.

  • _access_path_to_dict: transforms one access path to nested dict
  • _ access_path_dict_to_nested_dict: transforms dict with (multiple) access path keys to a joint nested dict by calling the former func

I will move the latter to that other helper file, to keep those functions together yes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok sounds good, yes keeping them in the same helper file would be ideal

return json.dumps([doc.json() for doc in self])

@classmethod
def _check_for_valid_document_type(cls) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer returning a bool and rasing at the call site. That way it is clearer where the error actually occurs, and this method can be re-used in contexts where only the information is needed, without wanting to raise an exception

)

@classmethod
def _check_for_valid_access_paths(cls, field_names: Optional[List[str]]) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here


@classmethod
def _check_for_valid_document_type(cls) -> None:
if cls.document_type == AnyDocument:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there are plan to make this check more sophisticated as part of this PR? otherwise this doesn't need to be a method imo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, just tried to extract all the duplicate code but maybe I went a bit overboard hehe, moved it back to from_csv and from_pandas again

anna-charlotte and others added 6 commits February 23, 2023 11:39
Co-authored-by: Johannes Messner <[email protected]>
Signed-off-by: Charlotte Gerhaher <[email protected]>
Co-authored-by: Johannes Messner <[email protected]>
Signed-off-by: Charlotte Gerhaher <[email protected]>
Signed-off-by: anna-charlotte <[email protected]>
Copy link
Member

@samsja samsja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should make pandas install optional. Otherwise looks good

Signed-off-by: anna-charlotte <[email protected]>
@github-actions
Copy link

📝 Docs are deployed on https://ft-feat-from-to-pandas--jina-docs.netlify.app 🎉

@anna-charlotte anna-charlotte merged commit 4de28ea into feat-rewrite-v2 Feb 23, 2023
@anna-charlotte anna-charlotte deleted the feat-from-to-pandas branch February 23, 2023 12:48
AsRaNi1 pushed a commit to AsRaNi1/docarray that referenced this pull request Feb 28, 2023
* feat: load from and to csv

Signed-off-by: anna-charlotte <[email protected]>

* fix: from to csv

Signed-off-by: anna-charlotte <[email protected]>

* feat: add access path to dict

Signed-off-by: anna-charlotte <[email protected]>

* fix: from to csv

Signed-off-by: anna-charlotte <[email protected]>

* fix: clean up

Signed-off-by: anna-charlotte <[email protected]>

* docs: add docstring and update tmpdir in test

Signed-off-by: anna-charlotte <[email protected]>

* fix: merge nested dicts

Signed-off-by: anna-charlotte <[email protected]>

* fix: clean up

Signed-off-by: anna-charlotte <[email protected]>

* fix: clean up

Signed-off-by: anna-charlotte <[email protected]>

* test: update test

Signed-off-by: anna-charlotte <[email protected]>

* fix: apply samis suggestion from code review

Signed-off-by: anna-charlotte <[email protected]>

* fix: apply suggestions from code review wrt access paths

Signed-off-by: anna-charlotte <[email protected]>

* fix: apply johannes suggestion

Co-authored-by: Johannes Messner <[email protected]>
Signed-off-by: Charlotte Gerhaher <[email protected]>

* fix: apply johannes suggestion

Co-authored-by: Johannes Messner <[email protected]>
Signed-off-by: Charlotte Gerhaher <[email protected]>

* fix: apply suggestions from code review

Signed-off-by: anna-charlotte <[email protected]>

* fix: apply suggestions from code review

Signed-off-by: anna-charlotte <[email protected]>

* fix: typos

Signed-off-by: anna-charlotte <[email protected]>

* refactor: move helper functions to helper file

Signed-off-by: anna-charlotte <[email protected]>

* test: fix fixture

Signed-off-by: anna-charlotte <[email protected]>

* feat: add to and from pandas df for documentarray

Signed-off-by: anna-charlotte <[email protected]>

* chore: add pandas to pyproject.toml

Signed-off-by: anna-charlotte <[email protected]>

* docs: update docstring

Signed-off-by: anna-charlotte <[email protected]>

* fix: mypy

Signed-off-by: anna-charlotte <[email protected]>

* fix: clean up

Signed-off-by: anna-charlotte <[email protected]>

* fix: apply suggestions from code review

Signed-off-by: anna-charlotte <[email protected]>

* fix: apply suggestion from johannes

Co-authored-by: Johannes Messner <[email protected]>
Signed-off-by: Charlotte Gerhaher <[email protected]>

* fix: apply suggestion from johannes

Co-authored-by: Johannes Messner <[email protected]>
Signed-off-by: Charlotte Gerhaher <[email protected]>

* fix: apply suggestions from code review

Signed-off-by: anna-charlotte <[email protected]>

* fix: apply suggestion

Signed-off-by: anna-charlotte <[email protected]>

* fix: apply suggestions

Signed-off-by: anna-charlotte <[email protected]>

---------

Signed-off-by: anna-charlotte <[email protected]>
Signed-off-by: Charlotte Gerhaher <[email protected]>
Co-authored-by: Johannes Messner <[email protected]>
Signed-off-by: Arnav Zutshi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement from_to/csv,pandas v2: handle load from external data sources

4 participants