13 changes: 9 additions & 4 deletions datacommons/core.py
@@ -13,9 +13,13 @@
# limitations under the License.
""" Data Commons Python Client API Core.

Provides primitive operations for working with collections of nodes. Use cases
include getting all property labels, property values, and triples associated
with collections of nodes specified by their dcids.
Provides primitive operations for working with collections of nodes. For a
collection of nodes identified by their dcids, this submodule implements the
following:

- Getting all property labels
- Getting all property values
- Getting all triples
"""

from __future__ import absolute_import
@@ -226,7 +230,8 @@ def get_triples(dcids, limit=utils._MAX_LIMIT):

Returns:
A :obj:`dict` mapping dcids to a :obj:`list` of triples `(s, p, o)` where
`s`, `p`, and `o` are instances of :obj:`str`.
`s`, `p`, and `o` are instances of :obj:`str` and either the subject
or the object is the mapped dcid.

Raises:
ValueError: If the payload returned by the Data Commons REST API is
3 changes: 2 additions & 1 deletion datacommons/places.py
@@ -14,7 +14,8 @@
""" Data Commons Python Client API Places Module.

Provides convenience functions for working with Places in the Data Commons
knowledge graph. Use cases include getting places contained in a list of places.
knowledge graph. This submodule implements the ability to access :obj:`Place`'s
within a collection of nodes identified by dcid.
"""

from __future__ import absolute_import
26 changes: 8 additions & 18 deletions datacommons/populations.py
@@ -14,7 +14,12 @@
""" Data Commons Python Client API Populations Module.

Provides convenience functions for accessing :obj:`StatisticalPopulation`'s and
:obj:`Observation`'s in the Data Commons knowledge graph.
:obj:`Observation`'s in the Data Commons knowledge graph. Implements the
following:

- Get :obj:`StatisticalPopulation`'s located at a given collection of nodes.
- Get :obj:`Observation`'s observing a collection of
:obj:`StatisticalPopulation`'s
"""

from __future__ import absolute_import
@@ -123,22 +128,7 @@ def get_observations(dcids,
observation_date,
observation_period=None,
measurement_method=None):
""" Returns :obj:`Observation`'s dcids observing the given :code:`dcids`.

When the dcids are given as a list, the returned Observations are formatted
as a map from given dcid to Observation dcid. The dcid will *not* be a member
of the dict if a population is there is no available observation for it.

If the dcids field is a Pandas Series, then the return value is a Series where
the i-th cell is the list of values associated with the given property for the
i-th dcid. If no observation is returned, then the cell holds NaN.

When the dcids are given as a Pandas Series, returned Observations are
formatted as a Pandas Series where the i-th entry corresponds to the value
of the observation observing the i-th given dcid. The cells of the Series
contain a single dcid as the combination of measured_property, stats_type,
observation_date, and optional parameters always define a unique Observation
if it exists. If it does not, then the cell will hold NaN.
""" Returns values of :obj:`Observation`'s observing the given :code:`dcids`.

Args:
dcids (Union[:obj:`list` of :obj:`str`, :obj:`pandas.Series`]): Dcids
@@ -169,7 +159,7 @@ def get_observations(dcids,

When :code:`dcids` is an instance of :obj:`pandas.Series`, the returned
:obj:`Observation`'s are formatted as a :obj:`pandas.Series` where the
`i`-th entry corresponds to observation observing the given dcid asspecified
`i`-th entry corresponds to observation observing the given dcid as specified
by the other parameters *if such exists*. Otherwise, the cell holds NaN.

Examples:
3 changes: 1 addition & 2 deletions docs/README.md
@@ -7,11 +7,10 @@ and hosted on readthedocs.org.

To generate documentation locally,

1. Autogenerate the API documentation using `sphinx-apidoc` by running the
1. Autogenerate the API documentation using `sphinx-build` by running the
following command in the root directory of this repository:

```
sphinx-apidoc --separate -f -o docs/source datacommons datacommons/test datacommons/examples
sphinx-build -a docs/source docs/build
```

15 changes: 15 additions & 0 deletions docs/source/_autosummary/datacommons.core.rst
@@ -0,0 +1,15 @@
datacommons.core
================

.. automodule:: datacommons.core
:members:
:undoc-members:
:show-inheritance:

.. rubric:: Functions

.. autosummary::

get_property_labels
get_property_values
get_triples
13 changes: 13 additions & 0 deletions docs/source/_autosummary/datacommons.places.rst
@@ -0,0 +1,13 @@
datacommons.places
==================

.. automodule:: datacommons.places
:members:
:undoc-members:
:show-inheritance:

.. rubric:: Functions

.. autosummary::

get_places_in
14 changes: 14 additions & 0 deletions docs/source/_autosummary/datacommons.populations.rst
@@ -0,0 +1,14 @@
datacommons.populations
=======================

.. automodule:: datacommons.populations
:members:
:undoc-members:
:show-inheritance:

.. rubric:: Functions

.. autosummary::

get_observations
get_populations
13 changes: 13 additions & 0 deletions docs/source/_autosummary/datacommons.query.rst
@@ -0,0 +1,13 @@
datacommons.query
=================

.. automodule:: datacommons.query
:members:
:undoc-members:
:show-inheritance:

.. rubric:: Classes

.. autosummary::

Query
14 changes: 14 additions & 0 deletions docs/source/_autosummary/datacommons.utils.rst
@@ -0,0 +1,14 @@
datacommons.utils
=================

.. automodule:: datacommons.utils
:members:
:undoc-members:
:show-inheritance:

.. rubric:: Functions

.. autosummary::

clean_frame
flatten_frame
23 changes: 22 additions & 1 deletion docs/source/conf.py
@@ -33,7 +33,11 @@
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.napoleon']
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.autosummary',
'sphinx.ext.napoleon'
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
@@ -43,13 +47,30 @@
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['datacommons.examples.rst', 'datacommons.test.rst']

# Autosummary configs
autosummary_generate = True

# Sphinx napoleon configs
napoleon_numpy_docstring = False
napoleon_use_rtype = False


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'alabaster'

# Set various theme options
html_theme_options = {
'github_button': True,
'github_user': 'google',
'github_repo': 'datacommons',
'sidebar_collapse': False,
'sidebar_width': '250px',
'page_width': '1250px',
}

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
144 changes: 144 additions & 0 deletions docs/source/data_model.rst
@@ -0,0 +1,144 @@
Getting Started with the Data Model
===================================

In this tutorial, we will introduce how Data Commons stores data in its open
knowledge graph.

Important Terms
---------------

The following terms are defined in this tutorial.

+-----------------+-----------------------------------------------------------+
| Term | Description |
+=================+===========================================================+
| Knowledge Graph | The graph structure storing all data in Data Commons. |
+-----------------+-----------------------------------------------------------+
| Node | The representation of an entity in the knowledge graph. |
+-----------------+-----------------------------------------------------------+
| DCID | The unique identifier assigned to each node in the Data |
| | Commons knowledge graph. Short for Data Commons |
| | Identifier. |
+-----------------+-----------------------------------------------------------+
| Type | The class describing a node. For example, the node for |
| | "California" has type "State". |
+-----------------+-----------------------------------------------------------+
| Property | The relation that associates two nodes together. |
+-----------------+-----------------------------------------------------------+
| Property Label | Another name for *property*. |
+-----------------+-----------------------------------------------------------+
| Property Value | The set of nodes adjacent to a given node along edges of |
| | a given *property*. |
+-----------------+-----------------------------------------------------------+
| Triple | A compact representation of a statement in the Data |
| | Commons knowledge graph of the form |
| | |
| | (subject, property, object) |
| | |
+-----------------+-----------------------------------------------------------+

Overview of the Graph
---------------------

Data Commons organizes its data as an open access *knowledge graph*. It contains
statements about real world objects such as

- `"Alameda County" <https://browser.datacommons.org/kg?dcid=geoId/06001>`_
is contained in the State of
`"California" <https://browser.datacommons.org/kg?dcid=geoId/06>`_
- The latitude of
`"Berkeley" <https://browser.datacommons.org/kg?dcid=geoId/0606000>`_, CA
is 37.8703
- The
`population of all persons in Maryland <https://browser.datacommons.org/kg?dcid=dc/o/6w1c9qk7hxjch>`_
has a total count of 5,996,080.

`Entities <https://en.wikipedia.org/wiki/Entity>`_ such as "Alameda County",
"California", and "Berkeley" are represented as **nodes** in the Data Commons
knowledge graph. There are two important details about a node:

1. Every node is uniquely identified by a **dcid**, which is short for Data
Commons Identifier. The dcid identifying "California" is :code:`geoId/06`.
2. Every node has a **type** that broadly describes the category of entities
that it is an instance of. For example, "California" has type
`State <https://browser.datacommons.org/kg?dcid=State>`_.
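
The two details above can be pictured with a small local data structure. This is
only an illustrative sketch: the :code:`Node` class and its field names are
hypothetical and are not part of the :code:`datacommons` package. It simply
pairs a dcid with a type, using the "California" example from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical local stand-in for a knowledge graph node."""
    dcid: str       # the unique Data Commons Identifier
    node_type: str  # the type describing the entity


# "California" as described above: dcid geoId/06, type State.
california = Node(dcid="geoId/06", node_type="State")

# Because dcids are unique, they can serve as lookup keys.
nodes_by_dcid = {california.dcid: california}
print(nodes_by_dcid["geoId/06"].node_type)
```

Keying the lookup table by dcid mirrors how the rest of this tutorial refers to
nodes: by identifier rather than by display name.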

Relations between entities are represented as a directed edge between two nodes
in the graph. These relations are called **properties** or **property labels**.
A portion of the Data Commons graph capturing the statement

"Alameda County is contained in California"

can thus be visualized as the following:

.. image:: https://storage.googleapis.com/notebook-resources/image-1.png
:alt: A view of the statement "Alameda County is contained in California"
:align: center

Here "Alameda County" and "California" are nodes while
`"containedInPlace" <https://browser.datacommons.org/kg?dcid=containedInPlace>`_
is a property denoting that the node adjacent to the tail of the edge is
contained in the node adjacent to the head.

Property Values
+++++++++++++++

When accessing nodes in the graph, it is often useful to describe a set of nodes
adjacent to a given node. One may wish to query for all cities that are
contained in a certain county, ask for all schools within a school district,
etc. Given a node and a property, we denote the **property value** as the set
of all nodes that are adjacent to the given node along an edge labeled by the
given property.

For example, the following are a few property values of "Alameda County"
along the property "containedInPlace".

- `California <https://browser.datacommons.org/kg?dcid=geoId/06>`_
- `Berkeley <https://browser.datacommons.org/kg?dcid=geoId/0606000>`_
- `Oakland <https://browser.datacommons.org/kg?dcid=geoId/0653000>`_
- `Emeryville <https://browser.datacommons.org/kg?dcid=geoId/0622594>`_

The graph around Alameda County looks like the following.

.. image:: https://storage.googleapis.com/notebook-resources/readthedocs-image-2.png
:alt: A view of the graph around Alameda County
:align: center

An important thing to note is that direction matters! Berkeley is certainly
contained in Alameda County, but California is *not* contained in Alameda
County. Alameda County is contained in California, but it is *not* contained
in Berkeley!

When asking for property values, one may thus wish to distinguish by the
*orientation* or direction of the edge. The property values of "Alameda County"
along *outgoing* edges labeled by "containedInPlace" include California, while
the property values along *incoming* edges include Berkeley, Oakland, and
Emeryville.
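
The effect of orientation can be sketched with a few lines of plain Python. The
edge list below is hand-written from the example above and the helper
:code:`property_values` is a hypothetical local function, not the library's API;
it only illustrates how outgoing and incoming property values differ.

```python
# Directed edges as (tail dcid, property, head dcid), taken from the
# Alameda County example. This is hand-written sample data, not a query result.
EDGES = [
    ("geoId/06001", "containedInPlace", "geoId/06"),       # Alameda County -> California
    ("geoId/0606000", "containedInPlace", "geoId/06001"),  # Berkeley -> Alameda County
    ("geoId/0653000", "containedInPlace", "geoId/06001"),  # Oakland -> Alameda County
    ("geoId/0622594", "containedInPlace", "geoId/06001"),  # Emeryville -> Alameda County
]


def property_values(edges, dcid, prop, outgoing=True):
    """Return the dcids adjacent to `dcid` along edges labeled `prop`.

    With outgoing=True the given node is the tail of the edge; with
    outgoing=False it is the head.
    """
    if outgoing:
        return {head for tail, label, head in edges
                if tail == dcid and label == prop}
    return {tail for tail, label, head in edges
            if head == dcid and label == prop}


alameda = "geoId/06001"
out_values = property_values(EDGES, alameda, "containedInPlace", outgoing=True)
in_values = property_values(EDGES, alameda, "containedInPlace", outgoing=False)
print(sorted(out_values))
print(sorted(in_values))
```

Here the outgoing values contain only California's dcid, while the incoming
values contain the dcids of Berkeley, Oakland, and Emeryville.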

Triples
+++++++

Relations in the graph can be compactly described in the format of a **triple**.
Triples are 3-tuples that take the form `(subject, property, object)`.

- The *subject* and *object* are two nodes in the Data Commons graph.
- The *property* is the property labeling the edge oriented from subject to
object.

The statement "Alameda County is contained in California" can be represented
as a triple of the following form:

("Alameda County", "containedInPlace", "California")

Indeed, one could represent the entire Data Commons graph as a collection of
triples.
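
As a rough illustration of the graph as a collection of triples, the sketch
below stores a few triples as Python tuples and groups them by the dcids they
mention. The helper :code:`triples_by_dcid` is a hypothetical local function; it
only imitates the shape described in the :code:`get_triples` docstring (a dict
mapping each dcid to the triples in which it is the subject or the object) and
does not call the Data Commons API.

```python
from collections import defaultdict

# Hand-written (subject, property, object) triples using dcids from this tutorial.
TRIPLES = [
    ("geoId/06001", "containedInPlace", "geoId/06"),
    ("geoId/0606000", "containedInPlace", "geoId/06001"),
    ("geoId/0653000", "containedInPlace", "geoId/06001"),
    ("geoId/0622594", "containedInPlace", "geoId/06001"),
]


def triples_by_dcid(triples, dcids):
    """Map each requested dcid to the triples where it is subject or object."""
    wanted = set(dcids)
    index = defaultdict(list)
    for subject, prop, obj in triples:
        # A triple is filed under every requested dcid that it mentions.
        for dcid in {subject, obj} & wanted:
            index[dcid].append((subject, prop, obj))
    return dict(index)


grouped = triples_by_dcid(TRIPLES, ["geoId/06", "geoId/06001"])
for dcid, found in grouped.items():
    print(dcid, len(found))
```

In this sample, California appears in one triple (as the object), while Alameda
County appears in all four: once as the subject and three times as the object.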

More Information
----------------

Data Commons leverages the `Schema.org <https://schema.org>`_ vocabulary to
provide a common set of types and properties. The Data Model used by Data
Commons also closely resembles the Schema.org data model. One may refer to
documentation on the
`Schema.org data model <https://schema.org/docs/datamodel.html>`_
to learn more.
7 changes: 0 additions & 7 deletions docs/source/datacommons.core.rst

This file was deleted.

7 changes: 0 additions & 7 deletions docs/source/datacommons.places.rst

This file was deleted.

7 changes: 0 additions & 7 deletions docs/source/datacommons.populations.rst

This file was deleted.

7 changes: 0 additions & 7 deletions docs/source/datacommons.query.rst

This file was deleted.
