This directory contains IPython notebooks that use the Python API to perform various statistical analyses on interesting datasets. You can click on each link to see a live Colab version.

| Notebook | Description |
|---|---|
| analyzing_census_data.ipynb | A notebook that analyzes the relationship between population size and median age for each state, county, and city in the United States. |
| COVID_19_Feature_Exploration_Analysis_with_Data_Commons.ipynb | A notebook that explores how COVID-19 case trends differ across counties, and examines hundreds of variables from dozens of sources to see which are potentially correlated with the COVID-19 mortality rate. |
| analyzing_income_distribution.ipynb | A notebook that plots the distribution of income using statistics provided by the 2017 American Community Survey. The final result is a histogram charting the number of individuals in income brackets ranging from "0 to 10,000USD" up to "Above 200,000USD". |
| analyzing_obesity_prevalence.ipynb | A notebook that analyzes the relationship between the prevalence of obesity in 500 US cities (as provided by the CDC Wonder dataset) and health and socio-economic indicators such as the prevalence of high blood pressure and the poverty rate. |
| Place Similarity with Data Commons.ipynb | A notebook that identifies similar places given a place and one or more statistical variables from Data Commons. |
| Missing Data Imputation Tutorial.ipynb | A notebook that examines the different types of holes in time series and different methods of imputing them. |
| analyzing_genomic_data.ipynb | A notebook that analyzes genetic variants within RUNX1 (provided by multiple datasets from UCSC Genome Browser, NCBI/gene, and ClinVar). |
| Drug_Discovery_With_Data_Commons.ipynb | A notebook that performs drug discovery by identifying novel applications of previously approved drugs using Biomedical Data Commons. |
| protein-charts.ipynb | A notebook that summarizes various protein properties and interactions using graphical visualizations. |
| Superfund sites (basic) | A notebook that illustrates basic access to Superfund sites data in Data Commons. |
| Superfund sites (extended) | A notebook that includes extended analysis using Superfund sites data in Data Commons. |

To keep these notebooks up to date, developers can save a copy of any of the
above notebooks to a GitHub repository and open a pull request against this
repository. In Colab, navigate to File > Save a copy in GitHub...
