Project work for the course: The Complete Pandas Bootcamp 2022: Data Science.
[Rank of each country's success per sport]
This challenge asked me to apply and combine many concepts regarding data aggregation that I learned in the course.
It was aimed at working with, manipulating and aggregating data.
It tested my ability to:
- interpret and understand the underlying data
- incorporate input from subject matter experts (in this case, fictitious sports experts)
- 'think in data structures'
You're asked to compile the official Summer Olympic Games Medal Tables for all editions from 1896 to 2012.
All you can use is a dataset with raw data containing over 31,000 medals (summer.csv) and the official medal tables for the 1996 and 1976 editions from Wikipedia (wik_1996.csv, wik_1976.csv). Use the two official medal tables as a reference to check whether your code produces the correct output!
Your goal is to minimize the divergence between your aggregated medal tables and the official medal tables.
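One likely source of divergence is that the raw data appears to hold one row per athlete, so a team event would show up once per team member, while an official medal table counts one medal per team. The sketch below is a minimal pandas starting point for that step only. It assumes summer.csv has the columns Year, Sport, Discipline, Event, Gender, Athlete, Country and Medal (check your copy); the helper name `medal_table` and the demo rows are hypothetical, and deduplication alone may not bring the divergence to 0.

```python
import pandas as pd

# Assumed summer.csv columns: Year, Sport, Discipline, Event, Gender,
# Athlete, Country, Medal. Athlete is deliberately left out of the key so
# that all members of one team collapse into a single medal.
EVENT_KEYS = ["Year", "Sport", "Discipline", "Event", "Gender", "Medal", "Country"]


def medal_table(summer: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count one medal per country per event and medal type."""
    unique_medals = summer.drop_duplicates(subset=EVENT_KEYS)
    table = (
        unique_medals.groupby(["Year", "Country", "Medal"])
        .size()
        .unstack(fill_value=0)  # one column per medal type
        .reindex(columns=["Gold", "Silver", "Bronze"], fill_value=0)
    )
    table["Total"] = table.sum(axis=1)
    return table


# Made-up rows: a two-person team wins one gold, an individual wins silver.
demo = pd.DataFrame({
    "Year": [1996, 1996, 1996],
    "Sport": ["Rowing", "Rowing", "Athletics"],
    "Discipline": ["Rowing", "Rowing", "Athletics"],
    "Event": ["Double Sculls", "Double Sculls", "100M"],
    "Gender": ["Men", "Men", "Men"],
    "Athlete": ["A", "B", "C"],
    "Country": ["ITA", "ITA", "ITA"],
    "Medal": ["Gold", "Gold", "Silver"],
})
print(medal_table(demo))
```

On the real file this would be called as `medal_table(pd.read_csv("summer.csv"))`, with the result compared against the official tables one edition at a time.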
Divergence:
- Let's assume that the official number of gold medals for the United States in the 1996 edition is 44 and your code produces 46.
- This is an absolute divergence of 2.
To do:
- Calculate the total absolute divergence for the 1996 and 1976 editions of the Olympics.
- The desired outcome (divergence score) is 0!
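The divergence score can be sketched as follows: align the two tables on country, treat a country missing from either table as having 0 medals, and sum the absolute differences across Gold, Silver and Bronze. The function name, the Gold/Silver/Bronze column names and all numbers below are assumptions for illustration; the Wikipedia files would first need reshaping so that both tables are indexed by the same country labels.

```python
import pandas as pd

MEDAL_COLUMNS = ["Gold", "Silver", "Bronze"]


def absolute_divergence(mine: pd.DataFrame, official: pd.DataFrame) -> int:
    """Hypothetical helper: sum of |mine - official| over countries and medal types.

    Both frames are assumed to be indexed by country with Gold/Silver/Bronze
    columns; a country present in only one table counts as 0 in the other.
    """
    mine, official = mine[MEDAL_COLUMNS].align(official[MEDAL_COLUMNS], fill_value=0)
    return int((mine - official).abs().to_numpy().sum())


# Made-up numbers mirroring the 46 vs. 44 example, plus a country
# that the aggregated table is missing entirely.
official = pd.DataFrame({"Gold": [44, 1], "Silver": [10, 0], "Bronze": [5, 2]},
                        index=["United States", "Country X"])
mine = pd.DataFrame({"Gold": [46], "Silver": [10], "Bronze": [5]},
                    index=["United States"])
print(absolute_divergence(mine, official))
```

Adding the scores for the 1996 and 1976 tables gives the total divergence the task asks for.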
This challenge asked me to apply and combine many concepts regarding exploratory data analysis that I learned in the course.
It was aimed at cleaning data, visualising data and testing hypotheses with several data analysis libraries (pandas, seaborn, matplotlib, scipy).
It tested my ability to:
- interpret and understand the underlying data
- clean data
- form and test hypotheses with data
- visualise data for ease of interpretation
Now that you've aggregated your datasets, you're asked to perform exploratory data analysis on the official Summer Olympic Games Medal Tables for all editions from 1896 to 2012.
There are three datasets:
- summer.csv
- winter.csv
- dictionary.csv
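Importing and merging these three files might look like the sketch below. It assumes that summer.csv and winter.csv share the same columns and store a country code in a "Country" column, and that dictionary.csv maps a "Code" column to a full "Country" name plus attributes such as "Population"; check these assumptions against the actual files. The `combine` helper and the demo rows are hypothetical.

```python
import pandas as pd


def combine(summer: pd.DataFrame, winter: pd.DataFrame,
            dictionary: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: stack summer and winter medals, attach country data."""
    olympics = pd.concat(
        [summer.assign(Edition="Summer"), winter.assign(Edition="Winter")],
        ignore_index=True,
    )
    # Rename so the code column does not clash with the dictionary's full name.
    olympics = olympics.rename(columns={"Country": "Code"})
    # Left merge keeps every medal, even for codes missing from the dictionary.
    merged = olympics.merge(dictionary, on="Code", how="left")
    # Fall back to the code as the country label where no full name exists.
    merged["Country"] = merged["Country"].fillna(merged["Code"])
    return merged


# Made-up rows; "XYZ" stands for a code that is absent from the dictionary.
summer = pd.DataFrame({"Year": [1896], "Country": ["GRE"], "Medal": ["Gold"]})
winter = pd.DataFrame({"Year": [1924], "Country": ["XYZ"], "Medal": ["Silver"]})
dictionary = pd.DataFrame({"Country": ["Greece"], "Code": ["GRE"],
                           "Population": [10_000_000]})
print(combine(summer, winter, dictionary))
```

With the real files, each frame would come from `pd.read_csv`. Codes without a dictionary entry end up with missing population and GDP values, which is one source of the missing values to deal with in the cleaning step.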
Your goal is to:
- Import the data
- Clean the data
- Merge tables
- Deal with missing values
- Convert data to correct type
- Analyse the dataset:
  - Which are the most successful countries?
  - Summer vs winter games: does country location matter?
  - Men vs women: does country culture matter?
  - Countries vs sports: do traditions matter?
- Test hypotheses:
  - Relationship between total medals and population
  - Relationship between total medals and participation
  - Relationship between total medals and GDP
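For the hypothesis tests, one option is a Spearman rank correlation from scipy, which tests for a monotonic relationship without assuming a linear one; this is not necessarily the method used in the course. The helper `medals_vs`, its `alpha` parameter and the synthetic data are hypothetical, and the medal count here simply counts rows of the merged table without deduplicating team events.

```python
import pandas as pd
from scipy import stats


def medals_vs(merged: pd.DataFrame, column: str, alpha: float = 0.05):
    """Hypothetical helper: Spearman correlation of medals per country vs. `column`."""
    per_country = merged.groupby("Country").agg(
        Medals=("Medal", "count"),  # rows per country (not deduplicated)
        Value=(column, "first"),    # country-level attribute, e.g. Population
    ).dropna()                      # drop countries with no value for `column`
    rho, p_value = stats.spearmanr(per_country["Medals"], per_country["Value"])
    # Reject "no monotonic relationship" when p_value < alpha.
    return rho, p_value, p_value < alpha


# Synthetic data: medal counts 1..6, populations almost in the same order.
counts = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6}
population = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 6, "F": 5}
rows = [(c, "Gold", population[c]) for c, n in counts.items() for _ in range(n)]
demo = pd.DataFrame(rows, columns=["Country", "Medal", "Population"])
print(medals_vs(demo, "Population"))
```

The same call with `"GDP per Capita"` (or whichever GDP column the dictionary actually has) would cover the GDP hypothesis.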