Sourcing Good Data 
10 best practices
Agenda:
Welcome
Why is data quality important?
Our 10 best practices
Data Quality Story 
Overbooked 10,000 tickets for event 
Manual spreadsheet error 
- telegraph.co.uk
Your data has reach… 
Where data from a report is used:
Inter-departmental: 69%
Within department: 31%
% of data in spreadsheets that influences CEO: 42%
* Panko and Port, 2012
Just how much of an issue is data quality? 
1 in 10 organisations rate their data quality as “excellent”
Poor data quality accounts for 20% of business process costs
$611bn: the cost of poor data quality to US companies each year
* Gartner, TDWI
And we want more… 
2009 – enough data to fill a stack of DVDs to the moon and back
2020 – Grow by 44x
Less than 1% of available data is analysed
93% of execs believe they are losing revenue as a result of not fully leveraging the information they collect
* IDC, Oracle and EMC
What is data quality? 
HOW RELIABLE IS YOUR DATA?
TRUSTED AND CREDIBLE
Complete 
Accurate 
Available 
Consistent
Why is data quality important? 
“It supports accountability” 
“It gives us accurate and timely information to manage our business”
“It ensures the best use of our resources”
“It increases our efficiency”
“It reduces the cost of rework”
“It can increase customer satisfaction”
“It ensures we have the best possible understanding of our customers and employees”
“It improves the success rate of enterprise initiatives like Business Intelligence…”
Building high quality “supply chains” of data 
MEASURE FOR QUALITY
GET THE RIGHT DATA
BE AGILE
1 Focus on the outcome
Analysis Paralysis
Letting data dictate what is “important”
Limited time and energy to focus
ISSUES
1 Focus on the outcome 
Start with the outcome…
…then the data.
Focus on what matters
RECOMMENDATIONS
2 Profile your data 
Data supplier doesn’t know your data needs
The data you source is only as good as the information you provide to the supplier…
ISSUES
2 Profile your data 
Write your data profile 
Structure, Format, Frequency, Age, Delivery Method 
Communicate it to data providers 
Opportunity to identify issues and gaps 
RECOMMENDATIONS
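A data profile can be written down in a form that both sides can check. The sketch below is only an illustration: the `DataProfile` class, its field names, and the `find_gaps` helper are hypothetical, chosen to mirror the five profile items above (Structure, Format, Frequency, Age, Delivery Method).

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    """Hypothetical profile to share with a data provider."""
    structure: dict        # expected column name -> expected type name
    file_format: str       # e.g. "csv"
    frequency: str         # e.g. "daily"
    max_age_days: int      # oldest acceptable data, in days
    delivery_method: str   # e.g. "sftp"


def find_gaps(profile, delivered_columns):
    """Compare delivered columns with the profile to surface issues and gaps."""
    expected = set(profile.structure)
    delivered = set(delivered_columns)
    return {
        "missing": sorted(expected - delivered),
        "unexpected": sorted(delivered - expected),
    }


# Example values are made up for illustration.
profile = DataProfile(
    structure={"id": "int", "date": "date", "amount": "decimal"},
    file_format="csv",
    frequency="daily",
    max_age_days=1,
    delivery_method="sftp",
)
gaps = find_gaps(profile, ["id", "amount", "notes"])
```

Running the comparison on the first delivery is one way to find gaps before the data is relied on.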
3 Get as close to the source as possible 
When your source data is somebody else’s spreadsheet…
Human Error Risk 
Availability of data 
Unexpected Changes 
Additional effort and complexity 
ISSUES
3 Get as close to the source as possible 
CAUTION: Be cautious of manual spreadsheets
Skip the spreadsheet as a source
PLAN: Communicate and measure for quality
RECOMMENDATIONS
4 Streamline data sources 
Using multiple sources 
Redundant data 
Increased complexity and quality risk 
ISSUES
4 Streamline data sources 
Identify redundant data 
Focus on the essentials 
Cut out the stuff you don’t need 
RECOMMENDATIONS
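One simple way to identify redundant data is to list which fields arrive from more than one source. This is a minimal sketch under that assumption; the `redundant_fields` function and the source and field names are hypothetical.

```python
from collections import defaultdict


def redundant_fields(sources):
    """Return the fields that are delivered by more than one source."""
    providers = defaultdict(list)
    for source_name, fields in sources.items():
        for field_name in fields:
            providers[field_name].append(source_name)
    return {name: names for name, names in providers.items() if len(names) > 1}


# Made-up inventory of sources and the fields each one delivers.
sources = {
    "crm_export": ["customer_id", "email", "region"],
    "billing_feed": ["customer_id", "invoice_total"],
    "sales_sheet": ["region", "invoice_total"],
}
overlap = redundant_fields(sources)
```

Each field in `overlap` is a candidate for choosing a single authoritative source and cutting the rest.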
5 Set data quality expectations 
Perfectionism → Burnout
You can’t expect to focus on everything 
ISSUES
5 Set data quality expectations 
Focus on high impact data 
Employ tolerances and ranges for quality and accuracy 
RECOMMENDATIONS 
RELAX 
(a little)
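Tolerances and ranges can be expressed as small checks. The sketch below assumes a percentage tolerance around an expected value and an inclusive range; the function names and the example thresholds are hypothetical.

```python
def within_tolerance(actual, expected, tolerance_pct):
    """True if actual is within tolerance_pct percent of expected."""
    allowed = abs(expected) * tolerance_pct / 100
    return abs(actual - expected) <= allowed


def in_range(value, low, high):
    """True if value lies in the inclusive range [low, high]."""
    return low <= value <= high


# A daily total that is 2% off passes a 5% tolerance; one that is 10% off fails.
close_enough = within_tolerance(102, 100, 5)
too_far = within_tolerance(110, 100, 5)
```

Reserving tight tolerances for high impact data and looser ones elsewhere keeps the checking effort proportionate.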
6 Catch data quality issues early 
1-10-100 Rule:
$1 if found at the start of the journey (early)
$10 if found in the middle of the journey
$100 if found at the end of the journey (late)
* Total Quality Management
ISSUES
6 Catch data quality issues early 
Implement quality measures near the start of the data supply chain
Use the “start” as a reference point when checking data further down the journey
RECOMMENDATIONS
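Using the start as a reference point could look like this: record a few control figures when data enters the supply chain and compare them again further down. The `snapshot` and `compare_to_start` helpers and the `amount` field are hypothetical, and real control figures would depend on the data.

```python
def snapshot(rows, amount_key="amount"):
    """Record simple control figures for a batch of rows."""
    return {
        "row_count": len(rows),
        "total": sum(row[amount_key] for row in rows),
    }


def compare_to_start(start, later):
    """Return the names of control figures that no longer match the start."""
    return [name for name in start if start[name] != later.get(name)]


source_rows = [{"amount": 10}, {"amount": 20}, {"amount": 30}]
downstream_rows = [{"amount": 10}, {"amount": 20}]  # one row lost on the way

start = snapshot(source_rows)
mismatches = compare_to_start(start, snapshot(downstream_rows))
```

A non-empty `mismatches` list signals that something changed between the start and the later checkpoint.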
7 Actively measure quality 
ISSUES 
Invalid Assumption:
If the data meets our expectations today, it will continue to do so going forward
No simple way to identify if data is correct 
What happens when we do find an issue?
7 Actively measure quality 
[Figure: quality ratings GOOD, OK, NOT GOOD]
Define metrics for your data quality
Measure for quality on a consistent basis
Address consistent issues with strategic solutions (e.g. data cleansing)
RECOMMENDATIONS
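As an illustration of defining a metric and measuring it consistently, the sketch below computes completeness (the share of rows with every required field filled) and maps it to the GOOD / OK / NOT GOOD ratings. The function names and the 98% and 90% thresholds are assumptions, not figures from the slides.

```python
def completeness(rows, required_fields):
    """Share of rows (0.0 to 1.0) in which every required field is filled."""
    if not rows:
        return 0.0
    complete = sum(
        1
        for row in rows
        if all(row.get(name) not in (None, "") for name in required_fields)
    )
    return complete / len(rows)


def rate(score, good_at=0.98, ok_at=0.90):
    """Map a score to a rating; thresholds are illustrative assumptions."""
    if score >= good_at:
        return "GOOD"
    if score >= ok_at:
        return "OK"
    return "NOT GOOD"


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]
score = completeness(rows, ["id", "email"])
rating = rate(score)
```

Running the same measurement on every delivery shows whether an issue is a one-off or a consistent problem that needs a strategic fix.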
8 Expect Change. Embrace It. 
We all know change is coming 
Business activity, changes in strategies and systems
So rigid that you need to “reset” 
ISSUES
8 Expect Change. Embrace It. 
[Figure: likelihood vs. impact matrix, each axis from L (low) to H (high)]
Score and rank potential changes
Focus on high likelihood/impact changes
Have a plan in place for high risk items
RECOMMENDATIONS
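Scoring and ranking can be as simple as multiplying a likelihood score by an impact score. That multiplication, the 1 to 3 scale, and the example changes below are assumptions made for this sketch; the slides only call for scoring and ranking.

```python
def rank_changes(changes):
    """Rank (name, likelihood, impact) tuples by likelihood * impact, highest first."""
    scored = [(name, likelihood * impact) for name, likelihood, impact in changes]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Scores use a hypothetical scale: 1 = low, 2 = medium, 3 = high.
potential_changes = [
    ("New CRM system", 1, 3),
    ("Supplier changes file layout", 3, 3),
    ("Column renamed", 2, 1),
]
ranked = rank_changes(potential_changes)
```

Items at the top of `ranked` are the ones to have a plan in place for.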
9 Plan for change 
A change occurs, then what? 
Lack of clear policies and rules on who needs to do what…
Knowledge resting in the minds of key individuals
ISSUES
9 Plan for change 
CAUTION: In the event of a change the following people will…
Policies and rules
Documentation
Tracking Changes
RECOMMENDATIONS
10 Controlled human interaction 
Value of human interaction with data… 
… at the cost of data quality 
Uncontrolled manipulation of data 
ISSUES
10 Controlled human interaction 
Avoid uncontrolled manipulation 
Facilitate controlled and discrete changes 
Make sure it is traceable 
RECOMMENDATIONS
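One way to keep manual changes controlled and traceable is to route each one through a single function that records who changed what and why. The `apply_change` helper and the audit-log fields below are hypothetical; this is a sketch of the idea, not a prescribed design.

```python
from datetime import datetime, timezone


def apply_change(record, field_name, new_value, user, reason, audit_log):
    """Return an updated copy of record and append an audit entry for the change."""
    audit_log.append({
        "field": field_name,
        "old": record.get(field_name),
        "new": new_value,
        "user": user,
        "reason": reason,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    updated = dict(record)  # the original record is left untouched
    updated[field_name] = new_value
    return updated


audit_log = []
record = {"id": 7, "email": "old@example.com"}
updated = apply_change(
    record, "email", "new@example.com", "asmith", "customer request", audit_log
)
```

Because each call changes one field and leaves one log entry, every change stays discrete and can be traced back.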
Recap 
1 Focus on the outcome 
2 Profile your data 
3 Get close to the source 
4 Streamline data sources 
5 Set data quality expectations
Recap 
6 Catch data quality issues early 
7 Measure quality 
8 Expect and embrace change 
9 Plan for change 
10 Controlled human interaction
Thank You
