Step 1: Description
Goal: Understand a new dataset from scratch and surface data quality issues early.
Scenario: You're given a spreadsheet with zero context.
Prompt 1: Dataset Overview
"List all the columns in the attached spreadsheet and show me a sample of data from each column."
Prompt 2: Sanity Check
"Take 5 random samples from each column to confirm the format and type of information."
Prompt 3: Data Quality Check
"Run a data quality check on each column. Look for missing values, unexpected formats, and outliers."
Key Insight:
* Reveals what the data can and cannot be used for.
* Prevents incorrect analysis built on broken or incomplete data.
Step 2: Introspection
Goal: Verify that ChatGPT, Gemini, or Copilot truly understands the data, and explore meaningful questions.
Prompt 1: Insight Brainstorming
"Suggest 10 interesting questions we could answer with this dataset and explain why each matters."
Prompt 2: Feasibility Check
"For the first 3 questions, specify which columns are required and whether the data is sufficient."
Prompt 3: Gap Analysis
"What questions would people want to ask—but we can't answer due to missing data?"
Advanced Move: Data Merging
* Acquire a second dataset containing Viewership and Production Cost.
* Process: Join the files on a unique ID (the IMDb ID), as in the sketch below.
* Outcome: A "super dataset" for ROI analysis.
* Example: Cost-per-viewer efficiency.
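A minimal sketch of that join, assuming both files carry an IMDb ID column (file names, "imdb_id", "production_cost", and "viewership" are all hypothetical; rename them to match your data):

```python
import pandas as pd

# Hypothetical filenames and column names; adjust to your data
titles = pd.read_excel("titles.xlsx")
financials = pd.read_csv("viewership_costs.csv")

# Join on the shared IMDb ID to build the "super dataset"
merged = titles.merge(financials, on="imdb_id", how="inner")

# Example ROI-style metric: production cost per viewer
merged["cost_per_viewer"] = merged["production_cost"] / merged["viewership"]
print(merged[["imdb_id", "cost_per_viewer"]].sort_values("cost_per_viewer").head())
```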
Step 3: Goal Setting & Visualization
Goal: Shift from technically correct charts to business-relevant insights.
1. The Key Prompt
"My goal is to [insert your goal]. Based on this goal, which aspect of the data should we focus on?"
2. The Visualization Prompt
"Suggest 3 charts to visualize these insights, write the Python code for them, Explain why it, work and provide a headline for each."
Before presenting your analysis, ask:
"What tough questions will someone challenge this analysis with, and how should I address them?"
Using The “DIG” Framework for Data Analysis