Using The “DIG” Framework for Data Analysis
Step 1: Description

Goal: Understand a new dataset from scratch and surface data quality issues early.

Scenario: You're given a spreadsheet with zero context.

Prompt 1: Dataset Overview

"List all the columns in the attached spreadsheet and show me a sample of data from each column."

Prompt 2: Sanity Check

"Take 5 random samples from each column to confirm the format and type of information."
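These first two prompts have direct pandas equivalents if you want to verify the model's answers yourself. The sketch below uses a small inline CSV as a stand-in for the attached spreadsheet; the column names and rows are hypothetical:

```python
import io

import pandas as pd

# Hypothetical stand-in for the attached spreadsheet.
csv = io.StringIO(
    "title,year,rating\n"
    "Inception,2010,8.8\n"
    "Heat,1995,8.3\n"
    "Alien,1979,8.5\n"
)
df = pd.read_csv(csv)

# Prompt 1: list every column, its inferred type, and sample rows.
print(df.dtypes)
print(df.head())

# Prompt 2: random samples per column to confirm format and type.
for col in df.columns:
    print(col, df[col].sample(3, random_state=0).tolist())
```

Seeing the inferred dtypes alongside the samples is the quickest way to catch a "numeric" column that pandas actually read as text.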

Prompt 3: Data Quality Check

"Run a data quality check on each column. Look for missing values, unexpected formats, and outliers."

Key Insight:

* Reveals what the data can and cannot be used for.

* Prevents wrong analysis due to broken or incomplete data.
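If you prefer to run the quality check yourself rather than trust the model's summary, a minimal pandas sketch looks like this. The data is made up (with deliberate problems planted), and the IQR rule is one common outlier heuristic, not the only choice:

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with deliberate quality problems.
df = pd.DataFrame({
    "title": ["Inception", "Heat", None, "Alien"],       # missing value
    "year": [2010, 1995, 2023, 1979],
    "rating": [8.8, 8.3, 97.0, 8.5],                     # 97.0 is an outlier
})

# Missing values per column.
print(df.isna().sum())

# Outliers via the 1.5 * IQR rule on numeric columns.
numeric = df.select_dtypes(include=np.number)
q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
iqr = q3 - q1
outlier_mask = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
print(df[outlier_mask.any(axis=1)])
```

Here the check surfaces both planted issues: one missing title and the impossible rating of 97.0.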

Step 2: Introspection

Goal: Verify that ChatGPT/Gemini/Copilot truly understands the data and explore meaningful questions.

Prompt 1: Insight Brainstorming

"Suggest 10 interesting questions we could answer with this dataset and explain why each matters."

Prompt 2: Feasibility Check

"For the first 3 questions, specify which columns are required and whether the data is sufficient."

Prompt 3: Gap Analysis

"What questions would people want to ask—but we can't answer due to missing data?"

Advanced Move: Data Merging

* Acquire a second dataset containing Viewership and Production Cost.

* Process: Join the files on a unique ID (IMDB ID).

* Outcome: "Super dataset" for ROI analysis.

* Example: Cost per viewer efficiency.
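The merge itself is a one-liner in pandas. This sketch assumes both files share an `imdb_id` column; the titles and figures are invented for illustration:

```python
import pandas as pd

# Hypothetical primary dataset keyed by IMDB ID.
films = pd.DataFrame({
    "imdb_id": ["tt1375666", "tt0113277"],
    "title": ["Inception", "Heat"],
})

# Hypothetical second dataset: viewership and production cost.
financials = pd.DataFrame({
    "imdb_id": ["tt1375666", "tt0113277"],
    "viewers": [50_000_000, 12_000_000],
    "production_cost": [160_000_000, 60_000_000],
})

# Join on the unique ID to build the "super dataset".
merged = films.merge(financials, on="imdb_id", how="inner")

# ROI-style metric: production cost per viewer.
merged["cost_per_viewer"] = merged["production_cost"] / merged["viewers"]
print(merged[["title", "cost_per_viewer"]])
```

An inner join keeps only titles present in both files; switching to `how="left"` (plus `indicator=True`) is a quick way to see which rows failed to match.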

Step 3: Goal Setting & Visualization

Goal: Shift from technically correct charts to business-relevant insights.

1. The Key Prompt

"My goal is to [insert your goal]. Based on this goal, which aspect of the data should we focus on?"

2. The Visualization Prompt

"Suggest 3 charts to visualize these insights, write the Python code for them, explain why each works, and provide a headline for each."
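One of the three charts the model returns might look like the matplotlib sketch below; the data continues the hypothetical cost-per-viewer example, and the headline states the insight rather than just labeling the axes:

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical merged dataset from the Step 2 advanced move.
merged = pd.DataFrame({
    "title": ["Inception", "Heat", "Alien"],
    "cost_per_viewer": [3.2, 5.0, 1.8],
})

fig, ax = plt.subplots()
ax.bar(merged["title"], merged["cost_per_viewer"])
ax.set_ylabel("Production cost per viewer ($)")
# A headline, not a caption: it asserts the takeaway.
ax.set_title("Alien reached viewers at the lowest cost per head")
fig.savefig("cost_per_viewer.png")
```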

Before presenting your analysis, ask:

"What tough questions will someone challenge this analysis with, and how should I address them?"
