Intermediate

Data Analysis Workflow

Claude can analyse uploaded CSV or spreadsheet data, answer questions about it, generate SQL or Python/pandas code, and build a structured narrative from the findings, often without waiting on a data analyst or opening a BI tool. This page covers a workflow for rapid data analysis with Claude.

Preparing Your Data for Upload

Claude performs better with clean, well-described data:

Include a header row: Column names should be descriptive ("monthly_revenue_usd" not "col_3")
Describe the data in your prompt: "This CSV contains monthly sales data for 2024 and 2025. Each row is one month. Columns: date, product_line, units_sold, revenue_usd, region."
Flag any known data quality issues: "The April 2024 row has missing revenue data — it should be excluded from totals."
For large datasets: Upload a representative sample first to test your analysis approach, then scale up. Claude has context window limits — very large CSVs may need chunking.
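
One way to draw the sample mentioned above is reservoir sampling, which reads the file once and keeps a fixed number of randomly chosen rows. The sketch below is a minimal illustration rather than a prescribed method: the function name sample_csv, the seed, and the demo rows are assumptions made for this example, and only the column names come from the sample prompt above.

```python
import csv
import io
import random


def sample_csv(source, sample_size, seed=0):
    """Return (header, rows): the header plus a uniform random sample of data rows.

    Uses reservoir sampling, so the file is read once and never fully loaded.
    """
    rng = random.Random(seed)
    reader = csv.reader(source)
    header = next(reader)
    reservoir = []
    for index, row in enumerate(reader):
        if index < sample_size:
            reservoir.append((index, row))
        else:
            # Keep this row with probability sample_size / (index + 1).
            slot = rng.randint(0, index)
            if slot < sample_size:
                reservoir[slot] = (index, row)
    # Restore original file order so dated rows still read chronologically.
    reservoir.sort(key=lambda pair: pair[0])
    return header, [row for _, row in reservoir]


# Hypothetical data; a real run would pass an open file object instead.
demo = io.StringIO(
    "date,product_line,units_sold,revenue_usd,region\n"
    + "".join(f"2024-01-{d:02d},widgets,{d},{d * 10},EMEA\n" for d in range(1, 29))
)
header, rows = sample_csv(demo, sample_size=5)
print(header)
print(len(rows))
```

A uniform random sample suits data with one row per transaction. For a short time series with one row per period, a contiguous slice (for example, the most recent 12 months) may be more representative.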

Prompting for Descriptive Analysis

Start with descriptive questions to understand the data before asking for interpretation:

"Summarise this dataset: row count, date range, and the range of values for revenue_usd."
"What are the top 5 products by total units sold?"
"Show me the month-over-month revenue change for each region."
"Are there any rows with missing or unusual values I should know about?"
"What is the average revenue per unit sold by product line?"

These establish a baseline understanding before moving to trend analysis or comparative questions.
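
If you want to cross-check Claude's answers to the descriptive prompts above, the same baseline can be computed locally. This is a minimal pandas sketch under stated assumptions: the inline CSV is hypothetical data that borrows the column names from the earlier example prompt, and the keys of the summary dictionary are illustrative names.

```python
import io

import pandas as pd

# Hypothetical rows; in practice, pass the path of your own CSV to read_csv.
csv_text = """date,product_line,units_sold,revenue_usd,region
2024-01-01,widgets,120,2400.0,EMEA
2024-02-01,widgets,135,2700.0,EMEA
2024-03-01,gadgets,80,,APAC
2024-04-01,gadgets,95,3800.0,APAC
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

summary = {
    "row_count": len(df),
    "date_min": df["date"].min().date().isoformat(),
    "date_max": df["date"].max().date().isoformat(),
    # min() and max() skip missing values by default.
    "revenue_min": df["revenue_usd"].min(),
    "revenue_max": df["revenue_usd"].max(),
    # Number of missing cells in each column.
    "missing_by_column": df.isna().sum().to_dict(),
}
print(summary)
```

Comparing these figures with Claude's summary is a quick way to confirm that the whole file was read and that missing values were handled as you expect.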

Aggregations, Trends, and Outliers

Claude can identify patterns across the data when prompted explicitly:

Trends: "Is there a clear trend in revenue over this period? Identify the three months with the strongest growth and the three with the steepest declines."
Outliers: "Flag any data points that seem unusually high or low compared to the surrounding data. What might explain them?"
Comparisons: "Compare H1 2024 vs H1 2025 across all metrics. Which regions improved the most?"
Correlations: "Is there any relationship between units_sold and revenue_usd that suggests different pricing across products?"
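
To sanity-check the points Claude flags as outliers, a simple local heuristic can help. The sketch below uses the modified z-score (based on the median and the median absolute deviation) with the conventional 3.5 cutoff. It is one common heuristic, not a description of how Claude identifies outliers, and the function name flag_outliers and the revenue figures are hypothetical.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold."""
    centre = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        return []  # No measurable spread, so nothing can be flagged this way.
    return [
        i
        for i, v in enumerate(values)
        if abs(0.6745 * (v - centre) / mad) > threshold
    ]


# Hypothetical monthly revenue with one obvious spike at index 4.
monthly_revenue = [100, 104, 98, 101, 250, 99, 103, 97]
flagged = flag_outliers(monthly_revenue)
print(flagged)
```

This check compares each value with the whole series. For data with a strong trend or seasonality, comparing each point with its neighbours (for example, within a rolling window) may be more appropriate.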

Generating SQL or Pandas Code

When the analysis must be repeatable, or the full dataset is too large to upload, ask Claude to generate code that you run yourself instead of asking it to analyse the data directly:

SQL: "Write a SQL query to find the top 10 products by revenue in Q4 2024, broken down by region. Assume a PostgreSQL database with this schema: [paste schema]."
Pandas: "Write a pandas script to load this CSV, calculate month-over-month revenue growth, and output a summary table with columns: month, revenue, growth_pct."
Excel/Google Sheets formulas: "Give me the SUMIFS formula for Excel to sum revenue for the EMEA region in Q3 2024 only."

Test generated code on your actual data before relying on it. Claude writes the code against the schema you describe, so it may rest on assumptions (column types, date formats, how missing values are encoded) that do not match your actual data.
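
As an illustration of the kind of script the pandas prompt above might yield, here is a minimal sketch. It assumes the column names from the earlier example prompt (date, revenue_usd) and uses hypothetical inline rows in place of a real file. Treat it as one plausible shape for the output, not as the code Claude will necessarily produce.

```python
import io

import pandas as pd

# Hypothetical rows; in practice, pass the path of your own CSV to read_csv.
csv_text = """date,product_line,units_sold,revenue_usd,region
2024-01-01,widgets,100,1000.0,EMEA
2024-01-01,gadgets,50,1000.0,APAC
2024-02-01,widgets,110,1500.0,EMEA
2024-02-01,gadgets,40,1000.0,APAC
2024-03-01,widgets,90,1200.0,EMEA
2024-03-01,gadgets,45,800.0,APAC
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Total revenue per calendar month, in chronological order.
monthly = (
    df.groupby(df["date"].dt.to_period("M"))["revenue_usd"]
    .sum()
    .sort_index()
    .rename("revenue")
    .rename_axis("month")
    .reset_index()
)

# Percentage change from the previous month; the first month has no prior value.
monthly["growth_pct"] = (monthly["revenue"].pct_change() * 100).round(1)
print(monthly)
```

Note that sum() skips missing revenue values silently, so a month with missing data would appear as a drop. This is exactly the kind of assumption to check against your actual data.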

Building a Data Narrative

Once you have the findings, ask Claude to synthesise them into a narrative suitable for an audience:

"Based on this analysis, write a 3-paragraph executive summary of 2024 sales performance. Highlight: top finding, biggest risk, and recommended focus area for 2025."
"Translate these findings into 5 bullet points for a board presentation. Non-technical audience — no percentages beyond one decimal place."
"Draft the 'Key Findings' section of a monthly business review using these numbers."

Limitations to Know

In most contexts, Claude cannot execute code or run calculations directly on your data. Keep the following in mind:

Claude reads the data you upload and analyses it as text — for large datasets, it may miss rows it cannot fit in context
Complex statistical analysis (regression, clustering, hypothesis testing) is better run in a dedicated statistics environment, such as Python or R on your own machine; Claude can write the code but should not be the execution environment for precision statistics
Claude can make arithmetic errors on large aggregations — verify totals independently for high-stakes reports
Claude Code with the Python execution tool can actually run code against data — this is more reliable for numeric accuracy than text-mode analysis
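
Verifying a total independently can be as simple as re-summing the column yourself. The sketch below uses only the Python standard library and exact decimal arithmetic. The function name revenue_total, the demo rows, and the claimed_total figure are hypothetical, and skipping blank cells is an assumption that mirrors the missing-revenue example earlier on this page.

```python
import csv
import io
from decimal import Decimal


def revenue_total(source, column="revenue_usd"):
    """Sum a numeric column exactly, returning (total, skipped_row_count)."""
    total = Decimal("0")
    skipped = 0
    for row in csv.DictReader(source):
        raw = (row.get(column) or "").strip()
        if not raw:
            skipped += 1  # Missing value: leave it out of the total but count it.
            continue
        total += Decimal(raw)
    return total, skipped


# Hypothetical data; a real run would pass an open file object instead.
demo = io.StringIO(
    "date,revenue_usd\n"
    "2024-03-01,1200.10\n"
    "2024-04-01,\n"
    "2024-05-01,1350.20\n"
)
total, skipped = revenue_total(demo)

claimed_total = Decimal("2550.30")  # Hypothetical figure copied from Claude's answer.
matches = total == claimed_total
print(total, skipped, matches)
```

Reporting the number of skipped rows alongside the total makes it easier to see whether a mismatch comes from arithmetic or from different handling of missing data.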

Checklist: Do You Understand This?

Describe the data and its columns before asking analysis questions — context makes findings more accurate
Start with descriptive questions (what's in the data) before moving to interpretive questions (what does it mean)
Ask for SQL or pandas code for repeatable analysis or large datasets — more reliable than direct text analysis
Use Claude to build the data narrative (executive summary, board bullets, report section) from your verified findings
Verify arithmetic totals and test generated code on your actual data before using either in high-stakes reports