Intermediate

Data Analysis Workflow

Claude can analyse uploaded CSV or spreadsheet data, answer questions about it, generate SQL or Python/pandas code, and build a structured narrative from the findings — all without needing a data analyst or a BI tool. This page covers the workflow for rapid data analysis using Claude.

Preparing Your Data for Upload

Claude performs better with clean, well-described data:

  • Include a header row: Column names should be descriptive ("monthly_revenue_usd" not "col_3")
  • Describe the data in your prompt: "This CSV contains monthly sales data for 2024 and 2025. Each row is one month. Columns: date, product_line, units_sold, revenue_usd, region."
  • Flag any known data quality issues: "The April 2024 row has missing revenue data — it should be excluded from totals."
  • For large datasets: Upload a representative sample first to test your analysis approach, then scale up. Claude has context window limits — very large CSVs may need chunking.
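
One way to prepare a sample or a set of chunks before uploading is sketched below. It uses only the Python standard library and assumes the CSV is non-empty, has a header row, and fits in local memory. The function names sample_csv and chunk_csv are illustrative, not part of any Claude tooling.

```python
import csv
import io
import random
from typing import Iterator


def _split_header(csv_text: str):
    """Parse CSV text into (header, data_rows). Assumes the first row is the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return rows[0], rows[1:]


def _to_csv(header, rows) -> str:
    """Serialise a header and rows back to CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


def sample_csv(csv_text: str, k: int, seed: int = 0) -> str:
    """Return the header plus k randomly chosen data rows, kept in file order."""
    header, data = _split_header(csv_text)
    if k >= len(data):
        return _to_csv(header, data)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    keep = sorted(rng.sample(range(len(data)), k))
    return _to_csv(header, [data[i] for i in keep])


def chunk_csv(csv_text: str, rows_per_chunk: int) -> Iterator[str]:
    """Yield CSV chunks of at most rows_per_chunk data rows, each with the header."""
    header, data = _split_header(csv_text)
    for start in range(0, len(data), rows_per_chunk):
        yield _to_csv(header, data[start:start + rows_per_chunk])
```

Repeating the header in every chunk means each upload still carries the column names, so the prompt description stays valid for every piece. A random sample can miss rare cases, so it is worth checking that the sample still contains any rows you flagged as problematic.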

Prompting for Descriptive Analysis

Start with descriptive questions to understand the data before asking for interpretation:

  • "Summarise this dataset: row count, date range, and the range of values for revenue_usd."
  • "What are the top 5 products by total units sold?"
  • "Show me the month-over-month revenue change for each region."
  • "Are there any rows with missing or unusual values I should know about?"
  • "What is the average order value by product line?"

These establish a baseline understanding before moving to trend analysis or comparative questions.

Claude can identify patterns across the data when prompted explicitly:

  • Trends: "Is there a clear trend in revenue over this period? Identify the three months with the strongest growth and the three with the steepest declines."
  • Outliers: "Flag any data points that seem unusually high or low compared to the surrounding data. What might explain them?"
  • Comparisons: "Compare H1 2024 vs H1 2025 across all metrics. Which regions improved the most?"
  • Correlations: "Is there any relationship between units_sold and revenue_usd that suggests different pricing across products?"

Generating SQL or Pandas Code

For repeatable analysis or when you need to run analysis on a full dataset that is too large to upload, ask Claude to generate the code instead of doing the analysis directly:

  • SQL: "Write a SQL query to find the top 10 products by revenue in Q4 2024, broken down by region. Assume a PostgreSQL database with this schema: [paste schema]."
  • Pandas: "Write a pandas script to load this CSV, calculate month-over-month revenue growth, and output a summary table with columns: month, revenue, growth_pct."
  • Excel/Google Sheets formulas: "Give me the SUMIFS formula for Excel to sum revenue for the EMEA region in Q3 2024 only."

Test generated code on your actual data. Claude generally writes code that fits the schema you describe, but it may make assumptions (about column types, date formats, or how missing values are handled) that don't match your actual data shape.
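
As an illustration, a pandas script for the month-over-month prompt above might look roughly like the following. This is a sketch, not a definitive implementation: it assumes the column names date and revenue_usd from the example dataset description, and the function name monthly_growth is hypothetical. The explicit column check is one way to surface a schema mismatch early instead of getting a confusing error later.

```python
import io

import pandas as pd

# Columns this sketch relies on (assumed from the example dataset description).
REQUIRED_COLUMNS = {"date", "revenue_usd"}


def monthly_growth(df: pd.DataFrame) -> pd.DataFrame:
    """Return a table with columns: month, revenue, growth_pct."""
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Data is missing expected columns: {sorted(missing)}")

    df = df.copy()
    # Collapse each date to its calendar month.
    df["month"] = pd.to_datetime(df["date"]).dt.to_period("M")

    # min_count=1 keeps a month as NaN if all of its revenue values are missing,
    # rather than silently reporting it as 0.
    monthly = df.groupby("month", as_index=False)["revenue_usd"].sum(min_count=1)
    monthly = monthly.rename(columns={"revenue_usd": "revenue"})

    # Growth relative to the previous row; the first month has no predecessor.
    previous = monthly["revenue"].shift(1)
    monthly["growth_pct"] = (monthly["revenue"] / previous - 1) * 100
    return monthly
```

To run it against a file, load the data with pd.read_csv and pass the resulting DataFrame to monthly_growth. Note that the sketch compares each month with the previous row, so a month that is absent from the data entirely is skipped over rather than treated as a gap.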

Building a Data Narrative

Once you have the findings, ask Claude to synthesise them into a narrative suitable for an audience:

  • "Based on this analysis, write a 3-paragraph executive summary of 2024 sales performance. Highlight: top finding, biggest risk, and recommended focus area for 2025."
  • "Translate these findings into 5 bullet points for a board presentation. Non-technical audience — no percentages beyond one decimal place."
  • "Draft the 'Key Findings' section of a monthly business review using these numbers."

Limitations to Know

In most contexts, Claude cannot execute code or run calculations directly on your data. Keep the following in mind:

  • Claude reads the data you upload and analyses it as text — for large datasets, it may miss rows it cannot fit in context
  • Complex statistical analysis (regression, clustering, hypothesis testing) is better done in a dedicated environment such as Python or R. Claude can write the code, but should not be the execution environment for precision statistics
  • Claude can make arithmetic errors on large aggregations — verify totals independently for high-stakes reports
  • Claude Code with the Python execution tool can actually run code against data — this is more reliable for numeric accuracy than text-mode analysis
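
Verifying a total independently can be as small as the following sketch, which uses only the Python standard library. It assumes a revenue_usd column as in the example dataset, and the function name column_total is illustrative. Rows with a missing or non-numeric value are skipped and counted, which matches the earlier example of excluding a row with missing revenue.

```python
import csv
import io
from decimal import Decimal, InvalidOperation


def column_total(csv_text: str, column: str = "revenue_usd"):
    """Sum one column exactly; return (total, number_of_rows_skipped)."""
    total = Decimal("0")
    skipped = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(column) or "").strip()
        try:
            value = Decimal(raw)
        except InvalidOperation:
            # Blank or non-numeric cell: leave it out of the total.
            skipped += 1
            continue
        if not value.is_finite():
            # Literal "NaN" or "Infinity" parses as a Decimal but is not a usable amount.
            skipped += 1
            continue
        total += value
    return total, skipped
```

Decimal is used instead of float so that currency amounts add up exactly. Compare the returned total with the figure Claude reported, and check that the skipped count matches the number of rows you expected to exclude.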

Checklist: Do You Understand This?

  • Describe the data and its columns before asking analysis questions — context makes findings more accurate
  • Start with descriptive questions (what's in the data) before moving to interpretive questions (what does it mean)
  • Ask for SQL or pandas code for repeatable analysis or large datasets — more reliable than direct text analysis
  • Use Claude to build the data narrative (executive summary, board bullets, report section) from your verified findings
  • Verify arithmetic totals and check generated code on actual data before using in high-stakes reports

Page built: 01 Jun 2026