Data Analysis Workflow
Claude can analyse uploaded CSV or spreadsheet data, answer questions about it, generate SQL or Python/pandas code, and build a structured narrative from the findings — all without needing a data analyst or a BI tool. This page covers the workflow for rapid data analysis using Claude.
Preparing Your Data for Upload
Claude performs better with clean, well-described data:
- Include a header row: Column names should be descriptive ("monthly_revenue_usd" not "col_3")
- Describe the data in your prompt: "This CSV contains monthly sales data for 2024 and 2025. Each row is one month for one product line and region. Columns: date, product_line, units_sold, revenue_usd, region."
- Flag any known data quality issues: "The April 2024 rows have missing revenue data and should be excluded from totals."
- For large datasets: Upload a representative sample first to test your analysis approach, then scale up. Claude has context window limits — very large CSVs may need chunking.
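One way to prepare a sample and a column description before uploading is sketched below. The inline CSV, the sample size of 4, and the variable names are illustrative assumptions; in practice you would pass your own file path to `read_csv`.

```python
import io
import pandas as pd

# Hypothetical stand-in for a large CSV file (assumption for illustration).
raw_csv = io.StringIO(
    "date,product_line,units_sold,revenue_usd,region\n"
    "2024-01-01,widgets,120,2400.0,EMEA\n"
    "2024-01-01,gadgets,80,4000.0,APAC\n"
    "2024-02-01,widgets,130,2600.0,EMEA\n"
    "2024-02-01,gadgets,75,3750.0,APAC\n"
    "2024-03-01,widgets,125,,EMEA\n"
    "2024-03-01,gadgets,90,4500.0,APAC\n"
)
df = pd.read_csv(raw_csv, parse_dates=["date"])

# Draw a random sample; the fixed seed makes the sample reproducible.
sample = df.sample(n=min(4, len(df)), random_state=0)

# One line per column: name, dtype, and number of missing values.
schema_lines = [
    f"{col}: {df[col].dtype}, {int(df[col].isna().sum())} missing"
    for col in df.columns
]
prompt_context = "Columns:\n" + "\n".join(schema_lines)

# Text to paste or upload alongside the description.
sample_csv = sample.to_csv(index=False)
print(prompt_context)
print(sample_csv)
```

The column summary is computed on the full dataset, so known gaps (here, one missing revenue value) can be flagged in the prompt even if the sample happens not to contain them.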
Prompting for Descriptive Analysis
Start with descriptive questions to understand the data before asking for interpretation:
- "Summarise this dataset: row count, date range, and the range of values for revenue_usd."
- "What are the top 5 products by total units sold?"
- "Show me the month-over-month revenue change for each region."
- "Are there any rows with missing or unusual values I should know about?"
- "What is the average revenue per unit by product line?"
These establish a baseline understanding before moving to trend analysis or comparative questions.
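If you want to cross-check Claude's answers to these descriptive questions, the same baseline can be computed locally. This is a minimal sketch that assumes the example columns described earlier; the inline data is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical data using the example schema (assumption for illustration).
csv_text = """date,product_line,units_sold,revenue_usd,region
2024-01-01,widgets,120,2400.0,EMEA
2024-01-01,gadgets,80,4000.0,APAC
2024-02-01,widgets,130,2600.0,EMEA
2024-02-01,gadgets,75,,APAC
2024-03-01,widgets,125,2500.0,EMEA
2024-03-01,gadgets,90,4500.0,APAC
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Row count, date range, and value range for revenue_usd.
summary = {
    "row_count": len(df),
    "date_min": df["date"].min().date().isoformat(),
    "date_max": df["date"].max().date().isoformat(),
    "revenue_min": df["revenue_usd"].min(),
    "revenue_max": df["revenue_usd"].max(),
}

# Total units per product line, largest first (at most 5 entries).
top_by_units = (
    df.groupby("product_line")["units_sold"]
    .sum()
    .sort_values(ascending=False)
    .head(5)
)

# Number of missing values in each column.
missing = df.isna().sum()

print(summary)
print(top_by_units)
print(missing)
```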
Aggregations, Trends, and Outliers
Claude can identify patterns across the data when prompted explicitly:
- Trends: "Is there a clear trend in revenue over this period? Identify the three months with the strongest growth and the three with the steepest declines."
- Outliers: "Flag any data points that seem unusually high or low compared to the surrounding data. What might explain them?"
- Comparisons: "Compare H1 2024 vs H1 2025 across all metrics. Which regions improved the most?"
- Correlations: "Is there any relationship between units_sold and revenue_usd that suggests different pricing across products?"
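The trend and outlier prompts above can also be checked with a few lines of pandas. The sketch below computes month-over-month change per region and flags outliers with the 1.5 × IQR rule. That rule is one common convention chosen here for illustration, not something this workflow prescribes, and the revenue figures are invented.

```python
import pandas as pd

# Hypothetical monthly revenue per region (assumption for illustration).
months = [f"2024-{m:02d}" for m in range(1, 7)]
monthly = pd.DataFrame({
    "month": months * 2,
    "region": ["EMEA"] * 6 + ["APAC"] * 6,
    "revenue_usd": [100.0, 110.0, 105.0, 115.0, 400.0, 120.0,
                    200.0, 210.0, 190.0, 205.0, 215.0, 220.0],
})
monthly = monthly.sort_values(["region", "month"]).reset_index(drop=True)

# Month-over-month change in percent, computed within each region.
monthly["mom_pct"] = monthly.groupby("region")["revenue_usd"].pct_change() * 100

# Per-region quartiles, broadcast back to every row of that region.
grouped = monthly.groupby("region")["revenue_usd"]
q1 = grouped.transform(lambda s: s.quantile(0.25))
q3 = grouped.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1

# Flag values more than 1.5 * IQR outside the quartiles.
monthly["is_outlier"] = (
    (monthly["revenue_usd"] < q1 - 1.5 * iqr)
    | (monthly["revenue_usd"] > q3 + 1.5 * iqr)
)

outliers = monthly[monthly["is_outlier"]]
print(outliers[["month", "region", "revenue_usd"]])
```

A flagged row is only a candidate for review. Whether it is a data error or a real event (a promotion, a one-off order) is the question to take back to Claude or to the data owner.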
Generating SQL or Pandas Code
For repeatable analysis, or when the full dataset is too large to upload, ask Claude to generate code that you run yourself instead of having it analyse the data directly:
- SQL: "Write a SQL query to find the top 10 products by revenue in Q4 2024, broken down by region. Assume a PostgreSQL database with this schema: [paste schema]."
- Pandas: "Write a pandas script to load this CSV, calculate month-over-month revenue growth, and output a summary table with columns: month, revenue, growth_pct."
- Excel/Google Sheets formulas: "Give me the SUMIFS formula for Excel to sum revenue for the EMEA region in Q3 2024 only."
Test generated code on your actual data. Claude writes code against the schema you describe, so any mismatch with the real data (column names, types, date formats, missing values) can cause errors or silently wrong results.
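As an illustration of what the pandas prompt above might produce, here is a minimal sketch with a schema check added so that a mismatch fails loudly. The function name `monthly_growth`, the `EXPECTED_COLUMNS` set, and the sample data are assumptions made for this example.

```python
import io
import pandas as pd

# Columns the script assumes; adjust to your real schema (assumption).
EXPECTED_COLUMNS = {"date", "product_line", "units_sold", "revenue_usd", "region"}

def monthly_growth(csv_source) -> pd.DataFrame:
    """Return a summary table with columns: month, revenue, growth_pct."""
    df = pd.read_csv(csv_source)

    # Fail early if the data does not match the described schema.
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"CSV is missing expected columns: {sorted(missing)}")

    # to_datetime raises on unparseable dates by default.
    df["date"] = pd.to_datetime(df["date"])
    df["month"] = df["date"].dt.to_period("M").astype(str)

    # Sum revenue per month (missing values are skipped by sum).
    out = (
        df.groupby("month", as_index=False)["revenue_usd"]
        .sum()
        .rename(columns={"revenue_usd": "revenue"})
        .sort_values("month")
        .reset_index(drop=True)
    )
    # Growth versus the previous month, in percent, one decimal place.
    out["growth_pct"] = (out["revenue"].pct_change() * 100).round(1)
    return out

# Hypothetical data for a quick trial run.
csv_text = """date,product_line,units_sold,revenue_usd,region
2024-01-01,widgets,120,2400.0,EMEA
2024-01-01,gadgets,80,4000.0,APAC
2024-02-01,widgets,130,2600.0,EMEA
2024-02-01,gadgets,75,3750.0,APAC
2024-03-01,widgets,125,2500.0,EMEA
2024-03-01,gadgets,90,4500.0,APAC
"""
table = monthly_growth(io.StringIO(csv_text))
print(table)
```

The first month has no previous month to compare against, so its `growth_pct` is empty (NaN).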
Building a Data Narrative
Once you have the findings, ask Claude to synthesise them into a narrative suitable for an audience:
- "Based on this analysis, write a 3-paragraph executive summary of 2024 sales performance. Highlight: top finding, biggest risk, and recommended focus area for 2025."
- "Translate these findings into 5 bullet points for a board presentation. Non-technical audience — no percentages beyond one decimal place."
- "Draft the 'Key Findings' section of a monthly business review using these numbers."
Limitations to Know
In most contexts Claude does not execute code or run calculations on your data; it reads the data as text. This has practical consequences:
- For large datasets, rows that do not fit in the context window are not analysed, so results may silently omit them
- Complex statistical analysis (regression, clustering, hypothesis testing) is better done in a dedicated environment such as Python or R. Claude can write the code, but should not be the execution environment for precision statistics
- Claude can make arithmetic errors on large aggregations, so verify totals independently for high-stakes reports
- Where a code execution environment is available (for example, Claude Code running Python), Claude can actually run code against the data, which is more reliable for numeric accuracy than text-mode analysis
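To verify a total independently, you can recompute it over the full file in chunks and compare it with the figure Claude reported. This is a sketch under assumptions: `total_revenue` is a hypothetical helper, `reported_total` stands in for a number quoted in a Claude-written summary, and the tiny chunk size is only there to show the chunking.

```python
import io
import math
import pandas as pd

# Hypothetical data; in practice pass the path of the full CSV.
csv_text = """date,product_line,units_sold,revenue_usd,region
2024-01-01,widgets,120,2400.0,EMEA
2024-01-01,gadgets,80,4000.0,APAC
2024-02-01,widgets,130,2600.0,EMEA
2024-02-01,gadgets,75,3750.0,APAC
2024-03-01,widgets,125,2500.0,EMEA
2024-03-01,gadgets,90,4500.0,APAC
"""

def total_revenue(csv_source, chunksize: int = 2) -> float:
    """Sum revenue_usd while reading the file chunk by chunk."""
    total = 0.0
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        # Series.sum skips missing values by default.
        total += chunk["revenue_usd"].sum()
    return float(total)

# Hypothetical figure taken from a generated report (assumption).
reported_total = 19750.0

recomputed = total_revenue(io.StringIO(csv_text))
matches = math.isclose(recomputed, reported_total, rel_tol=1e-9)
print(f"recomputed={recomputed}, reported={reported_total}, matches={matches}")
```

Reading in chunks keeps memory use bounded, so the same check works on files far larger than anything that fits in a prompt.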
Checklist: Do You Understand This?
- Describe the data and its columns before asking analysis questions — context makes findings more accurate
- Start with descriptive questions (what's in the data) before moving to interpretive questions (what does it mean)
- Ask for SQL or pandas code for repeatable analysis or large datasets — more reliable than direct text analysis
- Use Claude to build the data narrative (executive summary, board bullets, report section) from your verified findings
- Verify arithmetic totals and check generated code on actual data before using in high-stakes reports