Project: Dataset Explainer
You have a spreadsheet (sales numbers, survey responses, website traffic, or a CSV someone emailed you) and you need to understand what is in it. Traditionally this required either coding skills or expensive analytics software. AI tools have lowered that barrier considerably. This project teaches you to upload a dataset and interrogate it in plain English: get summaries, spot trends, generate charts, and extract the answers your data contains. No coding required.
What AI Can Do With Your Data
| Task | Before AI | With AI now |
|---|---|---|
| Understand what a dataset contains | Scroll through hundreds of rows manually | Upload → "Summarise this dataset" → overview in seconds |
| Find missing or bad data | Conditional formatting + manual inspection | "Which columns have missing values? What are the outliers?" |
| Create a chart | Select range → Insert → Chart → format manually | "Create a bar chart of sales by region" → instant |
| Calculate statistics | Write AVERAGE, STDEV, COUNTIF formulas | "What is the average order value by customer segment?" |
| Spot trends | Sort + filter + scan rows | "What trends do you see in the monthly data?" |
| Compare groups | Pivot tables (steep learning curve) | "Compare performance between Group A and Group B" |
| Write an analysis summary | Write it yourself after all of the above | "Write a 200-word executive summary of the key findings" |
Which Tool to Use
ChatGPT Advanced Data Analysis (Best all-rounder)
The most capable option. Upload a CSV, Excel, PDF, or JSON file and ChatGPT writes and runs Python code behind the scenes to analyse it. You do not need to read the code: by default you see results, charts, and explanations in plain English, and you can expand the code if you want to check how a number was produced. Supports files up to ~50 MB.
Julius AI (Best for non-technical users)
A browser-based data analyst built specifically for people without data skills. Upload a spreadsheet and ask questions — Julius immediately creates charts, runs calculations, and explains findings in plain language. Designed for speed: most users get their first insight in under 60 seconds.
Google Sheets + Gemini (Best if your data lives in Sheets)
If your data is already in Google Sheets, the Gemini side panel lets you ask questions directly without exporting or uploading anything. Generates formulas, creates charts, summarises trends, and — as of October 2025 — can analyse data spanning multiple tables within the same spreadsheet.
Microsoft Excel Copilot (Best if your data lives in Excel)
Built into Excel (desktop and web), Copilot can analyse columns, create pivot tables, generate charts, write formulas, and highlight insights — all from a natural language prompt. Works directly on your open worksheet with no export required.
Tool Chooser
| Your situation | Use this |
|---|---|
| I have a CSV and I want the most capable analysis | ChatGPT Advanced Data Analysis (Plus/Pro) |
| I want something free or nearly free for occasional use | Julius AI free tier (15 messages/month) |
| My data is already in Google Sheets | Google Sheets Gemini side panel |
| My data is in Excel and I use Microsoft 365 | Excel Copilot |
| I do this regularly and want to save the workflow | Julius AI Notebooks (Plus/Pro) |
| I need to share analysis with a team | Julius AI Pro (team collaboration) or Google Sheets + Gemini |
Analyse a Dataset — Step by Step
This workflow uses ChatGPT Advanced Data Analysis, but the same question sequence works in Julius AI or any of the other tools above.
Step 1: Prepare Your File
Step 2: The First Four Prompts (Always Start Here)
Run these four prompts in sequence for any new dataset. They give you a complete picture before you ask anything specific.
| # | Prompt | What you get |
|---|---|---|
| 1 | Describe this dataset. How many rows and columns? What does each column contain? What data type is each column? | Structure overview — you know what you are working with |
| 2 | Are there any missing values, blanks, or obvious data quality issues? Which columns are affected and how many rows? | Data quality check — you know what to trust |
| 3 | Show me the basic statistics for the numeric columns: min, max, average, and median. | Numerical summary — establishes normal ranges |
| 4 | What are the 3–5 most interesting patterns or insights you can see in this data? Explain each one in plain language. | AI-generated insight list — starting point for deeper questions |
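You do not need to write any code for this step, but it can help to see roughly what the tool computes when it answers prompts 1 to 3. The following is a minimal sketch using only the Python standard library; the sample data, the column names, and the `first_look` helper are hypothetical, and the code a real tool generates will differ.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for an uploaded CSV file.
SAMPLE_CSV = """order_id,region,order_value
1,North,120.50
2,South,
3,North,80.00
4,East,230.25
"""

def first_look(csv_text):
    """Return row/column counts, missing-value counts, and numeric stats."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    overview = {"rows": len(rows), "columns": columns, "missing": {}, "stats": {}}
    for col in columns:
        values = [row[col] for row in rows]
        present = [v for v in values if v is not None and v.strip() != ""]
        overview["missing"][col] = len(values) - len(present)
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            continue  # not a numeric column, so skip the statistics
        if numbers:
            overview["stats"][col] = {
                "min": min(numbers),
                "max": max(numbers),
                "mean": statistics.mean(numbers),
                "median": statistics.median(numbers),
            }
    return overview

overview = first_look(SAMPLE_CSV)
print(overview["rows"], "rows,", len(overview["columns"]), "columns")
print("missing:", overview["missing"])
print("stats:", overview["stats"])
```

Note that the sketch treats `order_id` as numeric because every value parses as a number. This is one reason prompt 1 asks the AI to state the data type of each column: you can catch an ID column being averaged before it skews later answers.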
Step 3: Ask Specific Questions
Once you understand the structure, ask targeted questions. Be specific — name the columns you care about.
Step 4: Create Outputs
Copy-Paste Prompt Templates
First look at any dataset
I'm uploading a dataset. Please start by: (1) telling me how many rows and columns it has, (2) listing each column with its data type and a one-sentence description of what it seems to contain, (3) flagging any obvious data quality issues like missing values, mixed formats, or duplicates, and (4) giving me 3 initial observations about what the data appears to show. Do not start any deeper analysis yet — just give me the overview.
Trend analysis with chart
Using the [DATE COLUMN] and [VALUE COLUMN] columns: (1) plot a line chart showing values over time, (2) identify any clear upward or downward trend, (3) flag any spikes or dips that stand out, and (4) tell me if there are any seasonal or cyclical patterns. Explain your observations in plain English — assume I have no statistics background.
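For the curious, the trend and spike parts of this prompt roughly correspond to fitting a straight line through the values and looking at period-to-period changes. The sketch below assumes evenly spaced monthly values; the numbers and function names are hypothetical and only illustrate the idea, not what any particular tool runs.

```python
def trend_slope(values):
    """Least-squares slope of values against their position (0, 1, 2, ...)."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def biggest_drop(values):
    """Return (index, change) for the largest fall from one period to the next."""
    changes = [b - a for a, b in zip(values, values[1:])]
    worst = min(range(len(changes)), key=lambda i: changes[i])
    return worst + 1, changes[worst]  # index of the period where the drop lands

monthly_sales = [100, 110, 120, 90, 140, 150]  # hypothetical monthly totals
slope = trend_slope(monthly_sales)
drop_index, drop_size = biggest_drop(monthly_sales)
print(f"Average change per month: {slope:.2f}")
print(f"Largest dip: month index {drop_index}, change {drop_size}")
```

A positive slope indicates an overall upward trend even though one month dipped sharply, which is the kind of distinction the prompt asks the AI to explain in plain English.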
Group comparison
Group the data by [CATEGORY COLUMN] and calculate the average [METRIC COLUMN] for each group. Show the results as a bar chart sorted from highest to lowest. Then tell me: which group performs best, which performs worst, and what the difference is between them in plain numbers.
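Behind this prompt, the tool is likely doing something similar to the group-and-average sketch below. The rows, the column names `segment` and `order_value`, and the `average_by_group` helper are hypothetical stand-ins for [CATEGORY COLUMN] and [METRIC COLUMN]; the chart step is left out.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows; "segment" plays the role of [CATEGORY COLUMN]
# and "order_value" the role of [METRIC COLUMN].
rows = [
    {"segment": "Retail", "order_value": 40.0},
    {"segment": "Retail", "order_value": 60.0},
    {"segment": "Wholesale", "order_value": 200.0},
    {"segment": "Wholesale", "order_value": 300.0},
    {"segment": "Online", "order_value": 90.0},
]

def average_by_group(records, category, metric):
    """Return (group, average) pairs sorted from highest to lowest average."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[category]].append(record[metric])
    averages = {group: mean(values) for group, values in grouped.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

ranking = average_by_group(rows, "segment", "order_value")
best, worst = ranking[0], ranking[-1]
gap = best[1] - worst[1]
for group, avg in ranking:
    print(f"{group}: {avg:.2f}")
print(f"Gap between {best[0]} and {worst[0]}: {gap:.2f}")
```

Seeing the group sizes alongside the averages is worth asking for as well: here the Online average rests on a single row, so it deserves less weight than the others.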
Executive summary output
Based on everything we have analysed from this dataset, write a 250-word executive summary I can share with stakeholders who have not seen the data. Include: (1) what the dataset covers and its time range, (2) the 3 most important findings, (3) any notable problems or risks visible in the data, and (4) one recommended action based on the findings. Use clear, non-technical language.
Data cleaning request
Please clean this dataset by: (1) removing duplicate rows, (2) removing rows where [COLUMN NAME] is blank or null, (3) standardising the [DATE COLUMN] to YYYY-MM-DD format if it is inconsistent, and (4) removing any rows where [VALUE COLUMN] contains negative numbers. After each step, tell me how many rows were removed or changed. Then export the cleaned dataset as a CSV file.
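The four cleaning steps in this prompt can be sketched in plain Python as follows. Everything here is illustrative: the rows, the column names (`customer`, `date`, `amount`), and the assumption that dates arrive either as YYYY-MM-DD or as DD/MM/YYYY are hypothetical, and the export step is omitted.

```python
from datetime import datetime

# Hypothetical rows; column names are placeholders for the bracketed ones.
rows = [
    {"customer": "A1", "date": "2025-01-05", "amount": "100"},
    {"customer": "A1", "date": "2025-01-05", "amount": "100"},  # duplicate
    {"customer": "", "date": "2025-01-06", "amount": "50"},     # blank customer
    {"customer": "B2", "date": "07/01/2025", "amount": "75"},   # DD/MM/YYYY
    {"customer": "C3", "date": "2025-01-08", "amount": "-20"},  # negative
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed input formats

def to_iso(text):
    """Convert a date string in one of DATE_FORMATS to YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return text  # leave unrecognised values untouched

def clean(records):
    """Apply the four cleaning steps and count rows removed at each one."""
    removed = {}
    seen, unique = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    removed["duplicates"] = len(records) - len(unique)
    non_blank = [r for r in unique if r["customer"].strip()]
    removed["blank"] = len(unique) - len(non_blank)
    dated = [{**r, "date": to_iso(r["date"])} for r in non_blank]
    non_negative = [r for r in dated if float(r["amount"]) >= 0]
    removed["negative"] = len(dated) - len(non_negative)
    return non_negative, removed

cleaned, removed = clean(rows)
print(removed)
print(cleaned)
```

The date step shows why it pays to be explicit in the prompt: a value like 07/01/2025 could be 7 January or 1 July, and neither the AI nor this sketch can know which unless you say so.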
What Works Well
Exploratory questions are where AI excels
Open-ended questions like "what stands out in this data?" or "what are the main patterns?" often surface things you would miss by scanning manually. Because the tool can run calculations over every row and column, it can spot correlations across columns that humans rarely think to combine.
Naming the columns in your prompts dramatically improves results
Vague prompts like "analyse the sales data" produce generic responses. Prompts like "compare average Order Value across Region values and show a bar chart" produce precise, actionable outputs. Always reference exact column names from your data.
Charts are free and instant
Any chart you would normally spend 10 minutes building manually — selecting ranges, inserting, formatting, adding labels — takes one sentence. Download and drop directly into a slide deck or report.
Iterative conversation beats one long prompt
Start broad, then narrow. "What are the main trends?" → "Tell me more about the Q3 dip" → "Which customer segments drove that dip?" → "Create a chart of segment performance in Q3." Each turn builds on the previous one, and within a single conversation the AI keeps the uploaded file and earlier answers in context.
Failure Modes
AI fabricates numbers it cannot compute
If a calculation requires joining data from outside the uploaded file, or if the question is too ambiguous, the AI may produce plausible-looking but wrong numbers. Always spot-check key statistics by calculating one or two manually (use a calculator or a simple SUM formula in Sheets).
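A spreadsheet formula is enough for this kind of spot-check, but if you prefer a script, the idea is simply to recompute the figure and compare it with what the AI reported. This is a small sketch with hypothetical values; `check_reported_mean` and its `tolerance` parameter are names made up for illustration.

```python
import math

def check_reported_mean(values, reported_mean, tolerance=0.01):
    """Recompute the mean and compare it with the figure the AI reported."""
    actual = sum(values) / len(values)
    return math.isclose(actual, reported_mean, abs_tol=tolerance), actual

order_values = [120.0, 80.0, 100.0, 140.0]  # hypothetical sample from the file
ok, actual = check_reported_mean(order_values, reported_mean=110.0)
print("matches" if ok else "mismatch", actual)
```

If the recomputed value and the reported value disagree, ask the AI to show the rows and the calculation it used before relying on the number.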
Correlations presented as causes
AI will tell you "sales are higher in months when marketing spend is higher", which is a correlation. It may not volunteer that a third factor (such as seasonal demand) could explain both. Always ask "what other explanations might exist?" when the AI finds a relationship.
Uploading sensitive data to public AI services
Do not upload data containing personal information (names, emails, medical records, payment data) to consumer AI tools unless you have confirmed the provider's data policy. For sensitive datasets, anonymise the data first (replace names with IDs, remove PII columns) or use an enterprise-tier tool with a data processing agreement.
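As a rough illustration of the anonymisation step, the sketch below replaces each name with a generated ID and drops the listed PII columns before the file leaves your machine. The rows, the column names, the `CUST-0001` ID format, and the `anonymise` helper are all hypothetical.

```python
def anonymise(records, name_column, pii_columns):
    """Replace names with stable IDs and drop the listed PII columns."""
    ids = {}  # name -> generated ID; keep this mapping private
    cleaned = []
    for record in records:
        name = record[name_column]
        if name not in ids:
            ids[name] = f"CUST-{len(ids) + 1:04d}"
        kept = {
            key: value
            for key, value in record.items()
            if key != name_column and key not in pii_columns
        }
        cleaned.append({"customer_id": ids[name], **kept})
    return cleaned, ids

# Hypothetical rows containing a name and an email address.
rows = [
    {"name": "Ana Ruiz", "email": "ana@example.com", "amount": 120},
    {"name": "Ben Cole", "email": "ben@example.com", "amount": 80},
    {"name": "Ana Ruiz", "email": "ana@example.com", "amount": 45},
]

safe_rows, id_map = anonymise(rows, name_column="name", pii_columns={"email"})
print(safe_rows)
```

The same person always receives the same ID, so per-customer analysis still works. Keep the ID mapping out of the upload, and remember that remaining columns (for example a postcode combined with a birth date) can still identify someone, so review what is left before sharing.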
Poorly labelled columns produce garbage analysis
If your columns are named "A", "B", or "Column1", or the names contain stray spaces and special characters, the AI may misinterpret or skip them. Rename columns to clear, descriptive names before uploading; this single step makes a large difference to output quality.
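Renaming can be done by hand in the spreadsheet. For larger files, a small script along these lines is one option: it tidies spacing and special characters automatically and applies a manual mapping for names that carry no meaning. The header list, the `renames` mapping, and both function names are hypothetical.

```python
import re

def tidy_header(name):
    """Lowercase, trim, and replace runs of non-alphanumerics with '_'."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()
    return cleaned or "unnamed"

def tidy_headers(headers, renames=None):
    """Apply manual renames first, then tidy whatever is left."""
    renames = renames or {}
    return [renames[h] if h in renames else tidy_header(h) for h in headers]

headers = ["Column1", " Region ", "Order Value ($)", "Sign-up Date"]
# Meaningless names need a human decision, so they are mapped by hand.
print(tidy_headers(headers, renames={"Column1": "customer_id"}))
```

Automatic tidying only fixes formatting. A name like "Column1" still needs someone who knows the data to say what it contains.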
Session context is lost when you start a new chat
In ChatGPT, starting a new conversation means re-uploading the file and re-prompting from scratch. Keep each dataset's analysis in a single conversation thread so the AI retains the context. Julius AI's Notebooks feature partially solves this by saving the analysis workflow.
Data Privacy — What to Know
Before uploading any dataset, check whether it contains personally identifiable information (PII): names, email addresses, phone numbers, IP addresses, health data, or financial records. Consumer AI tools (ChatGPT, Julius free/plus tiers) may use uploaded data to improve their models under their standard terms of service. For anything sensitive: (1) anonymise before uploading by replacing names with customer IDs and removing email columns; (2) use ChatGPT's "Temporary Chat" mode, which is not used for training; or (3) use an enterprise contract with a data processing agreement that prohibits training on your data.
Extend the Project
2025–2026 Developments
Google Sheets Gemini can now analyse multiple tables (Oct 2025)
An upgrade in October 2025 allowed Gemini in Sheets to understand and join data across multiple tables within the same spreadsheet tab; before that it could only see one table at a time. This enables cross-referenced analysis that used to require VLOOKUP/INDEX MATCH formulas or pivot table expertise.
Google Sheets Gemini can generate synthetic data (Jun 2025)
Gemini in Sheets gained the ability to generate realistic sample data from a description — useful for building templates, testing formulas, or creating demo datasets without using real customer data. For example: "Generate 100 rows of sample sales data with realistic product names, regions, dates, and revenue figures."
ChatGPT can now pull files from Drive and OneDrive directly (2025)
As of 2025, ChatGPT's file upload dialog supports connecting directly to Google Drive and Microsoft OneDrive — removing the need to download a file, then re-upload it. This makes the data analysis workflow significantly faster for people whose data lives in cloud storage rather than local files.
Checklist: Do You Understand This?
- Can you name three things AI can do with a dataset that previously required coding or pivot table expertise?
- Do you know which tool to use if your data is already in Google Sheets? In Excel? If you want the most powerful analysis?
- Can you describe the four first-look prompts you should always run when analysing a new dataset?
- Do you know why naming exact column names in your prompts produces better results?
- Can you explain two risks of uploading data to consumer AI tools and how to mitigate them?
- Do you know what the October 2025 update to Gemini in Sheets enabled?
- Can you describe the failure mode where AI presents correlations as causes?