🧠 All Things AI
Beginner

Project: Dataset Explainer

You have a spreadsheet — sales numbers, survey responses, website traffic, or a CSV someone emailed you — and you need to understand what is in it. Traditionally this required either coding skills or expensive analytics software. AI has changed that entirely. This project teaches you to upload any dataset and interrogate it in plain English: get summaries, spot trends, generate charts, and extract the answers your data contains. No coding required.

What AI Can Do With Your Data

Task | Before AI | With AI
Understand what a dataset contains | Scroll through hundreds of rows manually | Upload → "Summarise this dataset" → done in 10 seconds
Find missing or bad data | Conditional formatting + manual inspection | "Which columns have missing values? What are the outliers?"
Create a chart | Select range → Insert → Chart → format manually | "Create a bar chart of sales by region" → instant
Calculate statistics | Write AVERAGE, STDEV, COUNTIF formulas | "What is the average order value by customer segment?"
Spot trends | Sort + filter + scan rows | "What trends do you see in the monthly data?"
Compare groups | Pivot tables (steep learning curve) | "Compare performance between Group A and Group B"
Write an analysis summary | Write it yourself after all of the above | "Write a 200-word executive summary of the key findings"

Which Tool to Use

ChatGPT Advanced Data Analysis (Best all-rounder)

The most capable option. Upload a CSV, Excel, PDF, or JSON file and ChatGPT writes and runs Python code behind the scenes to analyse it. You do not need to read the code: results, charts, and explanations arrive in plain English, and the code is there if you want to inspect or reuse it. Supports files up to ~50 MB.

Access: ChatGPT Plus ($20/month) or Pro ($200/month); free-tier access to file analysis is limited
File types: CSV, Excel (.xls, .xlsx), PDF, JSON, plain text
Upload from: Your computer, Google Drive, or Microsoft OneDrive (2025)
Charts: Bar, line, pie, scatter; download charts as images or export the underlying data as CSV
Best for: Complex analysis, multi-step exploration, generating reusable code if you want it

Julius AI (Best for non-technical users)

A browser-based data analyst built specifically for people without data skills. Upload a spreadsheet and ask questions — Julius immediately creates charts, runs calculations, and explains findings in plain language. Designed for speed: most users get their first insight in under 60 seconds.

Free tier: 15 messages/month, notebook access, 2 GB RAM, Google Drive connector
Plus (~$25/month): 250 messages/month, GPT-4o and Claude 3.5 Sonnet as engines
Pro: Unlimited messages, direct database connectors (Snowflake, BigQuery, Postgres), team collaboration
Notebooks feature: Build a reusable analysis workflow once and save it — reruns automatically on new data
Best for: Non-technical users, recurring reports, teams who share analysis

Google Sheets + Gemini (Best if your data lives in Sheets)

If your data is already in Google Sheets, the Gemini side panel lets you ask questions directly without exporting or uploading anything. It generates formulas, creates charts, summarises trends, and, as of October 2025, can analyse data spanning multiple tables within the same tab.

Access: Click the "Ask Gemini" spark icon in the top-right of any Google Sheet
Requires: Google Workspace plan with Gemini add-on, or Google One AI Premium ($19.99/month)
Multi-table analysis: since October 2025, can analyse and join data across multiple tables within one tab
Formula generation: Describe the calculation in plain English, Gemini writes the formula
Best for: Data already in Google Sheets, team environments using Google Workspace

Microsoft Excel Copilot (Best if your data lives in Excel)

Built into Excel (desktop and web), Copilot can analyse columns, create pivot tables, generate charts, write formulas, and highlight insights — all from a natural language prompt. Works directly on your open worksheet with no export required.

Access: Microsoft 365 Personal ($9.99/month) and Family ($12.99/month) include Copilot in Excel as of 2025 (US list prices)
Best for: Data already in Excel, users in corporate Microsoft 365 environments
Limitation: Works on the active worksheet and cannot easily cross-reference external files

Tool Chooser

Your situation | Use this
I have a CSV and I want the most capable analysis | ChatGPT Advanced Data Analysis (Plus/Pro)
I want something free or nearly free for occasional use | Julius AI free tier (15 messages/month)
My data is already in Google Sheets | Google Sheets Gemini side panel
My data is in Excel and I use Microsoft 365 | Excel Copilot
I do this regularly and want to save the workflow | Julius AI Notebooks (Plus/Pro)
I need to share analysis with a team | Julius AI Pro (team collaboration) or Google Sheets + Gemini

Analyse a Dataset — Step by Step

This workflow uses ChatGPT Advanced Data Analysis, but the same sequence of questions works in Julius AI or any of the other tools above.

Step 1: Prepare Your File

Clean your column headers: Make sure the first row contains clear column names (not blank, not merged cells). AI reads the header row first — ambiguous names like "Col1" or "Q3" produce poorer analysis than "Monthly Revenue USD" or "Customer Satisfaction Score".
Remove merged cells: Excel merged cells break CSV parsing. Unmerge everything before uploading.
Save as CSV if unsure: In Excel, use File → Save As and choose CSV; in Google Sheets, use File → Download → Comma Separated Values (.csv). CSV is supported by virtually every AI data tool.
File size: Keep files under 50 MB for ChatGPT. For larger datasets, first upload a representative sample (e.g. a random 10,000 rows) to explore the structure, then work with the full data in Julius AI Pro, which supports direct database connections.
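
If the file is too large to open comfortably in a spreadsheet, a short script is one way to draw the random sample described above. The sketch below uses reservoir sampling, which reads the file once and never holds more than the sample in memory. It relies only on Python's standard library; the function name `sample_csv_rows` and the demo data are hypothetical and are not part of any tool mentioned here.

```python
import csv
import io
import random


def sample_csv_rows(source, k, seed=0):
    """Return (header, rows): a uniform random sample of up to k data rows."""
    rng = random.Random(seed)
    reader = csv.reader(source)
    header = next(reader)
    reservoir = []
    for i, row in enumerate(reader):
        if i < k:
            reservoir.append(row)      # fill the reservoir first
        else:
            j = rng.randint(0, i)      # inclusive on both ends
            if j < k:
                reservoir[j] = row     # replace an earlier pick
    return header, reservoir


# Small in-memory stand-in for a large file (made-up data).
demo = io.StringIO("id,amount\n" + "\n".join(f"{n},{n * 10}" for n in range(1, 101)))
header, rows = sample_csv_rows(demo, k=5)
print(header, len(rows))  # ['id', 'amount'] 5
```

For a real file, pass `open("your_file.csv", newline="")` as `source` and set `k=10_000`, then write the header and sampled rows back out with `csv.writer`.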

Step 2: The First Four Prompts (Always Start Here)

Run these four prompts in sequence for any new dataset. They give you a complete picture before you ask anything specific.

# | Prompt | What you get
1 | Describe this dataset. How many rows and columns? What does each column contain? What data type is each column? | Structure overview: you know what you are working with
2 | Are there any missing values, blanks, or obvious data quality issues? Which columns are affected and how many rows? | Data quality check: you know what to trust
3 | Show me the basic statistics for the numeric columns: min, max, average, and median. | Numerical summary: establishes normal ranges
4 | What are the 3–5 most interesting patterns or insights you can see in this data? Explain each one in plain language. | AI-generated insight list: starting point for deeper questions
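
For readers curious what prompts 1 to 3 amount to in code, the sketch below computes a comparable overview: row count, missing values per column, a rough type guess, and basic statistics. It is an illustration under simple assumptions (a column counts as numeric only if every non-blank value parses as a number), not the code ChatGPT or Julius actually generates. `profile` is a hypothetical helper name and the sample data is made up.

```python
import csv
import io
import statistics


def profile(source):
    """Return row count plus, per column, missing count, inferred type, and stats."""
    reader = csv.DictReader(source)
    rows = list(reader)
    report = {"rows": len(rows), "columns": {}}
    for col in reader.fieldnames or []:
        # Keep only non-blank values; short rows yield None for missing cells.
        present = [r[col].strip() for r in rows if r[col] and r[col].strip()]
        info = {"missing": len(rows) - len(present), "type": "text"}
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            numbers = []
        if numbers:
            info.update(type="numeric", min=min(numbers), max=max(numbers),
                        mean=statistics.mean(numbers),
                        median=statistics.median(numbers))
        report["columns"][col] = info
    return report


sample = io.StringIO("region,order_value\nNorth,120\nSouth,\nNorth,80\n")
report = profile(sample)
print(report["rows"], report["columns"]["order_value"])
```

Running something like this on your own file gives you numbers to compare against the AI's answers to the first three prompts.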

Step 3: Ask Specific Questions

Once you understand the structure, ask targeted questions. Be specific — name the columns you care about.

Trends over time: "Plot monthly totals for the [Revenue] column. Is there a trend? Are there any seasonal patterns?"
Comparisons: "Compare average [Order Value] across the different [Region] values. Which region performs best and worst?"
Outliers: "Which rows have unusually high or low values in the [Delivery Time] column? Show me the top 10 outliers."
Filtering: "Show me only the rows where [Status] is 'Cancelled'. What do those rows have in common?"
Relationships: "Is there a relationship between [Customer Age] and [Purchase Frequency]? Create a scatter plot."

Step 4: Create Outputs

Charts: Ask for any chart type — bar, line, pie, scatter, histogram. Download the image for use in presentations.
Summary report: "Write a 300-word executive summary of the key findings from this dataset, suitable for a non-technical manager."
Cleaned data: "Remove rows with missing values in the [Email] column and export the cleaned dataset as a CSV."
Formula for Sheets/Excel: "Write a Google Sheets formula that calculates the 30-day rolling average of the [Daily Sales] column."
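
A generated formula is worth checking against an independent calculation on a few rows. The sketch below computes a trailing rolling average in plain Python so you can compare values with what the formula returns. It assumes one value per day with no gaps, in date order; `rolling_average` is a hypothetical helper.

```python
from collections import deque


def rolling_average(values, window=30):
    """Trailing mean over the last `window` values; None until the window is full."""
    recent = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(recent) == window:
            total -= recent[0]  # this value is about to leave the window
        recent.append(v)
        total += v
        out.append(total / window if len(recent) == window else None)
    return out


print(rolling_average([1, 2, 3, 4, 5], window=3))  # [None, None, 2.0, 3.0, 4.0]
```

If your data skips days, a 30-row window is no longer a 30-day window, so mention the gaps when you ask for the formula.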

Copy-Paste Prompt Templates

First look at any dataset

I'm uploading a dataset. Please start by: (1) telling me how many rows and columns it has, (2) listing each column with its data type and a one-sentence description of what it seems to contain, (3) flagging any obvious data quality issues like missing values, mixed formats, or duplicates, and (4) giving me 3 initial observations about what the data appears to show. Do not start any deeper analysis yet — just give me the overview.

Trend analysis with chart

Using the [DATE COLUMN] and [VALUE COLUMN] columns: (1) plot a line chart showing values over time, (2) identify any clear upward or downward trend, (3) flag any spikes or dips that stand out, and (4) tell me if there are any seasonal or cyclical patterns. Explain your observations in plain English — assume I have no statistics background.
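
To make "is there a trend?" concrete, the sketch below totals values by month and fits a straight line through the monthly totals; the slope is the average change per month. This is a deliberately simple view (it ignores seasonality and assumes dates are ISO-formatted strings), and the helper names and records are hypothetical.

```python
from collections import defaultdict


def monthly_totals(records):
    """Sum values per 'YYYY-MM' month from (iso_date, value) pairs."""
    totals = defaultdict(float)
    for iso_date, value in records:
        totals[iso_date[:7]] += value
    return dict(sorted(totals.items()))


def trend_slope(ys):
    """Least-squares slope of ys against periods 0, 1, 2, ... (needs 2+ points)."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


records = [("2025-01-05", 100), ("2025-01-20", 50),
           ("2025-02-11", 200), ("2025-03-02", 250)]  # made-up data
totals = monthly_totals(records)
print(totals)                              # {'2025-01': 150.0, '2025-02': 200.0, '2025-03': 250.0}
print(trend_slope(list(totals.values())))  # 50.0
```

A positive slope means the monthly totals rise on average; it says nothing about whether the rise is steady or driven by one spike, which is why the prompt also asks about spikes and seasonal patterns.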

Group comparison

Group the data by [CATEGORY COLUMN] and calculate the average [METRIC COLUMN] for each group. Show the results as a bar chart sorted from highest to lowest. Then tell me: which group performs best, which performs worst, and what the difference is between them in plain numbers.
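
The group comparison above boils down to a group-by and an average. The sketch below shows that calculation on a few made-up rows so you can sanity-check the chart the AI produces. `average_by_group` is a hypothetical helper, and the column names mirror the placeholders in the prompt.

```python
from collections import defaultdict
from statistics import mean


def average_by_group(rows, category, metric):
    """Mean of `metric` per `category` value, sorted from highest to lowest."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[category]].append(float(row[metric]))
    averages = {name: mean(vals) for name, vals in groups.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


rows = [
    {"Region": "North", "Order Value": "120"},
    {"Region": "South", "Order Value": "90"},
    {"Region": "North", "Order Value": "80"},
    {"Region": "South", "Order Value": "70"},
]  # made-up data
ranked = average_by_group(rows, "Region", "Order Value")
print(ranked)                          # [('North', 100.0), ('South', 80.0)]
print(ranked[0][1] - ranked[-1][1])    # 20.0, the gap between best and worst
```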

Executive summary output

Based on everything we have analysed from this dataset, write a 250-word executive summary I can share with stakeholders who have not seen the data. Include: (1) what the dataset covers and its time range, (2) the 3 most important findings, (3) any notable problems or risks visible in the data, and (4) one recommended action based on the findings. Use clear, non-technical language.

Data cleaning request

Please clean this dataset by: (1) removing duplicate rows, (2) removing rows where [COLUMN NAME] is blank or null, (3) standardising the [DATE COLUMN] to YYYY-MM-DD format if it is inconsistent, and (4) removing any rows where [VALUE COLUMN] contains negative numbers. After each step, tell me how many rows were removed or changed. Then export the cleaned dataset as a CSV file.
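
The cleaning steps in this template can be sketched in a few lines, which is useful if you want to reproduce the AI's row counts yourself. The version below is a minimal illustration with loud assumptions: the date formats in `DATE_FORMATS` are guesses you would adjust to your data (day/month order is ambiguous in slash dates), the value column is assumed to be numeric in every row, and the helper names and rows are hypothetical.

```python
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed input formats; adjust to your data


def to_iso(text):
    """Return the date as YYYY-MM-DD, or the original text if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return text


def clean(rows, required_col, date_col, value_col):
    """Drop duplicates, blanks, and negatives; normalise dates; count removals."""
    removed = {"duplicates": 0, "blank": 0, "negative": 0}
    seen, kept = set(), []
    for row in rows:
        key = tuple(row.items())
        if key in seen:
            removed["duplicates"] += 1
        elif not row[required_col].strip():
            removed["blank"] += 1
        elif float(row[value_col]) < 0:
            removed["negative"] += 1
        else:
            kept.append({**row, date_col: to_iso(row[date_col])})
        seen.add(key)
    return kept, removed


rows = [
    {"email": "a@x.com", "date": "03/02/2025", "amount": "10"},
    {"email": "a@x.com", "date": "03/02/2025", "amount": "10"},   # duplicate
    {"email": "", "date": "2025-02-04", "amount": "5"},           # blank email
    {"email": "b@x.com", "date": "2025-02-05", "amount": "-3"},   # negative
]  # made-up data
kept, removed = clean(rows, "email", "date", "amount")
print(kept)     # [{'email': 'a@x.com', 'date': '2025-02-03', 'amount': '10'}]
print(removed)  # {'duplicates': 1, 'blank': 1, 'negative': 1}
```

Comparing counts like these with the numbers the AI reports is a quick check that no step silently removed more rows than you expected.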

What Works Well

Exploratory questions are where AI excels

Open-ended questions like "what stands out in this data?" or "what are the main patterns?" consistently surface things you would miss by scanning manually. AI sees the whole dataset at once and spots correlations across columns humans rarely think to combine.

Naming the columns in your prompts dramatically improves results

Vague prompts like "analyse the sales data" produce generic responses. Prompts like "compare average Order Value across Region values and show a bar chart" produce precise, actionable outputs. Always reference exact column names from your data.

Charts are free and instant

Any chart you would normally spend 10 minutes building manually — selecting ranges, inserting, formatting, adding labels — takes one sentence. Download and drop directly into a slide deck or report.

Iterative conversation beats one long prompt

Start broad, then narrow. "What are the main trends?" → "Tell me more about the Q3 dip" → "Which customer segments drove that dip?" → "Create a chart of segment performance in Q3." Each turn builds on the previous one, and the AI remembers the full context of the uploaded file.

Failure Modes

AI fabricates numbers it cannot compute

If a calculation requires joining data from outside the uploaded file, or if the question is too ambiguous, the AI may produce plausible-looking but wrong numbers. Always spot-check key statistics by calculating one or two manually (use a calculator or a simple SUM formula in Sheets).
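
A spot-check can be as small as recomputing one figure and comparing it with what the AI reported. The sketch below does that for an average with a 1% tolerance; the tolerance, the helper name `matches_reported`, and the numbers are all illustrative assumptions.

```python
import math


def matches_reported(values, reported_mean, rel_tol=0.01):
    """Return (ok, actual): whether the recomputed mean is within rel_tol of the reported one."""
    actual = sum(values) / len(values)
    return math.isclose(actual, reported_mean, rel_tol=rel_tol), actual


order_values = [120, 80, 95, 105]             # made-up rows copied from the file
print(matches_reported(order_values, 100.2))  # (True, 100.0)
print(matches_reported(order_values, 112.0))  # (False, 100.0)
```

A mismatch does not always mean fabrication; the AI may have filtered rows or excluded blanks, so the useful follow-up is to ask exactly which rows it included.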

Correlations presented as causes

AI will tell you "sales are higher in months when marketing spend is higher" — which is a correlation. It will not volunteer that a third factor (seasonal demand) may explain both. Always ask "what other explanations might exist?" when the AI finds a relationship.
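
The point is easy to demonstrate: a correlation coefficient only measures how closely two columns move together. In the sketch below, both marketing spend and sales are generated from the same made-up seasonal index, so they correlate almost perfectly even though neither is computed from the other. The data and the `pearson` helper are illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # undefined if either column is constant


season = [1, 2, 3, 4, 3, 2]                   # hidden third factor (made up)
marketing_spend = [10 * s for s in season]
sales = [100 * s + 5 for s in season]
print(round(pearson(marketing_spend, sales), 3))  # 1.0
```

The coefficient alone cannot distinguish this situation from one where spend really does drive sales, which is why the follow-up question about other explanations matters.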

Uploading sensitive data to public AI services

Do not upload data containing personal information (names, emails, medical records, payment data) to consumer AI tools unless you have confirmed the provider's data policy. For sensitive datasets, anonymise the data first (replace names with IDs, remove PII columns) or use an enterprise-tier tool with a data processing agreement.

Poorly labelled columns produce garbage analysis

If your columns are named "A", "B", or "Column1", or are cluttered with stray symbols and cryptic abbreviations, the AI may misinterpret or skip them. Rename columns to clear, descriptive names before uploading; this single step makes a large difference to output quality.

Session context is lost when you start a new chat

In ChatGPT, starting a new conversation means re-uploading the file and re-prompting from scratch. Keep dataset analysis in a single long conversation thread so the AI retains full context. Julius AI's Notebooks feature partially solves this by saving the analysis workflow.

Data Privacy — What to Know

Before uploading any dataset: Check whether it contains personally identifiable information (PII) — names, email addresses, phone numbers, IP addresses, health data, or financial records. Consumer AI tools (ChatGPT, Julius free/plus tiers) may use uploaded data to improve their models under their standard terms of service. For anything sensitive: (1) anonymise before uploading — replace names with customer IDs, remove email columns; (2) use ChatGPT's "Temporary Chat" mode which is not used for training; or (3) use an enterprise contract with a data processing agreement, which prohibits training on your data.
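
Option (1) can be done before the file ever leaves your machine. The sketch below replaces each distinct name with a sequential customer ID and drops listed PII columns. It is a minimal illustration, not a complete anonymisation method: remaining columns (a postcode plus a birth date, say) can still identify people. `anonymise`, the ID format, and the sample rows are hypothetical.

```python
def anonymise(rows, name_col, drop_cols):
    """Swap names for sequential IDs and drop the listed PII columns.

    Returns (clean_rows, id_map); keep id_map private, since it reverses the swap.
    """
    id_map = {}
    clean_rows = []
    for row in rows:
        # The same name always maps to the same ID.
        customer_id = id_map.setdefault(row[name_col], f"CUST-{len(id_map) + 1:04d}")
        kept = {k: v for k, v in row.items()
                if k != name_col and k not in drop_cols}
        clean_rows.append({"customer_id": customer_id, **kept})
    return clean_rows, id_map


rows = [
    {"name": "Ana", "email": "ana@example.com", "amount": "40"},
    {"name": "Ben", "email": "ben@example.com", "amount": "25"},
    {"name": "Ana", "email": "ana@example.com", "amount": "15"},
]  # made-up data
clean_rows, id_map = anonymise(rows, name_col="name", drop_cols={"email"})
print(clean_rows[0])  # {'customer_id': 'CUST-0001', 'amount': '40'}
print(clean_rows[2])  # {'customer_id': 'CUST-0001', 'amount': '15'}
```

Upload only the cleaned rows and store the ID mapping separately, so findings can be linked back to real customers by you but not by the service.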

Extend the Project

Build a recurring report: In Julius AI, once you have built your analysis workflow, save it as a Notebook. Each week, upload the latest data file and run the same notebook — instant updated report with no re-prompting.
Turn findings into a presentation: Copy the executive summary output into a slide tool (Gamma.app, Beautiful.ai, or Google Slides Gemini). Paste your charts as images. Go from raw data to a shareable deck in under 30 minutes.
Connect to live data: Julius AI Pro supports live database connections (Snowflake, BigQuery, Postgres). Instead of downloading a CSV every time, connect directly and your analysis always runs on fresh data.
Add context documents: Upload both your dataset and a related document (a strategy memo, a product brief) to NotebookLM. Ask it to connect the data findings to the strategic context — a technique that produces richer business analysis than data alone.

2025–2026 Developments

Google Sheets Gemini can now analyse multiple tables (Oct 2025)

A significant upgrade in October 2025 allowed Gemini in Sheets to understand and join data across multiple tables within the same spreadsheet tab — previously it could only see one table at a time. This enables cross-referenced analysis that previously required VLOOKUP/INDEX MATCH formulas or pivot table expertise.

Google Sheets Gemini can generate synthetic data (Jun 2025)

Gemini in Sheets gained the ability to generate realistic sample data from a description — useful for building templates, testing formulas, or creating demo datasets without using real customer data. For example: "Generate 100 rows of sample sales data with realistic product names, regions, dates, and revenue figures."

ChatGPT can now pull files from Drive and OneDrive directly (2025)

As of 2025, ChatGPT's file upload dialog supports connecting directly to Google Drive and Microsoft OneDrive — removing the need to download a file, then re-upload it. This makes the data analysis workflow significantly faster for people whose data lives in cloud storage rather than local files.

Checklist: Do You Understand This?

  • Can you name three things AI can do with a dataset that previously required coding or pivot table expertise?
  • Do you know which tool to use if your data is already in Google Sheets? In Excel? If you want the most powerful analysis?
  • Can you describe the four first-look prompts you should always run when analysing a new dataset?
  • Do you know why naming exact column names in your prompts produces better results?
  • Can you explain two risks of uploading data to consumer AI tools and how to mitigate them?
  • Do you know what the October 2025 update to Gemini in Sheets enabled?
  • Can you describe the failure mode where AI presents correlations as causes?