How to Read an AI Paper
Most practitioners consume AI research through blog posts, Twitter threads, and YouTube summaries. These are useful for awareness but they strip out the details that matter: what the authors actually did, what the baselines were, what the limitations are, and whether the evaluation is fair. Building the ability to read primary sources is one of the highest-leverage skills for anyone who wants to understand the field deeply rather than just track it.
Why Primary Sources Matter
What you get from papers
- Exact architecture decisions and hyperparameters that blog posts omit
- The authors' own description of limitations — often buried but honest
- The baseline comparisons (or absence of them)
- Implementation details in the appendix that determine whether results replicate
- What the paper does not claim, which is often more informative than what it does
Risks of secondhand sources
- LLMs summarizing papers frequently hallucinate details or misstate claims
- Blog posts amplify the headline result and drop the caveats
- Social media summaries are optimized for engagement, not accuracy
- Benchmark numbers get quoted without context of evaluation protocol
- Impressive charts frequently represent cherry-picked examples
Paper Anatomy
Virtually all ML papers follow the same structural template. Knowing what each section is supposed to do tells you which sections to read, and in what order, for your purpose.
| Section | Purpose | Read priority |
|---|---|---|
| Abstract | High-level claim, method, and result in 150–200 words | Always — first pass starts here |
| Introduction | Problem motivation, gap in prior work, contributions list | Always — reveals what they claim to contribute |
| Related Work | Context-setting; positions paper relative to prior art | Skim first pass; read if you need citations |
| Methods | What they actually built and how — the technical core | Always — this is what matters most |
| Experiments | Evaluation setup, baselines, main results, ablations | Always — do results actually support the claims? |
| Conclusion | Summary and limitations statement | Always — read on the first pass; authors state limitations here |
| Appendix | Hyperparameters, ablations, proofs, additional results | Read when reproducing; often contains crucial details |
The Three-Pass Method
Keshav (2007) proposed the three-pass reading method, originally for systems research papers. It maps directly onto ML papers and prevents the most common reading failure — spending four hours on a paper that turns out to be irrelevant.
First Pass — 5–10 minutes
Goal: decide whether the paper is worth your time. Read in order:
- Title and abstract (carefully)
- Section headings only
- Conclusion paragraph
- One glance at each figure/table — what are they measuring?
Exit outcome: you should know the paper's category (new method, benchmark, analysis, survey), the main claim, and whether it's relevant to you.
Second Pass — 45–90 minutes
Goal: understand the key ideas without full technical depth. Focus on:
- Figures and tables in detail — can you explain each figure in your own words?
- The methods section at a conceptual level — what is the core idea?
- The main results table — does the improvement look meaningful?
- The limitations paragraph in the conclusion
- Skip proofs and derivations; note them for the third pass if needed
Exit outcome: you could summarize the paper accurately to someone else.
Third Pass — 4–8 hours (for papers you need to implement)
Goal: full technical understanding, sufficient to implement or build on. Cover:
- Every equation and derivation — verify them against your own understanding
- Appendix hyperparameters and implementation details
- All ablation experiments — which components of the method matter?
- Implicit assumptions you can now challenge: What would happen if they changed X?
- Cross-reference with cited papers to verify the claims about prior work
Exit outcome: you could implement the method from scratch and identify flaws in the experimental design.
Reading Experiments Critically
The experiments section is where most papers are vulnerable. Learning to read it critically is the most valuable skill for distinguishing genuine advances from overfitted or cherry-picked results.
Questions to ask about every experiment
- Is the baseline competitive and fair? (Authors often compare against weak baselines)
- Is the evaluation metric appropriate for the task?
- Is the test set genuinely held out, or could the model have been tuned on it?
- How many seeds were run? Are standard deviations reported?
- Are ablation experiments present — do they confirm which components matter?
- Are the example outputs shown representative or cherry-picked?
Red flags that weaken a paper
- No baselines, or baselines from 3+ years ago
- Results only on a narrow set of benchmarks the method was tuned on
- No ablation study — you cannot tell which parts of the method matter
- Large variance in results with no significance testing
- Qualitative examples only, no quantitative evaluation
- Evaluation set not clearly described or potentially contaminated with training data
- Claims not matched to results: the abstract says X, but the results show X only under a subset of conditions
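The seed and standard-deviation questions above can be made concrete. The sketch below uses invented, purely illustrative scores: when the gap between two methods is comparable to the spread across seeds, a single-seed comparison proves little.

```python
import statistics

# Hypothetical accuracy scores from 5 random seeds for two methods.
# These numbers are illustrative, not taken from any real paper.
baseline = [71.2, 72.8, 70.9, 73.1, 71.5]
proposed = [72.4, 73.0, 71.8, 74.2, 72.1]

def summarize(scores):
    """Return (mean, sample standard deviation) for a list of scores."""
    return statistics.mean(scores), statistics.stdev(scores)

b_mean, b_std = summarize(baseline)
p_mean, p_std = summarize(proposed)
print(f"baseline: {b_mean:.2f} +/- {b_std:.2f}")
print(f"proposed: {p_mean:.2f} +/- {p_std:.2f}")

# If the mean gap is smaller than the per-seed spread, a lucky seed
# for one method and an unlucky seed for the other could reverse the
# ranking entirely -- which is why papers should report both.
gap = p_mean - b_mean
print(f"gap: {gap:.2f} vs per-seed spread of roughly {b_std:.2f}")
```

A paper that reports only a single number per method gives you no way to run this comparison yourself, which is exactly why the absence of seeds and variance is a red flag.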
Key Paper Discovery Sources
| Source | Best for | Caveats |
|---|---|---|
| arXiv (arxiv.org) | All ML papers — posted before or simultaneously with peer review; cs.LG, cs.CL, cs.CV are the main sections | Not peer-reviewed; quality varies widely; preprints may change before the final version |
| Papers with Code | Links papers to open implementations and benchmark leaderboards; filter by task/dataset | Coverage incomplete; linked code may differ from paper version |
| Semantic Scholar | Citation graph; find papers that cite or are cited by a key paper; track a field's evolution | Less useful for papers from the last 6 months (citation lag) |
| HuggingFace Papers | Curated feed of significant recent papers; community upvoting surfaces high-signal work | Skews toward applied and model-release papers; theory underrepresented |
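Several of these sources are scriptable. arXiv, for instance, exposes a public Atom-feed API; a minimal sketch of building a query for recent submissions in one category (parameter choices here are illustrative, not the only options):

```python
from urllib.parse import urlencode

def arxiv_query_url(category: str, max_results: int = 5) -> str:
    """Build a query URL for the arXiv export API (Atom feed).

    Example categories: cs.LG, cs.CL, cs.CV. Sorting by submission
    date, newest first, gives a feed of the latest preprints.
    """
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "start": 0,
        "max_results": max_results,
    }
    return "http://export.arxiv.org/api/query?" + urlencode(params)

url = arxiv_query_url("cs.LG")
print(url)
# Fetching this URL returns an Atom feed; each <entry> carries the
# title, abstract, authors, and PDF link for one paper.
```

This is handy for building a personal "new papers" feed, though it replaces none of the critical reading described above.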
Efficient Note-Taking
A minimal paper note template:
- What they claim: one sentence from the abstract
- What they actually did: method summary in your own words
- Key result: the one number that matters, with its context (dataset, metric, baseline)
- Main limitation: from their own limitations section or your own reading
- Would I trust this? Your critical assessment after reading the experiments
- Follow-up: papers cited or citing that you should read next
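If you keep notes in code rather than free text, the template above maps naturally onto a small record type. The structure below is a hypothetical sketch using this guide's field names, not any standard tool's schema; the example note is invented.

```python
from dataclasses import dataclass, field

@dataclass
class PaperNote:
    claim: str            # one sentence from the abstract
    method: str           # what they actually did, in your own words
    key_result: str       # the one number that matters, with context
    limitation: str       # from their limitations section or your reading
    verdict: str          # would I trust this?
    follow_up: list = field(default_factory=list)  # papers to read next

# A filled-in example (entirely fictional paper and numbers):
note = PaperNote(
    claim="Method X improves accuracy on benchmark Y.",
    method="Adds a retrieval step before generation.",
    key_result="74.2 vs 71.9 baseline on Y, exact match, 3 seeds.",
    limitation="English only; benchmark Y may overlap training data.",
    verdict="Promising, but wait for an independent replication.",
    follow_up=["Keshav (2007)"],
)
print(note.claim)
```

Keeping the fields fixed forces you to answer every question for every paper, which is the point of the template.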
Checklist: Do You Understand This?
- Why is reading the primary paper more reliable than reading an LLM summary or a blog post about it?
- Describe the three-pass method. What is the goal and time budget for each pass?
- When reading the experiments section, what are three questions you should ask to assess whether the results are credible?
- List three red flags in an ML paper's experimental design that should make you skeptical of the results.
- What is the difference between arXiv and Papers with Code, and when would you use each?
- If you only had 10 minutes to understand a paper you've never seen, what would you read, and in what order?