🧠 All Things AI
Advanced

Federated Learning

Federated learning (FL) is a machine learning training paradigm where a model is trained across multiple decentralised devices or servers, each holding their own local data, without that data ever leaving its origin point. Only model updates (gradients or weights) are transmitted to a central aggregation server. This allows organisations to train on sensitive data — medical records, financial transactions, personal device data — that cannot be centralised for legal, privacy, or regulatory reasons.

Federated Learning Architecture

[Figure: federated learning architecture. An aggregation server holds the global model, aggregates client gradients, and distributes the updated model to clients A–D, each holding its own local data. Caption: FL training round: clients train locally, send gradients to the server, which aggregates and redistributes.]

A federated learning training round consists of:

  1. Model distribution: The server sends the current global model (or a subset of it) to a selected group of participating clients
  2. Local training: Each client trains the model on its local data for a fixed number of steps, producing updated local weights or gradients
  3. Gradient transmission: Clients send their updates (gradients or model weight differences) to the server — raw training data is never transmitted
  4. Aggregation: The server aggregates client updates, typically using FedAvg (weighted average by local dataset size)
  5. Repeat: The updated global model is redistributed for the next round
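The aggregation step above (FedAvg) is just a weighted average of client models by local dataset size. A minimal sketch, with toy two-parameter "models" as NumPy arrays (the function name and example values are illustrative, not from a specific library):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FedAvg: weighted average of client model weights by local dataset size."""
    total = sum(client_sizes)
    # Each client's contribution is proportional to its number of local samples.
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three clients with a tiny 2-parameter "model" and different dataset sizes.
weights = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [10, 30, 60]

global_w = fedavg(weights, sizes)
# 0.1*[1,2] + 0.3*[3,4] + 0.6*[5,6] = [4.0, 5.0]
```

Note that clients with more data pull the global model more strongly, which is exactly why non-IID data distributions (discussed under limitations below) can skew the result.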

Horizontal vs Vertical Federated Learning

Horizontal FL (sample-partitioned)

All clients have data with the same features but different samples (individuals). The most common form.

Example: Multiple hospitals each holding patient records with the same clinical features (blood pressure, age, diagnosis) but different patient populations. Train a disease prediction model without sharing patient data.

Vertical FL (feature-partitioned)

Clients hold data about the same individuals but different feature sets. Requires privacy-preserving record linkage.

Example: A bank and a retailer both hold data about the same customers. The bank has financial behaviour; the retailer has purchase behaviour. Vertical FL trains on both feature sets without either party seeing the other's data.

Communication Efficiency

In large-scale FL (millions of mobile devices), transmitting full model gradients each round is infeasible. Key optimisation techniques:

  • Gradient compression: Transmit only the top-k gradient values by magnitude (sparsification); use quantisation to reduce precision (e.g., 32-bit → 8-bit)
  • Client selection: Only a subset of clients participates each round — selected based on connectivity, battery, and data quality — reducing total communication
  • Multiple local steps: Clients run several gradient steps or epochs locally between communication rounds (the core idea behind FedAvg), reducing the number of rounds needed
  • Asynchronous FL: Server aggregates whenever updates arrive rather than waiting for all selected clients — handles slow or disconnected clients
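The first bullet, gradient compression, can be sketched in a few lines. This is an illustrative toy (function names are my own), combining top-k sparsification with linear int8 quantisation:

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep only the k largest-magnitude gradient entries; zero the rest."""
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of top-k by |value|
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.reshape(grad.shape)

def quantize_int8(grad):
    """Linearly quantise float32 gradients to int8 plus one float scale factor."""
    scale = float(np.abs(grad).max()) / 127 or 1.0  # avoid scale of 0
    q = np.round(grad / scale).astype(np.int8)
    return q, scale

grad = np.array([0.01, -0.5, 0.02, 0.9, -0.03], dtype=np.float32)
sparse = topk_sparsify(grad, k=2)        # only -0.5 and 0.9 survive
q, scale = quantize_int8(sparse)         # 5 int8 values + 1 float
dequant = q.astype(np.float32) * scale   # approximate reconstruction server-side
```

In practice only the (index, int8 value) pairs of the non-zero entries plus the scale are transmitted, cutting the payload from 32 bits per parameter to a small fraction of that.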

Security Risks in Federated Learning

Gradient inversion attacks

A malicious server or observer can reconstruct training data from transmitted gradients. Research (Zhu et al., 2019 — "DLG") showed that individual training images can be recovered with high fidelity from gradients. Mitigation: combine FL with differential privacy (add noise to gradients before transmission).
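The mitigation mentioned above, differentially private gradient release, has two parts: clip each client's gradient norm (bounding its influence) and add calibrated Gaussian noise before transmission. A minimal per-client sketch in the style of DP-SGD (parameter values are illustrative, not a privacy recommendation):

```python
import numpy as np

def dp_sanitize(grad, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip the gradient's L2 norm, then add Gaussian noise before sending.

    noise_multiplier * clip_norm sets the noise scale and hence the
    privacy/utility trade-off (larger = more private, less accurate).
    """
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / max(norm, 1e-12))  # bound influence
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grad.shape)
    return clipped + noise

sanitized = dp_sanitize(np.array([3.0, 4.0]))  # what the server actually sees
```

With clipping alone a gradient inversion attack is weakened but not prevented; the added noise is what makes the formal differential-privacy guarantee possible.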

Model poisoning

A malicious client submits crafted gradient updates that introduce a backdoor into the global model (e.g., cause the model to misclassify any input with a trigger pattern). The server cannot verify whether gradients are honest. Mitigation: robust aggregation (Krum, Trimmed Mean, FLTrust).
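Of the robust aggregators named above, Trimmed Mean is the simplest to illustrate: per parameter coordinate, discard the most extreme client values before averaging, so a single poisoned update cannot dominate. A minimal sketch (toy values, illustrative function name):

```python
import numpy as np

def trimmed_mean(updates, trim_ratio=0.2):
    """Coordinate-wise trimmed mean: per parameter, sort client values,
    drop the top and bottom trim_ratio fraction, average the rest."""
    stacked = np.sort(np.stack(updates), axis=0)  # sort across clients
    k = int(len(updates) * trim_ratio)            # how many to drop each side
    return stacked[k:len(updates) - k].mean(axis=0)

# Four honest clients plus one attacker submitting an extreme update.
updates = [np.array([0.1]), np.array([0.2]), np.array([0.15]),
           np.array([0.12]), np.array([100.0])]   # last update is poisoned
robust = trimmed_mean(updates, trim_ratio=0.2)    # drops 100.0 and 0.1
naive = np.mean(np.stack(updates), axis=0)        # poisoned value dominates
```

The trade-off is that trimming also discards some honest signal, which hurts more when client data is non-IID and honest updates legitimately disagree.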

Free-rider and Byzantine clients

Free-riders submit random or stale gradients to avoid the cost of local training; Byzantine clients may act maliciously or fail unpredictably. Mitigation: gradient similarity checking; reputation systems for recurring participants.
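Gradient similarity checking can be as simple as scoring each client update by its cosine similarity to a trusted reference direction, such as an update the server computes on a small clean dataset (this is the core idea behind FLTrust; the sketch below is illustrative, not that paper's exact scheme):

```python
import numpy as np

def cosine_to_reference(update, reference):
    """Cosine similarity between a client update and a trusted reference update.

    Honest updates point in a similar direction (score near 1);
    random or adversarial updates score near 0 or below and can be
    down-weighted or rejected by the server.
    """
    return float(update @ reference /
                 (np.linalg.norm(update) * np.linalg.norm(reference)))

reference = np.array([1.0, 1.0])   # server's own trusted update (assumed)
honest = np.array([0.9, 1.1])      # similar direction -> score near 1
freerider = np.array([5.0, -5.0])  # orthogonal noise -> score near 0
```

A server might clip scores at zero and use them as aggregation weights, so dissimilar updates simply contribute nothing.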

Inference attacks on the model

The trained model itself may memorise and leak information about training data — membership inference attacks can determine whether a specific record was in a client's training set. FL reduces this risk vs centralised training but does not eliminate it.

Real-World Deployments

| Deployment | Use case | Why FL |
| --- | --- | --- |
| Google Gboard | Next-word prediction on Android keyboards | Typing data is sensitive personal communication; cannot centralise; 500M+ device scale |
| Apple (Siri, QuickType, Health) | On-device personalisation, health trend detection | Privacy commitment; data stays on device under iOS privacy model |
| Healthcare consortia (NVIDIA FLARE) | Medical imaging, drug discovery, rare disease models | HIPAA / GDPR prevents patient data sharing across hospital boundaries |
| Financial fraud detection | Cross-bank fraud pattern detection | Banks cannot share transaction data due to confidentiality; FL enables a shared model without data sharing |

Limitations of Federated Learning

  • Statistical heterogeneity (non-IID data): client data distributions differ significantly, making FedAvg unstable and degrading global model quality vs centralised training
  • Communication overhead remains significant even with compression — not suitable for very large models without substantial engineering
  • Debugging and monitoring is harder: cannot inspect training data or diagnose failures at the client level
  • Does not guarantee privacy by itself — must be combined with differential privacy or secure aggregation for strong privacy guarantees
  • Compliance complexity: jurisdiction-specific rules about data governance may still apply to model updates and outputs

Checklist: Do You Understand This?

  • Describe the five steps of a federated learning training round.
  • What is the difference between horizontal and vertical federated learning? Give an example of each.
  • What is FedAvg and what are its limitations with non-IID client data?
  • Explain the gradient inversion attack — what does it enable and how is it mitigated?
  • Why does federated learning alone not guarantee privacy, and what must be added for a strong privacy guarantee?