The Problem with Simple Accuracy Assessment
Standard confusion matrix accuracy is computed by dividing the number of correctly classified sample points by the total number of sample points. This sounds reasonable, but it has a critical flaw: it ignores the proportional area of each class.
If your map has 90% fallow land and 10% wheat, and you randomly sample 100 points, you'll get ~90 fallow samples and ~10 wheat samples. A classifier that labels everything as fallow achieves 90% overall accuracy while being completely useless for the minority class.
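The failure mode is easy to reproduce. A toy sketch with simulated labels (hypothetical data, not from the Gezira project):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated reference labels: ~90% fallow (0), ~10% wheat (1)
reference = rng.choice([0, 1], size=1000, p=[0.9, 0.1])

# Degenerate classifier that maps everything to fallow
predicted = np.zeros_like(reference)

overall_accuracy = (predicted == reference).mean()
wheat_producers_accuracy = (predicted[reference == 1] == 1).mean()

print(f"Overall accuracy: {overall_accuracy:.2f}")                   # ~0.90
print(f"Wheat producer's accuracy: {wheat_producers_accuracy:.2f}")  # 0.00
```

The headline number looks excellent while the map contains zero usable information about wheat.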
Olofsson et al. (2014), published in Remote Sensing of Environment, provide a statistically rigorous framework for:
- Unbiased accuracy estimation using area-weighted metrics
- Unbiased area estimation with confidence intervals
- Stratified random sampling design
The Framework in 3 Steps
Step 1 — Stratified Sampling Design
Allocate sample points proportional to the mapped area of each class, with a minimum of 50 samples per stratum for reliable estimates:
```python
import numpy as np

def compute_sample_allocation(class_areas_ha, total_samples=500, min_per_class=50):
    """
    Proportional allocation with a minimum per class.
    class_areas_ha: dict {class_name: area_in_ha}
    """
    total_area = sum(class_areas_ha.values())
    proportions = {k: v / total_area for k, v in class_areas_ha.items()}
    # Proportional allocation, floored at min_per_class
    allocation = {k: max(int(p * total_samples), min_per_class)
                  for k, p in proportions.items()}
    return allocation, proportions

# Example: Gezira Scheme land cover
class_areas = {
    'Wheat': 420_000,
    'Other Crops': 180_000,
    'Fallow': 1_200_000,
    'Water': 20_000,
}

allocation, proportions = compute_sample_allocation(class_areas)
print("Sample allocation:", allocation)
# {'Wheat': 115, 'Other Crops': 50, 'Fallow': 329, 'Water': 50}
```

Note that the per-class minimum pushes the total above `total_samples`; the extra points go to the rare strata, which is exactly where they are needed.
Step 2 — Compute the Error Matrix with Area Weights
The key insight is weighting each cell of the confusion matrix by the proportion of the map that each class occupies:
```python
def compute_weighted_error_matrix(confusion_matrix, class_proportions):
    """
    Converts a count-based confusion matrix to an area-weighted matrix.
    confusion_matrix: numpy array, rows = mapped class (stratum),
                      cols = reference class
    class_proportions: array of mapped area proportions (must sum to 1)
    """
    n_i = confusion_matrix.sum(axis=1)  # samples per map stratum
    # Weighted matrix: p_ij = W_i * (n_ij / n_i)
    W = np.array(class_proportions)
    weighted = np.zeros_like(confusion_matrix, dtype=float)
    for i in range(len(W)):
        for j in range(len(W)):
            weighted[i, j] = W[i] * (confusion_matrix[i, j] / n_i[i])
    return weighted

# Example confusion matrix (rows = mapped class, cols = reference class,
# the stratified-sampling convention of Olofsson et al. 2014)
# Classes: Wheat, Other Crops, Fallow, Water
cm = np.array([
    [108, 4,   3,   1],   # mapped Wheat
    [3,   44,  2,   1],   # mapped Other Crops
    [5,   2,   322, 5],   # mapped Fallow
    [0,   0,   3,   47],  # mapped Water
])

props = list(proportions.values())
weighted_cm = compute_weighted_error_matrix(cm, props)
```
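A sanity check worth running (my addition, not part of the paper's recipe): by construction each row of the weighted matrix sums to that stratum's mapped proportion W_i, so the whole matrix sums to 1. Rebuilding the example inputs so the check runs standalone:

```python
import numpy as np

# Example inputs from above: count matrix (rows = mapped class)
# and mapped area proportions for Wheat, Other Crops, Fallow, Water
cm = np.array([
    [108, 4,   3,   1],
    [3,   44,  2,   1],
    [5,   2,   322, 5],
    [0,   0,   3,   47],
])
W = np.array([420_000, 180_000, 1_200_000, 20_000]) / 1_820_000

n_i = cm.sum(axis=1)                       # samples per stratum
weighted = W[:, None] * cm / n_i[:, None]  # p_ij = W_i * n_ij / n_i

# Row sums recover the mapped proportions W_i ...
assert np.allclose(weighted.sum(axis=1), W)
# ... so all cells together sum to 1
assert np.isclose(weighted.sum(), 1.0)
print("weighted matrix OK")
```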
Step 3 — Compute Unbiased Metrics
```python
def compute_olofsson_metrics(weighted_cm):
    """
    Compute accuracy metrics following Olofsson et al. (2014).
    Returns overall accuracy, user's accuracy, producer's accuracy,
    and unbiased area proportions.
    """
    p_ij = weighted_cm
    # Overall accuracy
    OA = np.trace(p_ij)
    # User's accuracy (precision) — row perspective (map classes)
    UA = {j: p_ij[j, j] / p_ij[j, :].sum() for j in range(p_ij.shape[0])}
    # Producer's accuracy (recall) — column perspective (reference classes)
    PA = {j: p_ij[j, j] / p_ij[:, j].sum() for j in range(p_ij.shape[1])}
    # Unbiased area proportion per reference class (column sums)
    p_j = p_ij.sum(axis=0)
    return {
        'overall_accuracy': OA,
        'users_accuracy': UA,
        'producers_accuracy': PA,
        'area_proportions': p_j,
    }

results = compute_olofsson_metrics(weighted_cm)
print(f"Overall Accuracy: {results['overall_accuracy']:.3f}")
```
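Multiplying the unbiased proportions by the total mapped area turns them into hectare estimates. A sketch of that conversion, rebuilt standalone from the example numbers above:

```python
import numpy as np

classes = ['Wheat', 'Other Crops', 'Fallow', 'Water']
total_area_ha = 1_820_000  # sum of the mapped class areas

cm = np.array([
    [108, 4,   3,   1],
    [3,   44,  2,   1],
    [5,   2,   322, 5],
    [0,   0,   3,   47],
])
W = np.array([420_000, 180_000, 1_200_000, 20_000]) / total_area_ha

weighted = W[:, None] * cm / cm.sum(axis=1)[:, None]
p_j = weighted.sum(axis=0)  # unbiased area proportion per reference class

for name, mapped, adjusted in zip(classes, W, p_j):
    print(f"{name}: mapped {mapped * total_area_ha:>11,.0f} ha"
          f" -> adjusted {adjusted * total_area_ha:>11,.0f} ha")
```

With these example counts the adjusted Water area comes out at roughly double the mapped 20,000 ha: the Fallow stratum is huge, so even its small 5-sample confusion with Water translates into a large absolute area. That leverage of big strata over small classes is exactly what the correction exposes.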
Standard Errors and Confidence Intervals
The framework also provides variance estimators for each metric. For overall accuracy:
```python
def standard_error_OA(weighted_cm, n_i):
    """
    Standard error of overall accuracy (Eq. 5, Olofsson et al. 2014).
    n_i: sample counts per map stratum, in the same order as the
         rows of weighted_cm (use the confusion-matrix row sums).
    """
    n_i = np.asarray(n_i)
    W_i = weighted_cm.sum(axis=1)  # mapped area proportions
    # q_i = proportion of stratum i that is correctly classified
    q_i = np.diag(weighted_cm) / W_i
    # Variance of OA
    var_OA = np.sum(W_i**2 * (q_i * (1 - q_i)) / (n_i - 1))
    return np.sqrt(var_OA)

se = standard_error_OA(weighted_cm, cm.sum(axis=1))
OA = results['overall_accuracy']
print(f"OA: {OA:.3f} ± {1.96 * se:.3f} (95% CI)")
```

Note that `n_i` must follow the row order of `weighted_cm`; the confusion-matrix row sums guarantee that, whereas pulling counts out of a dict in sorted-key order would silently scramble the strata.
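The paper's Eq. 10 gives the analogous standard error for each class's unbiased area proportion. A sketch under the same conventions (rows = map strata); the closed form in the code follows from substituting p_ij = W_i * n_ij / n_i into that equation:

```python
import numpy as np

def standard_error_area(weighted_cm, n_i):
    """
    Standard error of the unbiased area proportion of each
    reference class (Eq. 10, Olofsson et al. 2014).
    weighted_cm: area-weighted matrix, rows = map strata
    n_i: sample counts per stratum (confusion-matrix row sums)
    """
    W_i = weighted_cm.sum(axis=1, keepdims=True)
    n_i = np.asarray(n_i).reshape(-1, 1)
    # W_i^2 * q_ij * (1 - q_ij) / (n_i - 1), with q_ij = p_ij / W_i,
    # simplifies to (W_i * p_ij - p_ij^2) / (n_i - 1)
    var = np.sum((W_i * weighted_cm - weighted_cm**2) / (n_i - 1), axis=0)
    return np.sqrt(var)

# Rebuild the running example so this runs standalone
cm = np.array([
    [108, 4,   3,   1],
    [3,   44,  2,   1],
    [5,   2,   322, 5],
    [0,   0,   3,   47],
])
W = np.array([420_000, 180_000, 1_200_000, 20_000]) / 1_820_000
n_i = cm.sum(axis=1)
weighted = W[:, None] * cm / n_i[:, None]

p_j = weighted.sum(axis=0)
se_area = standard_error_area(weighted, n_i)
total_ha = 1_820_000
for k, name in enumerate(['Wheat', 'Other Crops', 'Fallow', 'Water']):
    print(f"{name}: {p_j[k] * total_ha:,.0f}"
          f" ± {1.96 * se_area[k] * total_ha:,.0f} ha (95% CI)")
```

Reporting areas with these intervals, rather than bare mapped totals, is what makes the estimates defensible.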
My GeoAccuRate Plugin
Implementing this by hand is error-prone, especially the variance calculations. I built GeoAccuRate — a QGIS plugin that automates the entire Olofsson et al. (2014) workflow:
- Load your classified raster and reference sample shapefile
- The plugin computes the weighted confusion matrix automatically
- Outputs: Overall accuracy, User's/Producer's accuracy, area estimates, and standard errors — all in a formatted report
Install it from the QGIS Plugin Repository (search "GeoAccuRate") or from GitHub.
Why This Matters
In the Gezira Scheme wheat mapping project, the simple accuracy was 94.2% but the Olofsson-corrected accuracy was 92.8% — a 1.4 percentage point difference that matters when reporting to FAO. More importantly, the area estimates shifted by up to 8% for minority classes, which directly affects the reported irrigated crop area statistics.
If you're producing land cover maps for policy or resource management decisions, please use this framework. The additional computation is minimal and the statistical validity is essential.