Language performance metrics

Word Error Rate (WER)

Word Error Rate (WER) is a metric used to evaluate the performance of speech recognition and machine translation systems. It is the minimum number of word-level edits (substitutions, deletions, and insertions) needed to align the predicted text (the hypothesis) with the reference text, divided by the total number of words in the reference. By convention, a deletion is a reference word missing from the hypothesis, and an insertion is an extra word in the hypothesis.
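The definition above can be sketched in plain Python as a word-level Levenshtein distance. This is a minimal illustration, not the implementation of any particular library: the function name word_error_rate is hypothetical, and it assumes whitespace tokenization with no normalization of case or punctuation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length (illustrative sketch)."""
    ref_words = reference.split()  # assumption: plain whitespace tokenization
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]  # i reference words vs. empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current

    return previous[-1] / len(ref_words)


print(f"{word_error_rate('the cat sat on the mat', 'the cat sat mat'):.2f}")  # 0.33
```

Here two reference words ("on" and "the") are missing from the hypothesis, so the distance is 2 and the score is 2 / 6.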

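A WER of 0 indicates a perfect match, and lower is better. The score has no upper bound: because insertions are counted but the denominator is the reference length, WER exceeds 1 when the hypothesis contains more errors than the reference has words.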
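As a small worked example of a score above 1 (the sentences are made up for illustration), take the reference "hello world" (N = 2 words) and the hypothesis "hello there big wide world". Both reference words are matched and three extra words are inserted, so S = 0, D = 0, I = 3:

```latex
\[
\mathrm{WER} = \frac{S + D + I}{N} = \frac{0 + 0 + 3}{2} = 1.5
\]
```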

Using jiwer Library

The jiwer library implements WER and related metrics such as Match Error Rate (MER) and Word Information Lost (WIL):

import jiwer

# Simple WER calculation
reference = "the cat sat on the mat"
hypothesis = "the cat sat mat"

# Calculate WER
wer = jiwer.wer(reference, hypothesis)
print(f"WER: {wer:.2f}")  # Output: WER: 0.33

# Calculate other metrics
mer = jiwer.mer(reference, hypothesis)  # Match Error Rate
wil = jiwer.wil(reference, hypothesis)  # Word Information Lost

print(f"MER: {mer:.2f}")
print(f"WIL: {wil:.2f}")

Output:

WER: 0.33
MER: 0.33
WIL: 0.33
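
The three scores coincide here because the only errors are two deletions. The sketch below recomputes them by hand from the alignment counts, assuming the commonly stated definitions MER = (S + D + I) / (H + S + D + I) and WIL = 1 - (H / N_ref) * (H / N_hyp). The counts are written out manually for this example rather than taken from jiwer.

```python
# Alignment counts for "the cat sat on the mat" vs. "the cat sat mat",
# entered by hand: 4 matching words, 2 deleted words ("on", "the").
hits, substitutions, deletions, insertions = 4, 0, 2, 0

ref_len = hits + substitutions + deletions    # words in the reference: 6
hyp_len = hits + substitutions + insertions   # words in the hypothesis: 4
errors = substitutions + deletions + insertions

wer = errors / ref_len                         # errors per reference word
mer = errors / (hits + errors)                 # errors per aligned word pair
wil = 1 - (hits / ref_len) * (hits / hyp_len)  # 1 minus Word Information Preserved

print(f"WER: {wer:.2f}")  # WER: 0.33
print(f"MER: {mer:.2f}")  # MER: 0.33
print(f"WIL: {wil:.2f}")  # WIL: 0.33
```

With insertions or substitutions in the mix the three values generally diverge. MER and WIL stay between 0 and 1, while WER can grow past 1.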