AI Literacy and Pedagogy in the Age of Generative AI

AI Literacy for Instructors of Record

Evaluating AI Outputs and Tools

Please see our other guide on evaluating tools and their outputs.

Perplexity and Burstiness

1. Perplexity

What is it? Perplexity is a measure used to evaluate how well a probability distribution predicts a sample. In the context of generative AI, it quantifies how "surprised" the model is by a given input, based on the data it has been trained on. A lower perplexity indicates that the model is less surprised and thus better at predicting the input. 
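
To make the definition concrete, here is a minimal sketch (in Python, with made-up probability values) of how perplexity is computed: it is the exponential of the average negative log-probability a model assigns to each token, so confident predictions yield low perplexity and surprising text yields high perplexity.

```python
import math

def perplexity(token_probs):
    """Compute perplexity from the per-token probabilities a model assigns.

    Perplexity = exp( -(1/N) * sum(log p_i) ), so higher average
    probability (less "surprise") gives lower perplexity.
    """
    n = len(token_probs)
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_neg_log_prob)

# A model that is confident about each next token (hypothetical values):
print(perplexity([0.9, 0.8, 0.85, 0.7]))   # ~1.24 -> low perplexity
# A model that finds the text surprising (hypothetical values):
print(perplexity([0.05, 0.1, 0.02, 0.08])) # ~18.8 -> high perplexity
```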

How does it relate to AI-generated content? If a piece of text is improbable or unexpected given a language model's training data, the model assigns it high perplexity. For instance, a coherent and grammatically correct text would typically have lower perplexity than a jumbled, nonsensical one.

Why is it important? Imagine you're reading a book and trying to guess the next word in a sentence. If the language and context are familiar, you can often make accurate predictions. Similarly, perplexity captures how accurately a language model, trained on vast amounts of data, predicts the next word or token.

For researchers, understanding perplexity helps in:

  • Evaluating the quality of AI-generated outputs (a rough perplexity check is sketched at the end of this section).
  • Comparing the performance of different models.
  • Assessing how well the model understands a given dataset or subject matter.
  • Deciphering AI-generated content by identifying contextual oddities.

Considerations for Researchers:

  • Training Data: If a model is trained on specific genres or disciplines, it might show low perplexity for similar content but high perplexity for unfamiliar subjects.
  • Overfitting: A model with unusually low perplexity on its training data may be overfitted, meaning it might not generalize well to new, unseen data.
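
As a hedged, practical sketch of the evaluation use mentioned above, the example below scores two pieces of text with a small causal language model via the Hugging Face transformers library. GPT-2 is used purely as an example; the exact scores depend entirely on the model chosen and its training data.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a small causal language model (GPT-2 is only an example choice).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def text_perplexity(text: str) -> float:
    """Return the model's perplexity for a piece of text."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return its average cross-entropy loss.
        outputs = model(**inputs, labels=inputs["input_ids"])
    return torch.exp(outputs.loss).item()

coherent = "The library offers workshops on citation management every semester."
jumbled = "Semester citation the every offers on library workshops management."
print(text_perplexity(coherent))  # typically lower
print(text_perplexity(jumbled))   # typically higher
```

Text the model finds predictable given its training data will usually score lower than scrambled or out-of-domain text, which connects directly to the Training Data and Overfitting considerations above.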

2. Burstiness

What is it? Burstiness refers to the tendency of certain events or terms to appear in clusters rather than being distributed uniformly or at random. In the context of AI-generated content, it can manifest as repetitive or clustered outputs when you might expect more diverse responses.
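
As a rough illustration (not a validated detection method), the sketch below measures how unevenly a chosen word is spread through a text. It looks at the gaps between successive occurrences and computes B = (σ − μ) / (σ + μ), a simple burstiness score borrowed from work on bursty event sequences: values near 1 indicate clustered, bursty usage, while values near −1 indicate evenly spaced usage.

```python
import statistics

def burstiness(text: str, term: str) -> float:
    """Score how unevenly `term` is spread through `text`.

    Uses the gaps between successive occurrences of the term:
    B = (sigma - mu) / (sigma + mu), where mu and sigma are the mean and
    standard deviation of the gaps. B near 1 means clustered ("bursty")
    usage; B near -1 means evenly spaced usage.
    """
    words = text.lower().split()
    positions = [i for i, w in enumerate(words) if w == term.lower()]
    if len(positions) < 3:
        raise ValueError("Need at least three occurrences to estimate burstiness.")
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    mu = statistics.mean(gaps)
    sigma = statistics.pstdev(gaps)
    return (sigma - mu) / (sigma + mu)

# Hypothetical texts: 'model' appears in two tight clusters vs. at regular intervals.
bursty_text = "model model model model " + "data analysis " * 25 + "model model model model"
even_text = " ".join((["model"] + ["data"] * 10) * 8)
print(burstiness(bursty_text, "model"))  # positive: clustered usage
print(burstiness(even_text, "model"))    # -1: evenly spaced usage
```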

Why is it important? Understanding burstiness is essential because it provides insight into:

  • Quality of AI-generated content: Repetitive or overly similar content may indicate the model's limited understanding or inherent biases.
  • Data patterns: Recognizing burstiness helps in identifying the patterns or biases present in the training data.

For researchers, grasping the concept of burstiness can aid in:

  • Detecting anomalies or repetitive patterns in AI-generated outputs.
  • Understanding potential biases in the training data.
  • Ensuring the diversity and quality of results for research applications.
  • Deciphering AI-generated content by identifying:
    • Repetition & Overemphasis: If a text frequently repeats themes, words, or ideas in a way that seems unnatural or overly emphasized, it might be a sign of AI generation (a rough way to check for this is sketched after this list).
    • Clustered Information: AI-generated content can sometimes present clustered information, delving too deeply into a specific topic or idea while neglecting a more balanced approach.
    • Consistent Themes across Varied Inputs: If, given varied prompts or inputs, the AI consistently veers toward certain themes or terms, that can indicate its bursty nature and hint that the content is AI-generated.
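
As a rough way to check for the repetition and overemphasis described above (an illustrative heuristic, not a reliable AI detector), the sketch below counts repeated trigrams and reports the ratio of distinct trigrams to total trigrams; lower ratios mean more repetitive text.

```python
from collections import Counter

def repetition_report(text: str, n: int = 3):
    """Flag heavy repetition by looking at repeated n-grams (default: trigrams).

    Returns the ratio of distinct n-grams to total n-grams (lower means more
    repetitive) and the most frequently repeated n-grams.
    """
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    distinct_ratio = len(counts) / len(ngrams)
    repeated = [(" ".join(g), c) for g, c in counts.most_common(5) if c > 1]
    return distinct_ratio, repeated

# Hypothetical, deliberately repetitive sample:
sample = ("The results are significant. The results are significant because "
          "the data show that the results are significant.")
ratio, repeated = repetition_report(sample)
print(f"Distinct trigram ratio: {ratio:.2f}")
print("Most repeated trigrams:", repeated)
```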

Considerations for Researchers:

  • Training Data: A model trained on data with inherent burstiness or repetition is more likely to produce clustered outputs.
  • Adjusting Parameters: Sometimes, adjusting model parameters or input prompts can help mitigate burstiness in outputs (see the sketch after this list).
  • Interdisciplinary Awareness: Understanding the norms of term usage in various disciplines can help in distinguishing genuine burstiness from discipline-specific patterns.
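
To illustrate the parameter-adjustment point, the sketch below uses made-up next-token scores to show how the sampling temperature (a parameter exposed by many, though not all, generative AI tools) reshapes a model's next-token distribution: higher temperatures flatten it, which tends to increase diversity, while lower temperatures sharpen it, which tends to increase repetition.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw model scores (logits) into probabilities.

    Higher temperature flattens the distribution (more diverse sampling);
    lower temperature sharpens it (more repetitive, predictable sampling).
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token scores for four candidate words.
logits = [4.0, 2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, temperature=0.5))  # sharply peaked
print(softmax_with_temperature(logits, temperature=1.5))  # noticeably flatter
```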

More information about perplexity and burstiness can be found in the UNLV guide.