AI Detection Glossary
Essential terminology for understanding AI content detection, natural language processing, and forensic text analysis.
| Term | Definition |
|---|---|
| Perplexity | A measure of how predictable text is to a language model. Lower perplexity suggests more predictable (likely AI-generated) text, while higher perplexity indicates more surprising word choices typical of human writing. |
| Burstiness | The variation in sentence length and complexity within a text. Human writing tends to alternate between short and long sentences naturally, while AI-generated text often maintains more uniform sentence structures. |
| Entropy | A measure of information density and randomness in text. Higher entropy indicates more diverse vocabulary and less predictable patterns, which is characteristic of human writing. |
| Token | The basic unit of text processed by language models. A token can be a word, part of a word, or a punctuation mark. Most AI models process text as sequences of tokens rather than individual characters. |
| Large Language Model (LLM) | A neural network trained on vast amounts of text data to generate human-like text. Examples include OpenAI's o3, Claude Opus 4.5, Gemini 2.5 Pro, and Llama 3. |
| Transformer | The neural network architecture underlying modern language models. Transformers use self-attention mechanisms to process relationships between all words in a text simultaneously. |
| Temperature | A parameter controlling the randomness of AI text generation. Higher temperature produces more creative but less predictable output; lower temperature produces more focused, deterministic text. |
| Stylometry | The statistical analysis of writing style, including vocabulary richness, sentence patterns, and punctuation usage. AI detectors use stylometric features to distinguish between human and machine-generated text. |
| Watermarking | A technique where AI providers embed subtle statistical patterns in generated text to make it identifiable. These patterns are invisible to readers but detectable by specialized algorithms. |
| Zero-Shot Detection | Detecting AI-generated text without training on specific examples from a particular model. This approach relies on universal statistical properties that distinguish AI from human writing. |
| False Positive | When human-written text is incorrectly classified as AI-generated. Minimizing false positives is critical for maintaining trust in AI detection tools, especially in academic contexts. |
| Neural Pattern Recognition | Deep learning techniques that identify subtle patterns in text which traditional statistical methods may miss. Our detector combines neural analysis with classical metrics such as perplexity and burstiness to improve accuracy. |
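Three of the statistical signals defined above, entropy, burstiness, and perplexity, can be sketched in plain Python. This is a toy illustration: the unigram perplexity here is fit to the text itself and stands in for a real language model's probabilities, and all function names are our own, not part of any standard library.

```python
import math
from collections import Counter

def shannon_entropy(words):
    # Shannon entropy (bits per word) of the word distribution:
    # higher values mean a more diverse, less predictable vocabulary.
    counts = Counter(words)
    n = len(words)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def burstiness(sentence_lengths):
    # Coefficient of variation of sentence lengths:
    # 0 means perfectly uniform sentences; higher means burstier writing.
    mean = sum(sentence_lengths) / len(sentence_lengths)
    var = sum((x - mean) ** 2 for x in sentence_lengths) / len(sentence_lengths)
    return math.sqrt(var) / mean

def unigram_perplexity(words):
    # Perplexity = exp of the average negative log-probability per word,
    # here under a unigram model estimated from the text itself.
    counts = Counter(words)
    n = len(words)
    avg_log_p = sum(math.log(counts[w] / n) for w in words) / n
    return math.exp(-avg_log_p)

sample = "the cat sat on the mat and the dog sat on the rug".split()
print(shannon_entropy(sample))
print(burstiness([4, 21, 6, 18]))
print(unigram_perplexity(sample))
```

For text where every word is distinct, the unigram perplexity equals the vocabulary size, which matches the intuition that perplexity measures the effective number of choices at each position.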
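The effect of the temperature parameter can be seen directly in how it reshapes a model's output distribution. The sketch below applies temperature-scaled softmax to a hypothetical set of logits (the raw scores a model assigns to candidate next tokens); the numbers are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Dividing logits by the temperature before softmax flattens the
    # distribution when T > 1 (more random sampling) and sharpens it
    # when T < 1 (more deterministic sampling).
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
print(softmax_with_temperature(logits, 0.5))  # sharper: top token dominates
print(softmax_with_temperature(logits, 1.0))  # unchanged softmax
print(softmax_with_temperature(logits, 2.0))  # flatter: more diverse output
```

As temperature approaches zero the distribution collapses onto the single highest-scoring token, which is why low-temperature output reads as focused and repetitive.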
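Watermark detection can be illustrated with a minimal "green list" scheme of the kind described in the research literature: the vocabulary is secretly re-partitioned at every step based on the previous token, a watermarked generator favors the green half, and a detector simply measures how often tokens land in their predecessor's green list. Everything here (the toy vocabulary, the hash choice, the 50/50 split) is illustrative; actual providers' schemes differ and are not public.

```python
import hashlib
import random

# Toy vocabulary; a real scheme operates over the model's token vocabulary.
VOCAB = [f"w{i}" for i in range(100)]

def green_list(prev_token, fraction=0.5):
    # Deterministically split the vocab into a "green" half,
    # seeded by a hash of the previous token.
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * fraction)])

def green_fraction(tokens):
    # Detection statistic: share of tokens that fall in their
    # predecessor's green list. Watermarked text scores near 1,
    # ordinary text near the green-list fraction (0.5 here).
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:]) if tok in green_list(prev))
    return hits / (len(tokens) - 1)

rng = random.Random(0)

# A watermarked "generator" always samples from the green list...
watermarked = ["w0"]
for _ in range(50):
    watermarked.append(rng.choice(sorted(green_list(watermarked[-1]))))

# ...while unwatermarked text hits the green list only about half the time.
plain = [rng.choice(VOCAB) for _ in range(51)]

print(green_fraction(watermarked))  # 1.0
print(green_fraction(plain))        # roughly 0.5
```

This is why the patterns are invisible to readers: any individual word choice looks normal, and only the aggregate statistic over many tokens reveals the bias.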