Detecting Insurance Fraud with Linguistic Analysis: A Technical Guide

The Problem: Scale Beats Skill

A property and casualty insurer processing 50,000 claims a month can't put human eyes on every narrative. Special Investigations Unit (SIU) teams are small, typically 1–2 investigators per 100,000 policies, and their time is the most expensive resource in the fraud detection pipeline. The math doesn't work.

The result: most claims get waved through. Fraud only surfaces when the dollar amount is large enough to trigger a threshold flag, when a claim file matches a known fraud ring pattern in the system, or when a frustrated adjuster escalates something that felt wrong. None of these methods systematically reads the claim narrative itself, and all of them miss the same category of fraud: the well-written, plausible-sounding fraudulent claim that doesn't trip any rule-based trigger.

This is the category that linguistic analysis is built to catch.

$308B: annual cost of insurance fraud in the US (Coalition Against Insurance Fraud, 2022)
10–15%: estimated fraud rate, as a share of premium
<20%: share of claims receiving SIU review

According to the Insurance Information Institute, fraud and claims buildup add an estimated $400–$700 to the average household's annual premium. The cost isn't abstract — it's baked directly into what honest policyholders pay. Every fraudulent claim that slips through is a transfer from legitimate customers to bad actors.

Why Language Matters: The Psycholinguistics of Deception

When people fabricate or exaggerate claims, they tend to produce different text than when they describe something that actually happened. This isn't just intuition; it's the subject of a long-standing body of deception research in psychology.

The landmark meta-analysis by DePaulo et al. (2003), covering 120 independent samples, found that deceptive accounts differ from truthful ones on a number of measurable cues, although many individual effects are small. Newman et al. (2003) examined linguistic markers in text-based deception specifically and found that dishonest accounts use fewer first-person singular pronouns, fewer markers of cognitive complexity, and more negative emotion words, even when the author is trying to sound credible.

"Liars tell less compelling tales than truth tellers. They make less sense, are less engaging, and tell stories that seem less plausible." — DePaulo, B. M., et al. (2003). Cues to deception. Psychological Bulletin, 129(1), 74–118.

In insurance claims specifically, several patterns emerge in fraudulent narratives:

Pronoun distancing: Claimants distance themselves from the events they're describing. "The car was struck" instead of "someone hit my car." Passive constructions reduce linguistic ownership of the incident.
Hedging in timelines: Vague temporal language — "sometime around," "approximately," "I believe it was" — appears more frequently in fabricated accounts of specific events. Actual memory for traumatic or notable events tends to be more precise.
Detail specificity gaps: Fraudulent claims often oscillate between suspicious over-specificity (invented details meant to add credibility) and suspicious under-specificity (gaps where the author hadn't thought the fabrication through). Legitimate accounts have more consistent specificity throughout.
Emotional leakage: Negative emotional language increases in deceptive accounts, even when the author is consciously trying to sound neutral. The cognitive load of maintaining a fabrication produces measurable emotional bleed.
Cognitive complexity: Truthful accounts of complex events tend to contain more qualifications, acknowledgments of uncertainty, and cause-effect reasoning. Fabricated accounts are often simpler: they may hold together internally, but they are less cognitively rich.
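To make the first few signals concrete, here is a minimal lexicon-based sketch in Python. The word lists, the passive-voice regular expression, and the `surface_features` name are illustrative assumptions drawn from the examples above; this is not how Candor computes its scores, and counts this crude would be noisy on their own.

```python
import re
from dataclasses import dataclass

# Illustrative word lists: assumptions for this sketch, not a validated lexicon.
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}
HEDGE_PHRASES = ["on or around", "sometime around", "i believe", "at some point",
                 "it is possible", "i would estimate", "approximately", "may have"]

# Crude passive-voice heuristic: a form of "to be" followed by a word ending in
# -ed or -en. It misses irregular participles ("was struck") and can misfire
# on adjectives ("was red"), which is why real systems use a parser.
PASSIVE = re.compile(r"\b(?:am|is|are|was|were|be|been|being)\s+\w+(?:ed|en)\b",
                     re.IGNORECASE)


@dataclass
class SurfaceFeatures:
    word_count: int
    first_person_rate: float  # first-person singular pronouns per word
    passive_count: int        # matches of the crude passive pattern
    hedge_count: int          # occurrences of phrases in HEDGE_PHRASES


def surface_features(text: str) -> SurfaceFeatures:
    """Rough surface counts related to pronoun distancing and hedging."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    first_person = sum(1 for w in words if w in FIRST_PERSON_SINGULAR)
    hedges = sum(len(re.findall(r"\b" + re.escape(p) + r"\b", lowered))
                 for p in HEDGE_PHRASES)
    return SurfaceFeatures(
        word_count=len(words),
        first_person_rate=first_person / len(words) if words else 0.0,
        passive_count=len(PASSIVE.findall(text)),
        hedge_count=hedges,
    )


# Same event, two framings.
print(surface_features("I parked my car and I saw that someone hit it."))
print(surface_features("The car was parked. Damage was observed and had been sustained."))
```

Across the two framings, the first-person rate drops to zero and the passive count rises from 0 to 3, which is the pronoun-distancing pattern in miniature.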

None of these signals is individually conclusive. A nervous but honest claimant may use passive voice; a skilled fraudster might produce a detailed narrative. The signal is in the combination, and in aggregate across thousands of claims the pattern becomes useful for ranking which claims deserve a closer look.

How Candor Works: API Overview

Candor's API accepts claim text and returns a deception score from 0 to 100, along with sub-scores for five signal categories. A high score doesn't mean fraud — it means the claim language warrants a second look. That's the right framing for a prioritization tool.

Signal	What It Measures	High Score Indicator
Pronoun Distancing	First-person vs. passive construction ratio	Author avoiding linguistic ownership
Hedging	Uncertainty qualifiers and vague temporal language	Imprecise recall for events that should be memorable
Detail Specificity	Consistency and density of concrete details	Uneven specificity pattern suggesting fabrication
Cognitive Complexity	Causal reasoning, qualifications, internal consistency	Oversimplified narrative structure
Emotional Leakage	Negative affect language relative to context	Incongruent emotional tone for the claimed situation

A single API call returns all five sub-scores plus the composite score and flagged sentences — the specific phrases that drove the score. Adjusters don't just get a number; they get the reasons why the claim language raised flags.
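For adjuster-facing tooling, the response can be loaded into a typed structure and turned into a short explanation. This sketch assumes the field names in the example response later in this guide (`score`, `is_deceptive`, `signals`, `flagged_sentences`); the `AnalysisResult` class and `explain` helper are hypothetical, not part of an official SDK.

```python
import json
from dataclasses import dataclass


@dataclass
class AnalysisResult:
    score: int
    is_deceptive: bool
    signals: dict[str, int]
    flagged_sentences: list[str]

    @classmethod
    def from_json(cls, raw: str) -> "AnalysisResult":
        data = json.loads(raw)
        return cls(
            score=data["score"],
            is_deceptive=data["is_deceptive"],
            signals=dict(data["signals"]),
            flagged_sentences=list(data.get("flagged_sentences", [])),
        )


def explain(result: AnalysisResult, top_n: int = 2) -> str:
    """Summarize the strongest sub-scores and the sentences that drove them."""
    strongest = sorted(result.signals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    lines = [f"Composite score: {result.score}"]
    lines += [f"  {name.replace('_', ' ')}: {value}" for name, value in strongest]
    lines += [f'  flagged: "{s}"' for s in result.flagged_sentences]
    return "\n".join(lines)
```

Surfacing the top sub-scores next to the flagged sentences keeps the "reasons" in front of the adjuster rather than just the composite number.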

Example Analysis: A Fictional Claim Narrative

The following is a fictional claim narrative constructed to demonstrate the analysis. Names, policy numbers, and details are entirely invented.

"The incident occurred on or around the evening of the 14th. My vehicle was parked in the lot adjacent to the shopping area. Upon returning, it was observed that damage had been sustained to the rear panel and door. I believe a passing vehicle may have made contact at some point. The damage appeared consistent with a sideswipe. I was not present at the time. The lot is not well-lit and it is possible no one witnessed the event. I would estimate the damage at approximately $4,200 in repairs based on a general assessment."

Signal	Score
Pronoun Distancing	82
Hedging	78
Detail Specificity	71
Cognitive Complexity	55
Emotional Leakage	48
Overall Deception Score	76 (High Risk)

Here is what drove the score:

Pronoun distancing (82): The narrative never makes "I" the agent of a physical action. First-person statements cover only belief, absence, and estimation ("I believe," "I was not present," "I would estimate"), while everything that happened to the vehicle is passive or agentless: "my vehicle was parked," "it was observed," "damage had been sustained." The author describes events that ostensibly happened to their own car without owning a single action.
Hedging (78): "On or around," "I believe," "at some point," "it is possible," and "I would estimate" make five separate hedges in eight sentences, for an event with a specific claimed dollar amount attached. Legitimate claimants who weren't present usually say so plainly, then describe what they found. They don't hedge the damage description.
Detail specificity (71): The narrative is oddly specific about why there were no witnesses (an unlit lot) while staying vague about everything that could be verified: no month for "the 14th," no named location beyond "the shopping area," and a $4,200 figure "based on a general assessment."
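The hedge count is easy to check by hand, or with a few lines of Python. This sketch counts only the five phrases quoted in the walkthrough and splits sentences on a period followed by whitespace or end of text, an assumption that happens to hold for this narrative; a production tokenizer would need to handle abbreviations and other punctuation.

```python
import re

NARRATIVE = (
    "The incident occurred on or around the evening of the 14th. "
    "My vehicle was parked in the lot adjacent to the shopping area. "
    "Upon returning, it was observed that damage had been sustained to the rear panel and door. "
    "I believe a passing vehicle may have made contact at some point. "
    "The damage appeared consistent with a sideswipe. "
    "I was not present at the time. "
    "The lot is not well-lit and it is possible no one witnessed the event. "
    "I would estimate the damage at approximately $4,200 in repairs based on a general assessment."
)

# The five hedges called out in the walkthrough; "approximately" and
# "may have" are deliberately left out so the count matches the text.
QUOTED_HEDGES = ["on or around", "i believe", "at some point", "it is possible", "i would estimate"]


def count_sentences(text: str) -> int:
    # Naive split: a period followed by whitespace or end of text ends a sentence.
    return len([s for s in re.split(r"\.(?:\s+|$)", text) if s.strip()])


def count_phrases(text: str, phrases: list[str]) -> dict[str, int]:
    lowered = text.lower()
    return {p: len(re.findall(r"\b" + re.escape(p) + r"\b", lowered)) for p in phrases}


hedges = count_phrases(NARRATIVE, QUOTED_HEDGES)
print(sum(hedges.values()), "hedges in", count_sentences(NARRATIVE), "sentences")
# prints: 5 hedges in 8 sentences
```

Adding "approximately" and "may have" to the list would push the count higher, which is why a model weighs hedges by context rather than raw frequency.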

A score of 76 doesn't mean this claim is fraud. It means a trained investigator should look at it. That's the right call.

Integration: Getting Claims Into Candor

The API is a standard REST endpoint. A single POST /api/analyze call with the claim narrative in the body returns the full analysis in JSON:

```bash
# Analyze a claim narrative
curl -X POST https://getcandor.polsia.app/api/analyze \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The incident occurred on or around the evening of the 14th..."
  }'
```

Response:

```json
{
  "score": 76,
  "is_deceptive": true,
  "signals": {
    "pronoun_distancing": 82,
    "hedging": 78,
    "detail_specificity": 71,
    "cognitive_complexity": 55,
    "emotional_leakage": 48
  },
  "flagged_sentences": [
    "Upon returning, it was observed that damage had been sustained...",
    "I believe a passing vehicle may have made contact at some point."
  ]
}
```
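The same request from Python, using only the standard library. The endpoint, headers, and `text` field come from the curl example above; the function names, the 30-second timeout, and the `CANDOR_API_KEY` environment variable are hypothetical choices for illustration. Check /docs for the authoritative request and error formats.

```python
import json
import os
import urllib.request
from typing import Optional

API_URL = "https://getcandor.polsia.app/api/analyze"


def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the POST request shown in the curl example (not yet sent)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def analyze(text: str, api_key: Optional[str] = None, timeout: float = 30.0) -> dict:
    """Send one narrative and return the parsed JSON response."""
    key = api_key or os.environ["CANDOR_API_KEY"]  # hypothetical env var name
    with urllib.request.urlopen(build_request(text, key), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending makes the request easy to inspect or test without a network call; `urlopen` raises `urllib.error.HTTPError` on non-2xx responses, which a caller would handle according to the error codes in /docs.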

Batch Processing for Backlogs

For existing claims backlogs, the API accepts concurrent requests within the documented rate limits (see /docs). A typical workflow: export open claim narratives from your claims management system, run them through the API in batches, and write the score and signal breakdown back into each claim record. Route any claim at or above your threshold to the SIU review queue.

At 500 claims per hour (a conservative throughput estimate), a backlog of 10,000 claims takes 20 hours of unattended processing. You come back to a ranked list.
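A minimal sketch of that backlog workflow, assuming the scoring call is passed in as a function (for example, a wrapper around the /api/analyze request) so the batching logic can be tested without network access. The `Claim` record shape, the worker count, and the default threshold of 70 (the high-risk starting point suggested in this guide) are assumptions; keep concurrency within the rate limits listed at /docs.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Claim:
    claim_id: str
    narrative: str
    score: Optional[int] = None
    signals: Optional[dict] = None


def score_backlog(
    claims: list[Claim],
    analyze: Callable[[str], dict],
    threshold: int = 70,
    max_workers: int = 4,
) -> list[Claim]:
    """Score every claim, write results back onto the records, and return the
    claims at or above the threshold, highest score first (the SIU queue)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with claims.
        results = list(pool.map(lambda c: analyze(c.narrative), claims))
    for claim, result in zip(claims, results):
        claim.score = result["score"]
        claim.signals = result["signals"]
    queue = [c for c in claims if c.score >= threshold]
    return sorted(queue, key=lambda c: c.score, reverse=True)


def backlog_hours(n_claims: int, claims_per_hour: float) -> float:
    """Wall-clock estimate: 10,000 claims at 500 per hour is 20 hours."""
    return n_claims / claims_per_hour
```

Writing the score back onto each record before filtering keeps the full ranked list available, not just the claims that crossed the threshold.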

Setting Your Threshold

The right threshold depends on your SIU capacity. The API score is continuous — 0 to 100 — so you can tune the cutoff to match how many claims your team can realistically review.

Score ≥ 70: High-risk flag. Recommend SIU review before payment on any claim at or above this threshold.
Score 50–69: Elevated risk. Worth adjuster review; consider requesting additional documentation before closing.
Score < 50: Normal processing. Proceed through standard workflow.

These are starting points. Your fraud team should calibrate thresholds against confirmed fraud cases from your own claims history. That calibration matters because, as the validation section below explains, the current public benchmark is not built from insurance data.
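A sketch of both steps under stated assumptions: `route` applies the starting tiers, `capacity_cutoff` picks the lowest cutoff that keeps the SIU queue within a given capacity, and `precision_recall` checks a cutoff against historical claims whose fraud outcome is known. All three function names are hypothetical, and the historical scores and labels would come from your own claims system.

```python
from typing import Sequence


def route(score: float, high: float = 70, elevated: float = 50) -> str:
    """Map a 0-100 score to the starting-point tiers listed above."""
    if score >= high:
        return "siu_review"        # high risk: SIU review before payment
    if score >= elevated:
        return "adjuster_review"   # elevated: adjuster review, maybe more docs
    return "standard"              # normal processing


def capacity_cutoff(scores: Sequence[float], capacity: int) -> float:
    """Lowest observed score that, used as the high-risk cutoff, flags no more
    than `capacity` claims. Returns infinity if flagging even the top-scoring
    claims would exceed capacity (for example, capacity 0)."""
    best = float("inf")
    for cutoff in sorted(set(scores), reverse=True):
        if sum(1 for s in scores if s >= cutoff) > capacity:
            break  # lowering the cutoff further only flags more claims
        best = cutoff
    return best


def precision_recall(scores: Sequence[float], is_fraud: Sequence[bool],
                     cutoff: float) -> tuple[float, float]:
    """Precision: share of flagged claims that were confirmed fraud.
    Recall: share of confirmed fraud cases that the cutoff would have flagged."""
    tp = sum(1 for s, y in zip(scores, is_fraud) if s >= cutoff and y)
    fp = sum(1 for s, y in zip(scores, is_fraud) if s >= cutoff and not y)
    fn = sum(1 for s, y in zip(scores, is_fraud) if s < cutoff and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Ties at the cutoff are all flagged, so the queue can come in under capacity; the `high` value passed to `route` would then be whatever cutoff the team settles on after checking precision and recall.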

Full API documentation, authentication, rate limits, and error codes are at /docs.

Validation: What the Numbers Actually Mean

We benchmark Candor's model on 8,041 expert-labeled PolitiFact statements from the LIAR dataset (Wang, 2017) and publish the results live at /validation. Current evaluation: 1,017 samples, F1 = 0.534.

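That number needs context. LIAR consists of short political statements, about 18 tokens each on average (Wang, 2017), and its labels rate factual accuracy as judged by PolitiFact, which is related to, but not the same as, deliberate deception. Insurance claim narratives are typically 100–500 words. Psycholinguistic signals require text volume to compute reliably; short inputs produce weaker signal. We therefore expect F1 = 0.534 on political microclaims to understate performance on full claim narratives, but that remains a hypothesis until insurance-specific benchmarks are published.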

For comparison: unaided human judges detect deception at roughly 54% accuracy, barely above chance, in controlled studies (Bond & DePaulo, 2006). Candor's F1 of 0.534, on a domain it isn't specialized for, lands in a similar range, though F1 and accuracy are different metrics, so this is a rough comparison rather than a head-to-head result.
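Because F1 and accuracy measure different things, it helps to see them side by side. This sketch computes both from a confusion matrix; the counts are hypothetical, not Candor results, and are chosen so the two matrices differ only in true negatives, which changes accuracy but leaves F1 untouched.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    # Share of all predictions that were correct, including true negatives.
    return (tp + tn) / (tp + fp + fn + tn)


def f1(tp: int, fp: int, fn: int) -> float:
    # Harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    # True negatives play no part.
    return 2 * tp / (2 * tp + fp + fn)


# Two hypothetical confusion matrices that differ only in true negatives.
few_negatives = {"tp": 40, "fp": 30, "fn": 40, "tn": 10}
many_negatives = {"tp": 40, "fp": 30, "fn": 40, "tn": 90}

for name, m in [("few negatives", few_negatives), ("many negatives", many_negatives)]:
    print(name, round(f1(m["tp"], m["fp"], m["fn"]), 3), round(accuracy(**m), 3))
# few negatives 0.533 0.417
# many negatives 0.533 0.65
```

The same F1 of about 0.53 sits alongside accuracy anywhere from roughly 42% to 65% here, which is why an F1 score and a human accuracy figure can only be compared loosely.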

We're building supplemental evaluation datasets from insurance claims, legal testimony, and marketplace reviews — domains that better reflect production use cases. When those benchmarks are ready, we'll publish them on the same validation page, with the same methodology. No selective reporting.

We're also honest about the limits: Candor is a triage tool, not a fraud verdict. It tells you which claims deserve human attention. The investigator makes the call. Any vendor claiming their linguistic AI can replace human fraud investigation is either lying or hasn't encountered a skilled fraudster.

Prioritize your SIU caseload with Candor

Try the API free on your own claim narratives. See the live benchmark results. Read the integration docs.


References

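Bond, C. F., Jr., & DePaulo, B. M. (2006). Accuracy of deception judgments. Personality and Social Psychology Review, 10(3), 214–234.
Coalition Against Insurance Fraud. (2022). The Impact of Insurance Fraud. Washington, D.C.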
DePaulo, B. M., Lindsay, J. J., Malone, B. E., Muhlenbruck, L., Charlton, K., & Cooper, H. (2003). Cues to deception. Psychological Bulletin, 129(1), 74–118.
Newman, M. L., Pennebaker, J. W., Berry, D. S., & Richards, J. M. (2003). Lying words: Predicting deception from linguistic styles. Personality and Social Psychology Bulletin, 29(5), 665–675.
Vrij, A. (2008). Detecting Lies and Deceit: Pitfalls and Opportunities (2nd ed.). Wiley.
Wang, W. Y. (2017). "Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection. Proceedings of ACL 2017, 422–426.
Insurance Information Institute. (2023). Insurance Fraud. iii.org.