Financial Crime & AML: Detecting Deceptive Language in Suspicious Activity Reports

What keyword screening gets wrong

AML compliance teams work with a limited vocabulary. Transaction monitoring systems trigger on specific patterns — rapid movement through multiple accounts, amounts just below reporting thresholds, unusual geographic routing. FinCEN SARs get filed when those thresholds are hit. But the narrative section of a SAR — the part where the filer explains what they observed — is text written by a human. And humans are capable of obscuring intent inside language that sounds plausible.

Consider what a sophisticated bad actor does before giving the explanation that ends up in a SAR narrative: they run it through the same mental filter they'd apply to passing a polygraph. They choose words carefully. They avoid flagged terms. They construct explanations that sound consistent with each other. Keyword screening won't catch this. The deception isn't in the vocabulary; it's in the structure of the reasoning.

AML Context

SAR narratives built on a bad actor's account aren't missing incriminating words. They're packed with plausible-sounding, carefully worded explanations that evade both human reviewers (who are processing hundreds of filings per week) and keyword filters (which only fire on known bad patterns). The problem is narrative-level deception, not lexicon-level violation.

Three linguistic signals that matter in AML contexts

Research in applied psycholinguistics and deception detection has identified a set of signals that tend to correlate with fabricated or strategically deceptive text in compliance settings. These aren't magic: they're statistical patterns that emerge at scale, and they apply specifically to the kind of deliberate, carefully constructed language that appears in SAR narratives and CDD interview responses.

1. Pronoun distancing in beneficial ownership declarations

When describing actions they were personally involved in, honest people use first-person singular pronouns (I, my, me) at higher rates than people constructing alibi-like accounts. In beneficial ownership and KYC contexts, legitimate filers writing about their own decisions write close to the material — "I opened the account," "I routed the funds." Deceptive actors unconsciously create distance: "The decision was made to open the account," "Funds were routed through the account." This is the same pronoun-distancing effect documented in forensic psychology literature — it's not unique to AML, but it shows up clearly in suspicious filings.
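
One rough way to see this effect in code is to count first-person singular pronouns against passive constructions. The sketch below is an illustrative heuristic, not Candor's implementation: the function name pronounDistancing, the regular expressions, and the ratio itself are all assumptions, and the passive-voice pattern is a crude approximation (a form of "to be" followed by a word ending in -ed or one of a few irregular participles).

```javascript
// Illustrative heuristic only; not Candor's actual scoring method.
const FIRST_PERSON = /\b(i|me|my|mine|myself)\b/gi;
// Crude passive detector: form of "to be" + regular participle or a few irregulars.
const PASSIVE = /\b(is|are|was|were|be|been|being)\s+(\w+ed|made|sent|done|given|taken|held|paid)\b/gi;

function countMatches(text, pattern) {
  const matches = text.match(pattern);
  return matches ? matches.length : 0;
}

// Returns a value in [0, 1]: the share of passive constructions among
// passive constructions plus first-person singular pronouns.
function pronounDistancing(text) {
  const firstPerson = countMatches(text, FIRST_PERSON);
  const passive = countMatches(text, PASSIVE);
  const total = firstPerson + passive;
  return total === 0 ? 0 : passive / total;
}

console.log(pronounDistancing("I opened the account. I routed the funds.")); // 0
console.log(
  pronounDistancing("The decision was made to open the account. Funds were routed through the account.")
); // 1
```

A real system would need a proper parser to identify passive voice; the point here is only that the signal is a ratio, so it reads the same on short and long narratives.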

2. Hedging in transaction explanations

Legitimate transaction explanations are specific: "Wire transfer for Q2 vendor invoice #4892." Deceptive explanations hedge against scrutiny without being obviously evasive: "The payment was processed in accordance with standard business protocols," "Funds were moved as part of routine operational activity." This hedging — modal verbs, passive voice, abstract nouns — is a deliberate attempt to make the explanation sound defensible if reviewed, rather than to accurately describe what happened. High hedging scores in transaction narratives are a reliable signal worth flagging for secondary review.
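
To make the contrast concrete, here is a minimal sketch that counts hedging markers per 100 words. The phrase list is a small hypothetical sample chosen to match the examples above, and hedgingRate is an assumed name; neither reflects Candor's actual lexicon or scoring.

```javascript
// Illustrative heuristic only; the pattern list is an assumption.
const HEDGE_PATTERNS = [
  /\b(may|might|could|would|should)\b/gi, // modal verbs
  /\bin accordance with\b/gi,             // abstract institutional phrasing
  /\bas part of\b/gi,
  /\b(routine|standard)\b/gi,
];

function wordCount(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Hedging markers per 100 words.
function hedgingRate(text) {
  const words = wordCount(text);
  if (words === 0) return 0;
  let hits = 0;
  for (const pattern of HEDGE_PATTERNS) {
    const matches = text.match(pattern);
    hits += matches ? matches.length : 0;
  }
  return (hits * 100) / words;
}

console.log(hedgingRate("Wire transfer for Q2 vendor invoice #4892.")); // 0
console.log(
  hedgingRate("The payment was processed in accordance with standard business protocols.")
); // 20
```

Normalizing by word count keeps a long but specific narrative from scoring higher than a short, vague one.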

3. Cognitive complexity drops in fabricated account histories

This one is subtle. Honest account histories — explaining why an account was used in a particular way, what the business relationship looked like, how a transaction fits into a pattern — show natural variation in sentence complexity. Writers move between simple and compound sentences, include tangential context, drop in specifics. Fabricated or strategically edited histories show a different pattern: uniformly low complexity with high word count. The writer is maintaining the structure of a detailed explanation while stripping out the cognitive variation that natural language has. It's the writing equivalent of someone keeping their hands very still so they don't shake.
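
A simple proxy for this signal is the standard deviation of sentence length. The sketch below assumes sentence length in words is an acceptable stand-in for complexity and uses a naive split on terminal punctuation, so abbreviations and decimals will be miscounted; the function names are hypothetical.

```javascript
// Illustrative heuristic only: sentence length stands in for complexity.
function sentenceLengths(text) {
  return text
    .split(/[.!?]+/)                 // naive sentence split on terminal punctuation
    .map(s => s.trim())
    .filter(s => s.length > 0)
    .map(s => s.split(/\s+/).length); // words per sentence
}

// Population standard deviation of sentence lengths (in words).
function complexityVariance(text) {
  const lengths = sentenceLengths(text);
  if (lengths.length === 0) return 0;
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  const variance =
    lengths.reduce((acc, n) => acc + (n - mean) ** 2, 0) / lengths.length;
  return Math.sqrt(variance);
}

const uniform = "Funds were moved today. Funds were sent onward. Funds were held briefly.";
const varied =
  "I paid the invoice. It was for the Q2 shipment from our supplier in Rotterdam, which arrived late. We disputed it.";

console.log(complexityVariance(uniform)); // 0
console.log(complexityVariance(varied) > complexityVariance(uniform)); // true
```

Because the pattern described above pairs low variance with high word count, a fuller version would report total word count alongside this value.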

Detection Note

These signals aren't binary. A SAR with high pronoun distancing isn't automatically fraudulent — it means the filing warrants secondary review. The value of linguistic analysis is prioritization: compliance teams can focus human attention on the filings that carry statistical indicators of manipulation rather than reviewing everything at the same urgency.
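
In practice, prioritization can be as simple as sorting a review queue. The sketch below assumes each filing already carries a numeric risk score where higher means more indicators; the threshold of 60, the field names, and the filing IDs are all hypothetical.

```javascript
// Hypothetical review-queue sketch; threshold and field names are assumptions.
const SECONDARY_REVIEW_THRESHOLD = 60; // assumed cutoff, tune to your own risk appetite

function prioritize(filings) {
  return filings
    .map(f => ({ ...f, secondaryReview: f.score >= SECONDARY_REVIEW_THRESHOLD }))
    .sort((a, b) => b.score - a.score); // highest-risk filings first
}

const queue = prioritize([
  { id: "SAR-001", score: 35 },
  { id: "SAR-002", score: 72 },
  { id: "SAR-003", score: 58 },
]);

for (const f of queue) {
  console.log(`${f.id}: ${f.score}${f.secondaryReview ? " (secondary review)" : ""}`);
}
```

Nothing is dropped from the queue: low-scoring filings are still reviewed, just later.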

How Candor's API works in an AML context

The Candor API accepts any text input and returns a deception risk score (0-100) along with per-signal breakdowns. For AML teams, the integration point is straightforward: text from SAR narratives, CDD interview transcripts, or wire transfer justification fields gets passed to the API on ingestion or during a periodic review batch. Flagged items get routed to a senior reviewer.

// Analyze a SAR narrative for deception signals
fetch('https://getcandor.polsia.app/api/analyze', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: JSON.stringify({
    text: `Customer submitted wire transfer request for $48,500 to an overseas correspondent account. Transaction was initiated as part of routine accounts payable processing. Funds represent settlement of business obligations with international vendor. Customer states the beneficiary relationship was established during a recent trade conference and payment is consistent with standard procurement workflow.`
  })
})
  .then(res => {
    // Surface HTTP errors instead of treating an error body as a result
    if (!res.ok) throw new Error(`Candor API request failed: ${res.status}`);
    return res.json();
  })
  // score: aggregate risk (0-100); signals: per-signal breakdown
  .then(data => console.log(data.score, data.signals))
  .catch(err => console.error(err));

Sample SAR Narrative Analysis

Score: 72 / 100 (High risk)

Signals: High hedging (0.84), Pronoun distancing (0.71), Low complexity variance, Moderate negation

Hedging score elevated — transaction explanation relies on abstract institutional language ("routine processing," "standard procurement") rather than specific details. Pronoun distancing: actions attributed to passive constructions rather than personal agency. Complexity variance below baseline — narrative is long but uniformly constructed. Recommended for secondary review by BSA officer.

The API response includes both an aggregate score and per-signal scores so investigators can understand why a filing was flagged. This matters for audit trails: a BSA officer needs to be able to explain to a regulator why a particular filing was escalated. Candor's signal-level breakdown provides that documentation.
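
As a sketch of how that breakdown could feed an audit trail, the function below turns a response into a stored record. The source only shows that the response has score and signals fields; the shape of signals (an object mapping names to numbers), the record fields, and the name buildAuditRecord are assumptions for illustration.

```javascript
// Builds an audit-trail entry from an API response.
// Assumption: `signals` maps signal names to numeric scores.
function buildAuditRecord(filingId, response, recordedAt = new Date()) {
  const rankedSignals = Object.entries(response.signals)
    .filter(([, value]) => typeof value === "number")
    .sort(([, a], [, b]) => b - a) // strongest signal first
    .map(([name, value]) => `${name}=${value.toFixed(2)}`);
  return {
    filingId,
    score: response.score,
    rationale: `Escalated with score ${response.score}/100; signals: ${rankedSignals.join(", ")}`,
    recordedAt: recordedAt.toISOString(),
  };
}

const record = buildAuditRecord(
  "SAR-002", // hypothetical filing ID
  { score: 72, signals: { pronounDistancing: 0.71, hedging: 0.84 } },
  new Date("2024-01-01T00:00:00Z") // fixed timestamp for a reproducible example
);
console.log(record.rationale);
// Escalated with score 72/100; signals: hedging=0.84, pronounDistancing=0.71
```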

Signal overview for AML teams

Signal	What it measures	AML relevance	Weight
Pronoun distancing	Ratio of third-person / passive constructions to first-person singular	Beneficial ownership declarations, KYC responses, SAR narratives from filers describing their own behavior	High
Hedging frequency	Rate of modal verbs, passive voice, abstract qualifiers ("may," "could be," "as part of")	Transaction justification fields, wire transfer narratives, SAR explanation sections	High
Cognitive complexity variance	Standard deviation of sentence length and syntactic variety across the document	Fabricated account histories, pattern-of-activity explanations, relationship descriptions	Medium
Emotional leakage	Imbalance in the ratio of affective language to neutral-descriptive language	Customer interview transcripts, call center notes, dispute narratives	Medium
Negation density	Rate of negative constructions relative to document length	Denial narratives, counterparty dispute responses, compliance attestations	Low
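
To illustrate how weights like these could combine into a single 0-100 score, here is a hypothetical weighted average. The numeric weights (3, 2, 1) are assumptions chosen only to mirror the High / Medium / Low column, and the sketch assumes each signal value is already scaled to [0, 1] with higher meaning more suspicious; it is not a description of Candor's actual formula.

```javascript
// Hypothetical aggregation; weights and signal keys are assumptions.
const WEIGHTS = {
  pronounDistancing: 3,  // High
  hedging: 3,            // High
  complexityVariance: 2, // Medium (assumed inverted so low variance scores high)
  emotionalLeakage: 2,   // Medium
  negationDensity: 1,    // Low
};

// signals: per-signal values in [0, 1]; signals without a numeric value are skipped.
function aggregateScore(signals) {
  let weighted = 0;
  let totalWeight = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    const value = signals[name];
    if (typeof value !== "number") continue;
    weighted += value * weight;
    totalWeight += weight;
  }
  return totalWeight === 0 ? 0 : Math.round((weighted / totalWeight) * 100);
}

console.log(aggregateScore({ hedging: 1, negationDensity: 0 })); // 75
```

Skipping missing signals means a short text with only two measurable signals is scored on those two alone rather than being pulled toward zero.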

What this doesn't replace

Linguistic analysis is a prioritization tool, not a determination. It won't tell you whether a SAR is fraudulent — it tells you which SARs to look at first, and gives your BSA officers a structured basis for that selection. The judgment call on filing a SAR still belongs to the compliance officer.

Candor also doesn't replace transaction monitoring systems. Those catch the behavioral patterns — the amounts, velocities, and routing. Linguistic analysis catches the narrative layer: the text that explains those behavioral patterns. Using both together closes the gap that bad actors exploit when they write clean explanations for suspicious transactions.

Try the API free

50 analyses per month on our free tier. No credit card required.
AML integration docs available at the docs page.

Also available: Pro ($99/mo, unlimited analyses) and Enterprise (bulk pricing, SLA)