After detection, personal data is replaced with consistent pseudonyms. The core principle: same person = same pseudonym — across all documents and all processing runs.
Consistency Is Key
If John Smith appears in 50 documents, he becomes Person_A everywhere. If his colleague Jane Brown appears in 30 documents, she becomes Person_B everywhere. This preserves relationships:
- In document 1: "Person_A signed the contract with Company_A"
- In document 2: "Person_A received an email from Person_C"
- In document 3: "The invoice was sent to Address_A of Person_A"
The AI recognises that the same person is involved throughout — without knowing their identity.
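The consistency rule can be illustrated with a small lookup table that hands out a new pseudonym only the first time a value is seen. The following is a minimal sketch, not the product's actual implementation: the class name Pseudonymizer, its methods, the letter-suffix scheme, and the sample company "Acme Ltd" are all assumptions made for illustration, and the detection step is assumed to supply the list of (category, original) pairs.

```python
from string import ascii_uppercase


class Pseudonymizer:
    """Hypothetical helper: one stable pseudonym per (category, original value)."""

    def __init__(self):
        # Maps (category, original) -> pseudonym,
        # e.g. ("Person", "John Smith") -> "Person_A"
        self.table = {}
        # Number of pseudonyms handed out so far, per category
        self.counters = {}

    def _suffix(self, index):
        # 0 -> A, 25 -> Z, 26 -> AA (spreadsheet-style labels)
        label = ""
        index += 1
        while index > 0:
            index, rem = divmod(index - 1, 26)
            label = ascii_uppercase[rem] + label
        return label

    def pseudonym_for(self, category, original):
        key = (category, original)
        if key not in self.table:
            n = self.counters.get(category, 0)
            self.table[key] = f"{category}_{self._suffix(n)}"
            self.counters[category] = n + 1
        return self.table[key]

    def replace(self, text, entities):
        # Assign pseudonyms in detection order first ...
        for category, original in entities:
            self.pseudonym_for(category, original)
        # ... then substitute longest strings first, so a short value
        # cannot overwrite part of a longer one.
        for category, original in sorted(entities, key=lambda e: len(e[1]), reverse=True):
            text = text.replace(original, self.table[(category, original)])
        return text


p = Pseudonymizer()
doc1 = p.replace(
    "John Smith signed the contract with Acme Ltd",
    [("Person", "John Smith"), ("Company", "Acme Ltd")],
)
doc2 = p.replace(
    "John Smith received an email from Jane Brown",
    [("Person", "John Smith"), ("Person", "Jane Brown")],
)
print(doc1)  # Person_A signed the contract with Company_A
print(doc2)  # Person_A received an email from Person_B
```

Because both documents go through the same table, John Smith maps to Person_A in each of them. Plain string replacement is a simplification here; a real system would replace at the exact positions reported by the detection step.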
Cross-Batch Pseudonyms
Consistency applies not only within a single processing run but also across batches. If John Smith was pseudonymized as Person_A in batch 1, he will also receive the pseudonym Person_A in batches 2, 3 and all subsequent runs. The replacement table is kept between runs and only ever extended: names that are new in a batch receive new pseudonyms, while existing entries stay unchanged.
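Cross-batch consistency requires the replacement table to outlive a single run. The sketch below shows one way this could work, assuming the table is stored as a JSON file; the function names, the file name replacement_table.json, and the sample name "Alex Green" are hypothetical, and numeric suffixes (Person_1, Person_2) are used instead of letters only to keep the example short.

```python
import json
import os
import tempfile
from pathlib import Path


def load_table(path):
    """Return the stored replacement table, or an empty one on the first run."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def save_table(path, table):
    """Write the full table back to disk so the next batch can reuse it."""
    Path(path).write_text(
        json.dumps(table, indent=2, ensure_ascii=False), encoding="utf-8"
    )


def pseudonymize_batch(names, path):
    """Extend the stored table with unseen names; return one pseudonym per name."""
    table = load_table(path)
    for name in names:
        if name not in table:
            # Existing entries are never renumbered; a new name gets the next number.
            table[name] = f"Person_{len(table) + 1}"
    save_table(path, table)
    return [table[name] for name in names]


# Two separate batches sharing one table file (a temporary directory for the demo)
with tempfile.TemporaryDirectory() as tmp:
    store = os.path.join(tmp, "replacement_table.json")
    batch1 = pseudonymize_batch(["John Smith", "Jane Brown"], store)
    batch2 = pseudonymize_batch(["Alex Green", "John Smith"], store)

print(batch1)  # ['Person_1', 'Person_2']
print(batch2)  # ['Person_3', 'Person_1']
```

John Smith keeps the pseudonym from batch 1 when he reappears in batch 2, and only the previously unseen name extends the table. The sketch assumes one process writes the file at a time; concurrent batches would need locking or a database.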