Modern redaction software automates the bulk of the process. Instead of manually reviewing each document and marking sensitive passages one by one, the software handles detection and removal in several steps:
1. OCR Text Recognition
In the first step, scanned documents and images are converted into machine-readable text using optical character recognition (OCR). This is the prerequisite for the software to identify what information a document contains. Without OCR, automated redaction of scans and image files would not be possible.
2. Dictionary Matching
The software compares the recognised text against stored dictionaries. These contain, for example, names of employees, clients or business partners. When a match is found, the software automatically flags the relevant passage for redaction. Dictionaries can be individually maintained and extended — for instance by importing from Active Directory or CSV files.
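As an illustration, dictionary matching can be sketched in a few lines of Python. This is a minimal sketch and not how any particular product works: the function names `load_terms` and `find_dictionary_matches`, and the CSV column `name`, are assumptions made for this example.

```python
import csv
import io
import re


def load_terms(csv_text: str, column: str = "name") -> list[str]:
    """Read dictionary terms from CSV text (the 'name' column is hypothetical)."""
    terms = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(column) or "").strip()
        if value:
            terms.append(value)
    return terms


def find_dictionary_matches(text: str, terms: list[str]) -> list[tuple[int, int, str]]:
    """Return (start, end, matched_text) for whole-word, case-insensitive hits."""
    if not terms:
        return []
    # Longest terms first, so "Anna Schmidt" wins over "Anna" at the same position.
    ordered = sorted(set(terms), key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in ordered) + r")(?!\w)",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]
```

The word-boundary checks keep a dictionary entry such as "Anna" from flagging part of a longer word like "Annabel". Real OCR output often contains recognition errors, so a production system would probably need fuzzy matching on top of this.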
3. Pattern Recognition (PII Detection)
In addition to dictionary matching, professional redaction software detects personally identifiable information (PII) based on patterns. These include:
- Email addresses: identified by the @ symbol and domain structure
- Phone numbers: national and international formats
- IBANs: country code, country-specific length and check digits
- Postcodes and addresses: context-based recognition
- Social security numbers: format varies by country
- Tax identification numbers: VAT ID, tax number, TIN
- Dates: various formats (DD.MM.YYYY, MM/DD/YYYY, etc.)
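Two of these patterns can be sketched with Python's standard `re` module. The regular expressions below are deliberately simplified assumptions, not the rules of any specific product. The IBAN check uses the standard mod-97 test: move the first four characters to the end, map letters to numbers (A=10 to Z=35), and require a remainder of 1.

```python
import re

# Simplified patterns for illustration; real detectors cover many more cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]){11,30}\b")
IBAN_SHAPE_RE = re.compile(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}")


def iban_checksum_ok(candidate: str) -> bool:
    """Check the mod-97 check digits of an IBAN candidate."""
    iban = candidate.replace(" ", "").upper()
    if not IBAN_SHAPE_RE.fullmatch(iban):
        return False
    rearranged = iban[4:] + iban[:4]
    # int(ch, 36) maps digits to 0-9 and letters A-Z to 10-35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def detect_pii(text: str) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, kind) hits for emails and checksum-valid IBANs."""
    hits = [(m.start(), m.end(), "EMAIL") for m in EMAIL_RE.finditer(text)]
    hits += [
        (m.start(), m.end(), "IBAN")
        for m in IBAN_RE.finditer(text)
        if iban_checksum_ok(m.group(0))
    ]
    return sorted(hits)
```

Validating the check digits after the regex match reduces false positives: a string that merely looks like an IBAN is not flagged. This sketch does not check the country-specific length, and the IBAN pattern can run on into directly following uppercase words, in which case the checksum fails and the hit is dropped.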
4. Manual Review and Editing
After automated detection, the results can be reviewed in an integrated viewer. Individual redactions can be confirmed, removed or manually added. This step is particularly important for legally sensitive documents to ensure that neither too much nor too little has been redacted.
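The review step can be modelled as a list of candidate redactions, each carrying a status that the reviewer changes. The following sketch shows one possible data model; the class names `Redaction` and `ReviewSession` and the three statuses are assumptions for this example, not a description of any particular viewer.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"      # detected automatically, not yet reviewed
    CONFIRMED = "confirmed"  # will be redacted on export
    REJECTED = "rejected"    # false positive, stays visible


@dataclass
class Redaction:
    start: int
    end: int
    reason: str
    status: Status = Status.PENDING


@dataclass
class ReviewSession:
    redactions: list[Redaction] = field(default_factory=list)

    def confirm(self, index: int) -> None:
        self.redactions[index].status = Status.CONFIRMED

    def reject(self, index: int) -> None:
        self.redactions[index].status = Status.REJECTED

    def add_manual(self, start: int, end: int, reason: str = "manual") -> None:
        # Manually added redactions count as confirmed by the reviewer.
        self.redactions.append(Redaction(start, end, reason, Status.CONFIRMED))

    def pending(self) -> list[Redaction]:
        return [r for r in self.redactions if r.status is Status.PENDING]

    def approved_spans(self) -> list[tuple[int, int]]:
        """Spans to redact on export, sorted by position."""
        return sorted(
            (r.start, r.end) for r in self.redactions if r.status is Status.CONFIRMED
        )
```

Keeping rejected candidates in the list instead of deleting them leaves a record of what the reviewer decided, and an export can be blocked while `pending()` is non-empty.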
5. Secure Export
In the final step, the document is exported. It is crucial that the redaction is irreversible: the original data must no longer be extractable from the file, whether via copy-and-paste, the metadata or the file structure. A black rectangle drawn over text that is still present in the file does not meet this requirement. Professional software typically exports to formats such as PDF/A, the PDF variant standardised in ISO 19005 for long-term archiving, which can help meet archiving requirements.
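The principle of irreversible redaction can be illustrated on plain text: the sensitive characters are replaced in the data itself, and the exported bytes are then checked for leftovers. This is only a sketch under simplifying assumptions. The function names `apply_redactions` and `leaked_terms` are invented for this example, and a real PDF export would also have to handle text objects, images, metadata and compressed streams.

```python
def apply_redactions(text: str, spans: list[tuple[int, int]], mask: str = "█") -> str:
    """Replace each (start, end) span with mask characters.

    The original characters are dropped from the output, not just covered.
    """
    out = []
    cursor = 0
    for start, end in sorted(spans):
        start = max(start, cursor)  # clip spans that overlap an earlier one
        if end <= start:
            continue
        out.append(text[cursor:start])
        out.append(mask * (end - start))
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


def leaked_terms(exported: bytes, sensitive: list[str]) -> list[str]:
    """Return every sensitive term that can still be found in the exported bytes."""
    haystack = exported.decode("utf-8", errors="replace").casefold()
    return [term for term in sensitive if term.casefold() in haystack]
```

A possible usage: run `leaked_terms` on the exported file as a final gate and refuse to release the document if the result is not empty. Note that a mask of the same length as the original still reveals how long the redacted text was; a fixed-length mask avoids that.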