Cloud DLP offers several deidentification techniques that can help obscure sensitive information while preserving some utility:
- Masking — Masks a string either fully or partially by replacing a given number of characters with a specified fixed character. This technique can, for example, mask everything but the last four digits of an account number or Social Security number.
- Redaction — Redacts a value by removing it.
- Replacement — Replaces each input value with a given value.
- Pseudonymization with secure hash — Replaces input values with a secure one-way hash generated using a data encryption key.
- Pseudonymization with format-preserving token — Replaces an input value with a “token,” or surrogate value, of the same character set and length using format-preserving encryption (FPE). Preserving the format can help ensure compatibility with legacy systems that have restricted schema or format requirements.
- Generalization and bucketing — Generalizes input values by replacing them with “buckets,” or ranges, within which the input value falls. For example, you can bucket specific ages into age ranges or distinct values into ranges like “low,” “medium,” and “high.”
- Date shifting — Shifts all dates belonging to a given user or entity by the same random number of days. Because every date for that entity moves by the same offset, actual dates are obfuscated while the sequence and duration of a series of events or transactions are preserved.
- Time extraction — Extracts or preserves a portion of Date, Timestamp, and TimeOfDay values, for example keeping only the year or the hour of day.
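
To make a few of the techniques above concrete, here is a minimal Python sketch of masking, keyed hashing, bucketing, and date shifting. It uses only the standard library and does not call the Cloud DLP API; the function names (`mask`, `keyed_hash`, `bucket_age`, `shift_date`), the parameters, and the sample key are hypothetical and chosen for illustration. In particular, the date-shift offset is derived deterministically from a key and an entity ID, which is one possible way to get a consistent per-entity shift and is not necessarily how Cloud DLP does it.

```python
import hashlib
import hmac
from datetime import date, timedelta


def mask(value: str, keep_last: int = 4, mask_char: str = "*") -> str:
    """Masking: replace every character except the last `keep_last`."""
    if keep_last <= 0:
        return mask_char * len(value)
    hidden = max(len(value) - keep_last, 0)
    return mask_char * hidden + value[hidden:]


def keyed_hash(value: str, key: bytes) -> str:
    """Pseudonymization: one-way keyed hash (HMAC-SHA-256) as a hex string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def bucket_age(age: int, width: int = 10) -> str:
    """Bucketing: replace an exact age with the range it falls into."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def shift_date(d: date, entity_id: str, key: bytes, max_days: int = 30) -> date:
    """Date shifting: apply one offset per entity, in [-max_days, max_days].

    Assumption: the offset is derived from an HMAC of the entity ID so that
    every date for the same entity moves by the same number of days.
    """
    digest = hmac.new(key, entity_id.encode("utf-8"), hashlib.sha256).digest()
    offset = int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days
    return d + timedelta(days=offset)


if __name__ == "__main__":
    key = b"example-key-not-for-production"  # hypothetical key for the demo
    print(mask("123-45-6789"))               # only the last four characters stay visible
    print(bucket_age(37))                    # exact age replaced by its ten-year range
    print(keyed_hash("alice@example.com", key)[:16])
    admitted = shift_date(date(2024, 3, 1), "patient-17", key)
    discharged = shift_date(date(2024, 3, 9), "patient-17", key)
    print((discharged - admitted).days)      # interval between the two events is unchanged
```

Because both dates for `patient-17` receive the same offset, the eight-day gap between them survives the shift even though neither original date is exposed.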