We have implemented the data-masking option because, in essence, GDPR prohibits you from storing data you don't need. This means that personal or sensitive data should not be stored in, for example, logs or NLP training models. Because it is very hard to always predict when a user or bot needs personal or sensitive data to function, we added automatic recognition of specific patterns so that this data can be masked after use. For example, if a user types "my social security number is 923891", this will end up in the logs as "my social security number is ######", while the bot can still use the data one time to do its job.
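The masking step described above can be sketched as follows. This is a minimal illustration, not the product's actual implementation; the pattern used here is an assumption standing in for the built-in detection rules.

```python
import re

# Illustrative pattern only: a run of 6-9 digits, standing in for the
# product's built-in detectors for numbers like the one in the example.
NUMBER_PATTERN = re.compile(r"\b\d{6,9}\b")

def mask_for_logging(utterance: str) -> str:
    """Replace each character of a detected number with '#' before the
    utterance is written to the logs."""
    return NUMBER_PATTERN.sub(lambda m: "#" * len(m.group()), utterance)

print(mask_for_logging("my social security number is 923891"))
# my social security number is ######
```

Note that masking happens only on the copy that is logged; the bot still receives the original utterance for one-time use.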
Below you can find some examples of personal or sensitive data sets:
Admin users can add their own RegEx patterns per bot, so the bot can detect the specific data structures for that use case.
For example, this is a RegEx to detect a person's name:
^[a-zA-Z]+(([',. -][a-zA-Z ])?[a-zA-Z])$
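An admin-supplied pattern like the one above could be applied in the same way as the built-in detectors. The helper below is a sketch under that assumption; `is_name` is a hypothetical function, not part of the product.

```python
import re

# The admin-supplied pattern from the example above. How patterns are
# registered per bot is an assumption for this sketch.
NAME_PATTERN = re.compile(r"^[a-zA-Z]+(([',. -][a-zA-Z ])?[a-zA-Z])$")

def is_name(token: str) -> bool:
    """Check whether a token matches the admin-defined name pattern."""
    return NAME_PATTERN.match(token) is not None

print(is_name("John"))    # True
print(is_name("923891"))  # False
```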
In addition, we added a checkbox for all question blocks created in the dialog builder. By enabling this checkbox, you indicate in advance that you are going to ask for sensitive data and that the answer should not be stored in the logs.
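Conceptually, the checkbox works as a per-question flag that masks the whole answer regardless of pattern detection. The sketch below assumes a hypothetical `QuestionBlock` structure; the real dialog builder's data model may differ.

```python
from dataclasses import dataclass

@dataclass
class QuestionBlock:
    """Hypothetical question block; `contains_sensitive_data` stands in
    for the dialog-builder checkbox described above."""
    prompt: str
    contains_sensitive_data: bool = False

def log_answer(block: QuestionBlock, answer: str) -> str:
    """Return the log line; the full answer is masked when flagged."""
    logged = "#" * len(answer) if block.contains_sensitive_data else answer
    return f"{block.prompt} -> {logged}"

block = QuestionBlock("What is your passport number?", contains_sensitive_data=True)
print(log_answer(block, "AB123456"))
# What is your passport number? -> ########
```

Because the flag is set at design time, no pattern matching is needed: the answer is masked even when it does not fit any known RegEx.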
These methods are not a 100% guarantee, as technology can't detect everything. But using these functions helps you build a strong case towards regulators by showing you are doing as much as possible regarding GDPR.
In addition, we are now developing the option to apply this masking before utterances are sent to the selected NLP engines, limiting the personal data that goes to Google, Microsoft, etc.
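In that setup, masking would sit in the pipeline before the external NLP call. The sketch below is an assumption about how such a feature could look; `send_to_nlp` is a hypothetical stand-in, and the pattern list is illustrative.

```python
import re

# Illustrative pattern list standing in for the bot's configured detectors.
PATTERNS = [re.compile(r"\b\d{6,9}\b")]

def mask(utterance: str) -> str:
    """Apply all configured masking patterns to an utterance."""
    for pattern in PATTERNS:
        utterance = pattern.sub(lambda m: "#" * len(m.group()), utterance)
    return utterance

def send_to_nlp(utterance: str) -> str:
    # In the real product this would call the selected engine (Google,
    # Microsoft, etc.); here it just returns what would leave the platform.
    return utterance

print(send_to_nlp(mask("my social security number is 923891")))
# my social security number is ######
```

The key point is ordering: the external engine only ever sees the masked text.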