Google aims to reinforce data privacy in its AI initiatives with a k-anonymity patent application that promises stronger protection for sensitive information.
Google has many AI initiatives planned, so keeping its data free of sensitive information is probably a top concern. The company has filed a patent application for a system for “anonymizing large scale datasets.” By protecting the identities of entities within a dataset, Google’s approach seeks to offer “improved privacy guarantees” for “k-anonymous” datasets.
According to Google, k-anonymity is applied as a “preprocessing step, such as before data release and/or before using the data for any potentially nonsecure purpose, like training a deep neural network or machine learning model.” This step can be crucial for protecting personally identifiable information, such as health and medical records, or user information, including passwords and browsing history.
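For readers unfamiliar with the term: a dataset is generally called k-anonymous when every combination of potentially identifying attributes (often called quasi-identifiers) is shared by at least k records, so no individual can be singled out from fewer than k candidates. The sketch below checks that textbook property. It is not Google’s patented method, and the function name, field names, and sample records are hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Hypothetical helper: return True if every combination of
    quasi-identifier values appears in at least k records."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Count how many records share each combination of identifying values.
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    # Every group must contain at least k records.
    return all(count >= k for count in groups.values())


# Hypothetical sample data with ZIP codes and ages already generalized.
records = [
    {"zip": "941**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "941**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "100**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "100**", "age": "40-49", "diagnosis": "diabetes"},
]

print(is_k_anonymous(records, ["zip", "age"], k=2))  # True
print(is_k_anonymous(records, ["zip", "age"], k=3))  # False
```

Each (zip, age) pair here covers exactly two records, so the data is 2-anonymous but not 3-anonymous.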
Google’s system groups data into “entity clusters” based on shared attributes or references to common entities. The algorithm then finds the “majority condition” for each entity cluster, which indicates the “data item” that most or all of the data in the cluster share. Once the majority condition is found, the system assigns that data item to the entity it wants to anonymize, obscuring the identifying characteristic.
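The filing as described here does not spell out an algorithm, but the three steps above can be illustrated with a simplified sketch. It assumes each entity is a flat record, that clusters are formed by one shared attribute, and that “assigning the majority data item” means overwriting one identifying attribute with the cluster’s most common value. The function and field names are hypothetical, and this is an interpretation rather than Google’s actual implementation.

```python
from collections import Counter, defaultdict


def anonymize_by_majority(entities, cluster_key, identifying_key):
    """Hypothetical sketch: cluster entities on a shared attribute, find each
    cluster's majority value for an identifying attribute, and assign that
    value to every entity in the cluster."""
    # Step 1: group entities into "entity clusters" by a shared attribute.
    clusters = defaultdict(list)
    for entity in entities:
        clusters[entity[cluster_key]].append(entity)

    anonymized = []
    for members in clusters.values():
        # Step 2: find the "majority condition", the most common data item.
        counts = Counter(member[identifying_key] for member in members)
        majority_value, _ = counts.most_common(1)[0]
        # Step 3: assign the majority data item to each member (as a copy),
        # so the original value no longer distinguishes individual entities.
        for member in members:
            anonymized.append({**member, identifying_key: majority_value})
    return anonymized


entities = [
    {"id": "u1", "city": "Austin", "browser": "Chrome"},
    {"id": "u2", "city": "Austin", "browser": "Chrome"},
    {"id": "u3", "city": "Austin", "browser": "Lynx"},
    {"id": "u4", "city": "Boston", "browser": "Firefox"},
    {"id": "u5", "city": "Boston", "browser": "Firefox"},
]

result = anonymize_by_majority(entities, "city", "browser")
print([e["browser"] for e in result])
# ['Chrome', 'Chrome', 'Chrome', 'Firefox', 'Firefox']
```

In this toy example, the lone “Lynx” user in Austin could otherwise be picked out by their unusual browser; after anonymization, all Austin entities share the cluster’s majority value. Copying each record rather than mutating it leaves the input data untouched.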
Google said this technique lets it “selectively add or remove relationships in the data to anonymize the data” without sacrificing the data’s integrity or structure. According to Google’s submission, the technique offers an alternative to “differential privacy,” which requires significant modifications to a dataset’s structure.
Building AI
Google has been investing heavily in AI. The company hopes its chatbot, Bard, will help it reach two billion users. According to Reuters, it has integrated AI into all of its workplace products, added AI to its search engine, merged DeepMind and the Brain Team into a single division, and more.
Google has also been filing patent applications for AI advancements almost nonstop, covering everything from spam detection to development automation tools to energy-saving AI training techniques.
Because AI algorithms are often data-hungry, Google’s access to vast amounts of data gives it a significant edge. This is especially relevant in light of the company’s July privacy policy amendment, which said it has the right to use user posts as raw data to develop its artificial intelligence technologies.
However, because an AI model can only be as good as the data it is trained on, it is equally essential that the data be of high quality and not jeopardize user privacy. It is also crucial to ensure that training data contains no sensitive user information, because AI models can be reverse-engineered to reveal the data they were trained on.
Several AI data privacy solutions are in development. Microsoft and Oracle recently applied for patents to protect their training data against algorithmic attacks and reverse-engineering vulnerabilities. Google’s approach, by contrast, works on the data itself before any AI system is trained, rather than on the models. This kind of data minimization reduces the risk of user data leaks.
This effort matters for Google as the race to develop AI technology intensifies. Maintaining its dominant position in the industry probably depends on avoiding data security mishaps with its AI products.