The AI Privacy Dilemma

Rainer Sträter

Head of Global Platform Hosting

1&1 IONOS SE

Dr. Leif-Nissen Lundbæk

CEO & Co-Founder

Xayn AG

"Will AI enslave us all?” or “Will AI save humankind?”. The discussion around AI seems to oscillate only between these two extremes. This is good news for click-baiting but hinders a realistic debate about the real challenges. The situation is far more complex because AI has the potential to be both – or as Sune H. Holm, associate professor of philosophy at the University of Copenhagen, put it: “On the one hand, we can use these systems to prevent serious harms to people and society. On the other hand, using these systems seems to threaten important values such as privacy and autonomy.”

What is the AI Privacy Dilemma?

Are there technical solutions to the AI Privacy Dilemma?

So far, the norm has been to collect data from the devices on which it was created and to process it centrally. For privacy reasons, some form of data anonymization – such as differential privacy – might be applied, modifying the data slightly before processing it. This approach is widely used because it is relatively easy to implement.
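To make "modifying data slightly" more concrete, here is a minimal sketch in Python. It assumes one common variant of differential privacy, namely adding Laplace-distributed noise to every value before it leaves the device. The function names, the `epsilon` and `sensitivity` parameters, and the sample readings are illustrative assumptions, not taken from any particular product.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(values, epsilon, sensitivity=1.0, rng=None):
    # Smaller epsilon means more noise and therefore more privacy,
    # at the cost of less accurate individual values.
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


if __name__ == "__main__":
    rng = random.Random(42)
    readings = [rng.random() for _ in range(10_000)]  # made-up values in [0, 1)
    noisy = privatize(readings, epsilon=1.0, rng=rng)
    print("true mean :", sum(readings) / len(readings))
    print("noisy mean:", sum(noisy) / len(noisy))
```

Each individual value is distorted, but aggregate statistics such as the mean remain usable because the noise averages out over many records. This is roughly why the approach is comparatively easy to bolt onto an existing central pipeline.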

However, by combining different databases, personal data can be reconstructed, and while anonymization technologies are improving constantly, so are deanonymization technologies. Therefore, this is one of the methods with a lower privacy rating.
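How combining databases can undo anonymization is easiest to see in a toy example. The sketch below is purely illustrative: both datasets are invented, and the choice of ZIP code, birth year, and sex as linking attributes is an assumption made for the example.

```python
# Hypothetical "anonymized" records: names removed, other attributes kept.
health_records = [
    {"zip": "10115", "birth_year": 1984, "sex": "f", "diagnosis": "asthma"},
    {"zip": "10115", "birth_year": 1990, "sex": "m", "diagnosis": "diabetes"},
    {"zip": "80331", "birth_year": 1984, "sex": "f", "diagnosis": "migraine"},
]

# Hypothetical second database that still contains names.
public_register = [
    {"name": "A. Example", "zip": "10115", "birth_year": 1984, "sex": "f"},
    {"name": "B. Sample", "zip": "80331", "birth_year": 1984, "sex": "f"},
    {"name": "C. Placeholder", "zip": "10115", "birth_year": 1990, "sex": "m"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link(records, register):
    # Index the named register by the shared attributes.
    index = {}
    for person in register:
        key = tuple(person[k] for k in QUASI_IDENTIFIERS)
        index.setdefault(key, []).append(person["name"])

    # A record is re-identified when exactly one named person matches it.
    reidentified = {}
    for record in records:
        key = tuple(record[k] for k in QUASI_IDENTIFIERS)
        names = index.get(key, [])
        if len(names) == 1:
            reidentified[names[0]] = record["diagnosis"]
    return reidentified


if __name__ == "__main__":
    for name, diagnosis in link(health_records, public_register).items():
        print(name, "->", diagnosis)
```

Neither dataset reveals a named diagnosis on its own. Joined on a few shared attributes, every record in this toy example points to exactly one person.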

To really reconcile AI and privacy, we should question the entire process of collecting and processing data centrally and think about different approaches – e.g., edge computing and especially masked federated learning. Here, the algorithms are brought to the end devices where the data is created. The locally trained models are encrypted and sent to an aggregator, which combines them into a global model that serves as the basis for subsequent local learning. All personal raw data stays on device level at all times; only updated models are communicated in the network, and only in encrypted form. This way, privacy is built into the process right from the start. Masked federated learning combines the best of both worlds: AI that is both highly efficient and privacy-protecting.

Fig. 1 (©Xayn AG): XayNet federated learning, our PET solution for federated learning. (1) “Update” participants send their masked local models to XayNet and, via XayNet in (2) and (3), the seed of the corresponding mask to “sum” participants. (4) XayNet receives the sum of masks from “sum” participants and uses it to compute the aggregation of the masked local models. (0 & 5) XayNet sends this new global model to a Model Distributor, which passes it on to the relevant parties of the use case.
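The masking idea in Fig. 1 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not XayNet's actual implementation: all function names, the modulus, and the fixed-point scale are hypothetical. The encryption of seeds on their way to the "sum" participants is left out, and Python's `random` module stands in for the cryptographically secure generator a real system would need.

```python
import random

MODULUS = 2**61 - 1  # hypothetical size of the finite group used for masking
SCALE = 10**6        # hypothetical fixed-point precision for model weights


def encode(weights):
    # Map float weights to integers modulo MODULUS.
    return [round(w * SCALE) % MODULUS for w in weights]


def decode(values, participant_count):
    # Map the aggregated integers back to floats and average them.
    decoded = []
    for v in values:
        if v > MODULUS // 2:  # upper half of the group represents negatives
            v -= MODULUS
        decoded.append(v / SCALE / participant_count)
    return decoded


def mask_from_seed(seed, length):
    # Anyone holding the seed can re-derive exactly the same mask.
    rng = random.Random(seed)  # NOT cryptographically secure; sketch only
    return [rng.randrange(MODULUS) for _ in range(length)]


def mask_model(weights, seed):
    # "Update" participant: hide the local model behind a random mask.
    mask = mask_from_seed(seed, len(weights))
    return [(w + m) % MODULUS for w, m in zip(encode(weights), mask)]


def sum_vectors(vectors):
    # Element-wise sum in the finite group.
    return [sum(column) % MODULUS for column in zip(*vectors)]


def aggregate(masked_models, mask_sum):
    # Aggregator: sees only masked models and the sum of all masks.
    total = sum_vectors(masked_models)
    unmasked = [(t - m) % MODULUS for t, m in zip(total, mask_sum)]
    return decode(unmasked, len(masked_models))


if __name__ == "__main__":
    local_models = [[0.1, -0.2], [0.3, 0.4], [0.5, 0.0]]  # made-up weights
    seeds = [11, 22, 33]                                  # one seed per device
    masked = [mask_model(m, s) for m, s in zip(local_models, seeds)]
    # "Sum" participant: rebuilds each mask from its seed and adds them up.
    mask_sum = sum_vectors([mask_from_seed(s, 2) for s in seeds])
    print(aggregate(masked, mask_sum))  # approximately the plain average
```

The aggregator never sees an individual unmasked model. It only learns the sum of the masks, which is enough to recover the average of all models but nothing about any single one.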

Also, large amounts of data don’t have to be shipped anymore and decentralized systems are more resilient against attacks. Potential attackers would have to focus on an enormous number of different devices at the same time to succeed – a highly unlikely event. Therefore, this decentralized approach can accommodate data-protection directives such as the GDPR with greater ease, scalability, and cost-effectiveness.

One example of an application using masked federated learning is the recently launched European search engine Xayn, which combines personalized search with data privacy. It uses a proxy to hide user information from the various indices, and tailors the results directly on device level to the preferences of the individual users. This way, users receive personalized search results and can actively train the underlying AI models to their needs directly on their device, while their data stays on their devices.

Privacy-proofing a company’s infrastructure

Being privacy-proof to become future-proof

In public debates, privacy is sometimes seen as a great obstacle to development in Europe. Some politicians even argue that we should compromise on data protection to enable new technology. Some companies support this claim by stressing that they wouldn’t be able to provide their services without collecting large amounts of data – but that is only part of the truth. In fact, in many cases it is not the services themselves that depend on the amount of data collected, but rather the company’s business model.

This is especially relevant as we create more and more data with every minute that passes – and this trend will only continue. The official EU data strategy projects that the global data volume will grow more than fivefold, to roughly 530% of its 2018 level: from 33 zettabytes in 2018 to 175 zettabytes in 2025. The same strategy also suggests that, whereas 80% of all data was processed centrally in 2018, 80% of data will be processed on device level by 2025.

In addition, consumers exert more bottom-up pressure on companies to make sure that they protect the data of their customers – as demonstrated recently by the mass exodus from WhatsApp to privacy-protecting Signal. EU lawmakers should seize the moment, because “the EU’s strong emphasis on the more social, ethical, and consumer-friendly direction of AI development is a major asset. But regulation alone cannot be the main strategy. For Europe’s AI ecosystem to thrive, Europeans need to find a way to protect their research base, encourage governments to be early adopters, foster Europe’s startup ecosystem (...), and develop AI technologies as well as leverage their use efficiently”.

Corporations should respond proactively to this bottom-up pressure from consumers and to the top-down policies of lawmakers. The sooner companies start to privacy-proof their technology and infrastructure, the more future-proof they will become.

Rainer Sträter is Head of Global Platform Hosting at IONOS. In this role, he leads the international IaaS technology division as well as the international technology teams at IONOS’s subsidiaries Fasthosts and Arsys. In 2015, he was responsible for the successful launch of IONOS’s next-generation cloud platform in the EU and the US. Since 2019, he has held a seat on several committees and working groups within Gaia-X.

Leif-Nissen Lundbæk (Ph.D.) is Co-Founder and CEO of Xayn and specializes in privacy-preserving AI. He studied Mathematics and Software Engineering in Berlin, Heidelberg, and Oxford. He received his Ph.D. at Imperial College London.

Please note: The opinions expressed in Industry Insights published by dotmagazine are the author’s own and do not reflect the view of the publisher, eco – Association of the Internet Industry.