February 2019 - Data Protection & Privacy | Blockchain | GDPR

Data Protection and Blockchain

Data protection plays a major role in many blockchain-based project ideas, and presents issues which are not always easy to resolve, as Stephan Zimprich, Leader of the eco Competence Group Blockchain, explains.


Blockchain is a technology which enables the protection of data against manipulation. So, in this sense, it increases the security of data. However, simply put, this security is achieved by making the records saved in the blockchain transparent and immutable; and this, in turn, is achieved through the redundant and distributed storage of each record at multiple nodes throughout a large network. If we consider the requirements of the EU General Data Protection Regulation (GDPR), the very essence of the security of blockchain is therefore in contradiction with the privacy required for the protection of personal data. As a result, the development of a blockchain project needs to include careful examination of what kind of data is being stored, and whether that data could be considered to be personal data.

Applicability of data protection law to blockchain

When we consider the basic applicability of data protection law in a blockchain environment, the first question we need to ask is whether data processing is involved. As a rule, the answer is yes. The second question is then whether personal data comes into play. If it is possible to answer this question with a no, then there are no further privacy issues. You can tick all the boxes, and know that the project is not subject to data protection law. 

However, if you answer this question with a yes, then the next step is to look at whether the project has a relevance to the EU – as a rule, this is the case for projects that are founded and based in the EU as well as for projects that collect data from within the EU. 

Then you are within the scope of the GDPR, and data processing is forbidden unless you have a legal basis for the processing of data. It is important to understand that the GDPR contains the very paradigmatic principle that data processing is generally prohibited unless certain exceptions make it permissible. 
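The sequence of questions above can be sketched as a small decision function. This is only an illustration of the order of the checks and no substitute for a legal assessment; the class `Project`, its fields, and the function `gdpr_assessment` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Project:
    # All fields are hypothetical flags used only for this illustration.
    processes_data: bool          # is any data processing involved?
    involves_personal_data: bool  # does personal data come into play?
    based_in_eu: bool             # founded and based in the EU?
    collects_data_from_eu: bool   # collects data from within the EU?
    has_legal_basis: bool         # is there a legal basis for the processing?


def gdpr_assessment(p: Project) -> str:
    """Walk through the questions in the order the text asks them."""
    if not p.processes_data:
        return "no data processing: data protection law not triggered"
    if not p.involves_personal_data:
        return "no personal data: not subject to data protection law"
    if not (p.based_in_eu or p.collects_data_from_eu):
        return "no EU relevance: outside the scope described here"
    if not p.has_legal_basis:
        return "within GDPR scope: processing prohibited without a legal basis"
    return "within GDPR scope: processing permissible on the stated legal basis"
```

The last two branches mirror the prohibition principle: once a project is within scope, the default answer is "prohibited", and only a legal basis changes that.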

What is considered data processing? 

When are data personal, and who is then responsible for the data processing? These are questions that are not so easy to answer when we look at blockchain. 

If we break these questions down somewhat, it is perhaps best to consider first what data processing entails, because the term in itself is incredibly broad: every handling of data which occurs automatically or in a structured form is included in the term data processing. The only thing that is not considered data processing is when I talk to you face-to-face. This effectively means that every flow of data in the operation of a blockchain is included in the term “data processing”. This is the case for the submission of transactions through nodes; it’s the case for the storing of hashes, the transfer of transaction data by the miners and the verification process, and also for the saving and synchronization of the blockchain as a whole. The blockchain must be regularly updated among all participants of the network, so that everyone has the most up-to-date version of the complete database – this update of the database across the network is also considered data processing. This means that all participants in the operation of a blockchain – not just the transaction partners and the miners, but also the nodes – are involved in the data processing.

When is data personal?

Now it starts getting a bit more difficult. According to the GDPR, personal data includes all information that relates to an identified or identifiable natural person. An identified person is relatively simple: my name, or an email address that includes my name; a fingerprint, perhaps a photo of the face, and so on – these are immediate identifiers. Identifiable is a bit more complicated. Here, immediate identifiability is set aside, and information that third parties have becomes relevant. To understand what kind of third-party knowledge falls within this scope, the question is whether the identity can be ascertained with a proportionate amount of effort, using the means available to the processing party or any other person. Factors for this include the cost of identification, the time it requires, the technologies available, and technological development, which is always changing.

This could, for example, be the IP address. Are IP addresses personal data? The European Court of Justice has now answered this question, maintaining that an attribution is possible for an ISP, given that, at least for a short period of time, there is the possibility of attributing an IP address to a customer via that customer account. 

For cookie IDs, the question has now also been answered. The French Data Protection Authority recently took a decision concerning an ad tech company. This company had collected location data via mobile phones and used the Mobile Advertising IDs built into mobile devices to achieve this. As an application case, this is very similar to cookies. Therefore, we can now deduce that cookie IDs that are enriched with further information – traffic data or metadata – are also personal data.

The key question which arises in relation to blockchain relates, in the first place, to public keys. Take public keys in Bitcoin: do they entail “personal reference”? 

There are so many people who, for example, publish their public key on their Facebook profile and ask for donations in Bitcoin. In this case, of course, there is immediately a connection to the Facebook profile. And given that I will not be able to check every single public key, and I cannot exclude the possibility that one of the owners has made theirs public at some stage in the past, I need to assume that all public keys represent personal data. 

When do blockchains have personal reference? 

A further differentiation we can make is to look at personal reference in terms of the type of blockchain. In the case of a public blockchain, we have the example of the published public key from Bitcoin – and in this case, that will be classified as personal data. In the case of a private blockchain, it looks a bit different. Here, of course, we have a limited user group. It is always necessary to look at what role the participant currently plays. If I am internal – so, if I’m a participant in this private blockchain – then it depends on the use case. If I am an external third party looking at this private blockchain, then I cannot necessarily assume that it involves personal data. And then there is also the payload in the blockchain. I can transfer data as payload in the blockchain, and here it depends on what this payload is – and the payload can of course be personal data. If I save this in the database unencrypted, it is open for all. If I save it in encrypted form, then whether or not I have to assess this as personal data is a question of how easy it is for me to link this back to the actual person.

Here, we can make a differentiation between the roles. Miners must in some circumstances be judged differently from the simple user who is looking in from the outside. Intermediaries – the virtual currency exchange, for example, or the surrounding systems that are needed for the operation of a blockchain – often have a connection. Such a currency exchange has KYC (know your customer) obligations, and it is also possible that they will have to fulfill further obligations regarding money laundering. As a result, before I can register for an account with the currency exchange, I must produce some kind of ID, such as my passport. In this case, the possibility to connect a blockchain identity with a real identity definitely exists. So, from this perspective, we will always need to say that even pseudonymous data within a blockchain represents personal data. 
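How an intermediary's records can turn pseudonymous blockchain data into personal data can be illustrated with a minimal sketch. All addresses, names, and the function `reidentify` are invented for this illustration; the point is only that a simple lookup against a KYC register is enough to attach a real identity to an otherwise pseudonymous transaction.

```python
# Hypothetical on-chain records: publicly visible, keyed only by address.
on_chain_transactions = [
    {"sender": "addr_A", "recipient": "addr_B", "amount": 0.5},
    {"sender": "addr_C", "recipient": "addr_A", "amount": 1.2},
]

# Hypothetical KYC register held by a currency exchange:
# it maps a customer's blockchain address to a verified real identity.
kyc_register = {"addr_A": "Alice Example"}


def reidentify(transactions, register):
    """Return the transactions in which at least one party can be named."""
    linked = []
    for tx in transactions:
        # Collect every role (sender/recipient) whose address is in the register.
        parties = {
            role: register[tx[role]]
            for role in ("sender", "recipient")
            if tx[role] in register
        }
        if parties:
            linked.append({**tx, "identified": parties})
    return linked


linked_transactions = reidentify(on_chain_transactions, kyc_register)
```

In this toy data set a single KYC entry is enough to link both transactions to a named person, once as sender and once as recipient.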

Who is the actual Data Controller for the data processing?

The next series of questions concerns who is actually responsible for the data processing, who therefore needs to prove that they have legal grounds, and who is responsible for ensuring compliance. This leads to the following subset of questions: Who needs to maintain the documentation? Who potentially needs to conclude data processing contracts with contractors? Who needs to make contracts with data recipients? And so on. This includes a whole range of obligations – among others, the appointment of a Data Protection Officer, if you fulfill certain prerequisites.

Looking at the different roles in the public blockchain, let’s consider who is engaged in data processing, and who will take responsibility as a data controller:

  • The developer? No, the developer is not involved in data processing. The developer simply produces the code that we can use, which still needs to be brought to life. 
  • The initiator of a transaction? Yes, this is someone who processes data. Making use of a destination address for a Bitcoin transaction, for example, is an act of data processing. 
  • For the miner and the node operator, it is a little contentious. Some assume that this is a form of contracted data processing, whilst others assert that these roles are synonymous with being data controllers – that is, they are themselves responsible, because they are all doing what they do for their own business purposes. 

This eventuality is provided for in the GDPR. It makes use of the term “joint data controller” and attaches to this concept the legal consequence of needing to conclude a multilateral contract between all joint controllers. When you imagine Bitcoin from this perspective, it becomes a little absurd. We have literally hundreds of thousands of participants, there are thousands of nodes, and countless miners. They would all need to be connected through a multilateral contract which regulates who has which roles and which responsibilities with regard to the data processing of one individual player. It is hard to imagine, but it could perhaps be incorporated into the T&Cs, which an individual simply clicks to accept when getting involved as a miner or a node. But it nevertheless leaves us with a lot of legal problems.

For the private blockchain it is somewhat simpler. Again, the developer is not involved in data processing. The governance structure will act as the data controller. Here also, the initiator of a transaction is a data controller, and the nodes – who then simply support the infrastructure on behalf of this blockchain – would clearly be contracted data processors. 

Types of permissibility

As already mentioned, we always need a legal basis, or else the processing of personal data is prohibited. What grounds do we have for permissibility? In principle, we have three possible types of permissibility that can be considered: consent, contract fulfillment, and legitimate interests, the last of which requires a balancing of the interests of the party processing the data and the person whose data is processed.

We really won’t get far with consent in the blockchain environment. The declaration of consent requires that the data subject is informed about to whom the data will be transferred. Let’s just look at one example where we can see that this is doomed to failure: I cannot identify to which nodes and to which miners my transaction data will perhaps be transferred. It simply cannot be foreseen who may be involved in the network in future. Nor can I predict who may simply have a look at the transaction data via a view function in the interface. Consent is probably not a path we can take for public blockchains.

Contract fulfillment could be a possibility, at least for certain participants, namely for the follow-on transactions that this initial transaction triggers, and for those that receive it. Here, for example, the processing of data can be justified on the grounds of a sales contract to be paid in Bitcoin – and then of course the recipient of the transaction is permitted to see the corresponding data, and the initiator of a transaction is also allowed to. But this does not include the nodes or the miners. 

As a result, in almost all cases, we will be dependent on demonstrating legitimate interests. Here, the balancing of interests is always required. And as long as only the data of participants is processed, we can go a long way with this. Here there are also obligations to inform which, although they may not be so easy to fulfill, are nonetheless manageable. But, to re-emphasize, this only applies in instances where purely the data of the participants is processed. As soon as I designate as payload third-party data from people that have nothing to do with the operation of the blockchain or the transaction, it becomes more difficult to argue legitimate interests. From a data protection perspective, this is certainly not a trivial issue. 

Data minimization

The principle of data minimization means that data may only be processed to the extent that it is appropriate for the purpose, and that it must be limited to the minimum necessary for this purpose. There are rules called Privacy by Design and Privacy by Default. Privacy by Design means that, already during the development of a system, it must be ensured that as little data as possible is required, and that this data is handled in as data-protection friendly a manner as possible. Privacy by Default, the data-protection friendly default setting, means that the system must be built so that anything that is not absolutely required from the user is actively consented to by the user.

Here, we have the potential for conflict with blockchain. The redundant data storage in a distributed network fundamentally contradicts the principle of data minimization. The principle of blockchain is to distribute as widely as possible. This is a security aspect for blockchain. This is not particularly welcome from a data protection perspective. Openness and transparency are also problems for Privacy by Default and Privacy by Design, which represent quite the opposite precepts.

There is a range of ways of approaching the situation, depending on the use case. Under some circumstances, it is possible to build systems so that no identifying data is in the blockchain. It is possible, for example, to build in access limitations such as those in payment channels. Here, multiple transactions are merged off the blockchain, and then only one overall transaction is written into the blockchain, so that attribution of the individual transactions is no longer possible. For consortium blockchains, certain permissions can be given about who can read or write what, and when.
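The payment channel idea can be illustrated with a minimal sketch. It is a simplification under stated assumptions: real payment channel protocols involve cryptographic signing and dispute handling, which are omitted here, and the function `settle_channel`, the participant names, and the amounts are invented for this illustration.

```python
from collections import defaultdict


def settle_channel(off_chain_payments):
    """Net a list of (payer, payee, amount) tuples into final balances.

    Only the returned net result would be written to the blockchain;
    the individual payments stay off the blockchain.
    """
    net = defaultdict(int)
    for payer, payee, amount in off_chain_payments:
        net[payer] -= amount  # payer's balance goes down
        net[payee] += amount  # payee's balance goes up
    return dict(net)


# Three hypothetical off-chain payments (amounts in whole units).
payments = [("alice", "bob", 5), ("bob", "alice", 2), ("alice", "bob", 1)]
settlement = settle_channel(payments)
```

The settlement contains only the net change per participant, so someone who sees just this one overall result cannot tell how many individual payments there were or what their amounts were.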

But what we must remember is that the blockchain use case does not in itself provide justification for the distribution of data. This always needs to be viewed in conjunction with the potential risks. And here, I mean the risk for the data subject – so the data protection risk. This means that if you have a use case that contains medical data, justification for using a public blockchain solution will be more difficult than a relatively trivial use case like cryptocurrency transactions. 

The right to deletion or the right to be forgotten

With the right to deletion and the right to be forgotten, we have a major problem with blockchain. Blockchain simply does not provide for the option that anything should be deleted – this is also a security feature of blockchain. We specifically do not want anything to be deleted, because that would represent a manipulation. Data protection law sees this differently: Everyone must have the right to be deleted from publicly accessible registers, databases, and so on.

The immutability of blockchain, as I mentioned, is one of its security mechanisms. So how can something like deletion be achieved nonetheless? Zero-Knowledge Proofs can be one method. Another is to keep the personal data itself off the blockchain: if only assignment data, meaning a reference to data stored off the blockchain, is written into the blockchain, and the link to the off-blockchain data is then broken, this is a good argument that something like deletion has taken place. As soon as the personal data itself is written into the blockchain, for example in the form of public keys, it is problematic.
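The approach of writing only a reference into the blockchain and later breaking the link can be sketched as follows. Note that this illustrates the off-blockchain storage idea, not a Zero-Knowledge Proof. The names `off_chain_store`, `chain`, `record`, `erase`, and `lookup` are hypothetical, and the use of a random salt is an assumption made here so that the reference cannot simply be recomputed from guessed data. Whether the reference that remains on the blockchain is then legally anonymous is a question this sketch does not answer.

```python
import hashlib
import secrets

off_chain_store = {}  # hypothetical erasable database, keyed by reference
chain = []            # hypothetical append-only ledger; entries are never removed


def record(personal_data: str) -> str:
    """Store the data off-chain and append only a salted hash to the ledger."""
    salt = secrets.token_bytes(16)
    reference = hashlib.sha256(salt + personal_data.encode("utf-8")).hexdigest()
    off_chain_store[reference] = {"salt": salt, "data": personal_data}
    chain.append(reference)
    return reference


def erase(reference: str) -> None:
    """Delete data and salt off-chain; the reference on the ledger remains."""
    off_chain_store.pop(reference, None)


def lookup(reference: str):
    """Return the personal data for a reference, or None if it was erased."""
    entry = off_chain_store.get(reference)
    return entry["data"] if entry else None
```

After `erase` is called, the ledger is unchanged, but the reference no longer resolves to any data, and without the deleted salt it cannot be recomputed from a guess at the original data.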

Transfers to third countries

Then, we have transfers to third countries, which is also a big topic for public blockchain and which I will just briefly touch on here. Anyone can participate, which means that people from outside the EU are also involved. If we assume that a public key is personal data, then – again using Bitcoin as an example – the operation of or participation in the blockchain is always connected with an international data transfer. Under data protection law, such a transfer must be safeguarded. There are several possibilities: There are the standard contractual clauses of the European Commission. It may be possible to implement these as T&Cs, so no signature is needed. Then, we have the EU-US Privacy Shield. If the transfer is going to the USA, then the recipient in the USA must be registered and must have committed to these principles, so this also represents legal uncertainty. You would then need to use the T&Cs or some kind of standard contract to obligate all participating nodes and miners to agree to certain contractual commitments.

Achieving data protection compliance in blockchain projects

As we have seen, there are a number of issues when we combine blockchain and potentially personal data. Although blockchain technology offers the advantages of transparency and immutability, it is exactly these characteristics that can lead to conflicts with data protection law. The developers of blockchain projects should therefore carefully analyze the kind of data intended to be stored in the blockchain, and weigh up the advantages and disadvantages of the type of blockchain to be used. Certainly, the principles of data minimization and mechanisms for ensuring the anonymization of personal data are essential elements to consider.


Stephan Zimprich is a lawyer in the intellectual property and media team of Fieldfisher's Hamburg office with six years of experience in advising clients, ranging from start-up size to multinational market leaders in the fields of copyright, media and broadcasting regulation, and data protection. The main focus of his work lies in the area of digital content distribution and data-driven business models such as targeted advertising and mobile advertising. He has a particular expertise in the online travel sector, where he advises international clients from Europe and the US in the fields of data protection, advertising, and travel regulation, as well as general commercial law, including cross-border co-operations.