Today I am going to motivate and define a philosophical principle which I shall call "coding invariance". I do not claim to have invented this principle as I can't remember exactly where I got the idea from. I shall also explain its intuitive appeal, give simple examples of its application and explain why certain weakenings of it are not strong enough.
Consider a universe and then consider collecting together all the facts about this universe throughout its history into one large description of that universe. We mean to include here all physical facts concerning the universe such as which electrons appear where at what times (or what the quantum wave function's state is at what times). Other information about the universe may be present in patterns of these facts (laws of physics, larger scale facts about the universe). We shall be considering how these facts about the universe might be represented.
Clearly there is not going to be any one unique way to write down the facts. For instance we might use a "+" symbol to represent a proton but someone else might use a "p" or even a "-". As long as the method chosen doesn't give the same name to protons and electrons (or to any two distinct kinds of particle) then any naming system could be chosen (practically some may be more useful, but philosophically none could be preferred).
So conceivably we could have two differing descriptions of the same universe. Two different sets of facts may turn out to describe the same universe only using different symbolic conventions.
The coding invariance principle states that two sets of facts should be considered to refer to the same universe if there is a way of recoding one as the other and vice versa. The intuition here is that the method of recording facts about a universe (including the names used and the data storage structures) should not matter. Or put another way "A rose by any other name would smell as sweet".
Now let us look at some simple examples of choices of coding. Here we represent a collection of facts as a sequence of 0's and 1's (this can be done for simple collections of facts in simple languages):
These two sequences share the same pattern but the digits used to describe it differ. In this case one can translate just by replacing 0's with 1's and 1's with 0's.
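As a minimal sketch of this first kind of recoding (in Python, using an illustrative sequence of my own rather than one from the examples above), exchanging the two symbols is its own inverse, so the same function translates in both directions:

```python
def swap_digits(bits: str) -> str:
    """Recode a 0/1 string by exchanging the two symbols."""
    table = str.maketrans("01", "10")
    return bits.translate(table)

first = "0010110"                      # illustrative sequence only
second = swap_digits(first)
print(second)                          # 1101001
print(swap_digits(second) == first)    # True
```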
The rule here is harder to discern. The nth digit is repeated n times in the 2nd coding but only twice in the first coding.
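A rough sketch of how the two codings in this example can be translated into one another, assuming that both encode the same underlying digit string (the function names and the sample string are mine, chosen for illustration):

```python
def encode_pairs(digits: str) -> str:
    """First coding: every underlying digit is written twice."""
    return "".join(d * 2 for d in digits)

def decode_pairs(coded: str) -> str:
    """Undo the first coding by keeping every other symbol."""
    return coded[::2]

def encode_growing(digits: str) -> str:
    """Second coding: the nth underlying digit is written n times."""
    return "".join(d * n for n, d in enumerate(digits, start=1))

def decode_growing(coded: str) -> str:
    """Undo the second coding by reading the first symbol of each run."""
    digits, pos, run = [], 0, 1
    while pos < len(coded):
        digits.append(coded[pos])
        pos += run      # skip the rest of this run
        run += 1        # the next run is one symbol longer
    return "".join(digits)

def first_to_second(coded: str) -> str:
    return encode_growing(decode_pairs(coded))

def second_to_first(coded: str) -> str:
    return encode_pairs(decode_growing(coded))

print(first_to_second("110011"))   # 100111
print(second_to_first("100111"))   # 110011
```

Both directions are simple programs, which is all that the principle asks for.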
Here 00 is represented as 10 in the second coding but as 00 in the first coding. This last example may seem less reasonable than the others; however, I shall argue that it is necessary to accept it as equally valid.
The rules that are used in the first two examples seem reasonable because the descriptions of the rules are relatively simple compared to the complexity of the data coded. We might like to say that some rules are silly because they are too complicated or they seem like overkill for the data they code. This is a reasonable stance practically (for computer science) but it is not easy to defend philosophically because it really just depends on your perspective.
The complexity of the rule set depends on the language you use to describe the rules or the concept of algorithm you use to conceptualize them. The problem is that there is no way to choose a canonical language to decide which rule sets are too complex. Therefore all languages must be equally good philosophically speaking, and this implies we cannot consistently maintain that there should be a restriction on complexity.
Of course from our perspective codings can seem better or worse but that is really down to us having had an explicit coding choice already made for us (as our life experiences are a particular way of experiencing reality).
So we are left with the coding principle that two collections of facts describe the same universe if they can each be computably recoded as the other.
NB: All examples given in this post were of sequences but this was just for simplicity's sake. The principle could be applied to unordered sets of facts too.
Our personal data is being harvested for many purposes these days. Stored in vast databases with little or no encryption, it is being mined by organizations from our supermarkets to the security services for reasons ranging from commercial gain to national security. How we keep confidential and sensitive information private is much debated. However, important facts concerning cryptography are being ignored in this discourse. Today I want to describe what is possible as a result of the revolution in cryptography that occurred at the end of the 20th century.
Our intuitions tell us that, ineffective solutions aside (such as identification by indexed numbers), it is impossible to have both the benefits of anonymity and those of transparency. But this is false. Cryptography can combine benefits of anonymity and benefits of transparency. Pseudo-anonymity is possible and comes in many forms. Without an understanding of these possibilities any discussion concerning privacy will be missing out on a huge range of potential solutions.
In what follows I will be making a couple of technical assumptions that are not hugely controversial. Firstly I shall assume that certain widely believed mathematical conjectures are true or at least not usefully false. I shall also assume that we are not going to be able to build quantum computers any time soon. Our entire banking transfer system is based on these assumptions so I am in good company in making them!
Various counterintuitive things are possible with modern strong cryptography. A cryptographic signature scheme uses a pair of keys, one private and one public. Anyone in possession of the private key can sign messages (but no-one else can). Anyone in possession of the public key can check the signature and read the messages. Creating a cryptographic signature is easy with the right software. Cryptographic signatures can be used to create a virtual identity which is hard (or if desired impossible) to tie to the person who created it. However, over time such virtual identities can acquire trust in much the same way that individuals have for millennia.
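To make the private/public split concrete, here is a minimal sketch of one of the simplest signature schemes, a Lamport one-time signature built only from a hash function. It is not the scheme used by everyday signing software, each key pair may safely sign only one message, and all the names below are my own:

```python
import hashlib
import secrets

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def generate_keypair():
    # Private key: 256 pairs of random secrets. Public key: their hashes.
    private = [(secrets.token_bytes(32), secrets.token_bytes(32))
               for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public

def _message_bits(message: bytes):
    # The 256 bits of the message's SHA-256 digest, most significant first.
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def sign(private, message: bytes):
    # Reveal one secret from each pair, chosen by the matching digest bit.
    return [private[i][bit] for i, bit in enumerate(_message_bits(message))]

def verify(public, message: bytes, signature) -> bool:
    # Anyone holding the public key can hash the revealed secrets and compare.
    bits = _message_bits(message)
    return len(signature) == 256 and all(
        _h(signature[i]) == public[i][bit] for i, bit in enumerate(bits)
    )

private_key, public_key = generate_keypair()
signature = sign(private_key, b"I vouch for this post")
print(verify(public_key, b"I vouch for this post", signature))   # True
print(verify(public_key, b"I vouch for that post", signature))   # False
```

The public key can be published under a pseudonym: it says nothing about who generated it, yet only the holder of the private key could have produced the signature.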
Networks may be set up which allow anonymous communications to be sent. This allows not only the content of a message but the existence of a message to be hidden. Such networks already exist (for example Tor) and are used in high tech music piracy peer to peer networks.
Protocols are possible in which a certain action (such as decrypting a document) is only possible if certain people agree to the operation. As an example it is possible to encrypt a document so that any 3 people out of 5 key holders can decrypt it but no 2 of them can decrypt it on their own.
A canary is a characteristic piece of data which identifies the source of a document. The Ordnance Survey include minor errors dotted over their maps. This allows them to detect any map which has been copied from an Ordnance Survey map. Canaries can be added to many types of data to help identify unauthorized copies (although their utility is restricted to situations where few people have access to the data).
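One well known way to get this 3-out-of-5 behaviour is Shamir's secret sharing, applied to the key that encrypts the document. The sketch below is a toy version under my own assumptions (the field size, the function names and the integer standing in for the key are all illustrative):

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the choice of field is illustrative

def split_secret(secret: int, threshold: int = 3, shares: int = 5):
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x: int) -> int:
        result = 0
        for c in reversed(coeffs):          # Horner's rule
            result = (result * x + c) % PRIME
        return result

    # Key holder k receives the point (k, poly(k)).
    return [(x, poly(x)) for x in range(1, shares + 1)]

def recover_secret(points):
    # Lagrange interpolation evaluated at x = 0 recovers the constant term.
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total

document_key = 123456789                    # stand-in for a real encryption key
shares = split_secret(document_key)
print(recover_secret(shares[:3]) == document_key)                          # True
print(recover_secret([shares[0], shares[2], shares[4]]) == document_key)   # True
```

Any three points determine the degree-2 polynomial and hence the key, whereas two points are consistent with every possible key, so two key holders learn nothing by pooling their shares.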
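A toy sketch of the canary idea for numeric data, assuming the owner keeps a secret key and the unmarked original. Each recipient's copy gets small deliberate errors at positions derived from the key and the recipient's name. The names are hypothetical, and a real scheme would also need to survive edits to the leaked copy and collusion between recipients, which this does not attempt:

```python
import hashlib
import hmac

def canary_positions(owner_key: bytes, recipient: str,
                     n_points: int, n_marks: int = 3):
    """Choose which data points to perturb for this recipient."""
    positions, counter = [], 0
    while len(positions) < n_marks:         # assumes n_marks <= n_points
        tag = hmac.new(owner_key, f"{recipient}:{counter}".encode(),
                       hashlib.sha256).digest()
        pos = int.from_bytes(tag[:4], "big") % n_points
        if pos not in positions:
            positions.append(pos)
        counter += 1
    return sorted(positions)

def mark_copy(values, owner_key: bytes, recipient: str):
    """Return a copy of the data containing this recipient's canaries."""
    marked = list(values)
    for pos in canary_positions(owner_key, recipient, len(values)):
        marked[pos] += 1                    # a small deliberate error
    return marked

def identify_source(original, leaked, owner_key: bytes, recipients):
    """Match the errors found in a leaked copy against each recipient."""
    changed = [i for i, (a, b) in enumerate(zip(original, leaked)) if a != b]
    for recipient in recipients:
        if canary_positions(owner_key, recipient, len(original)) == changed:
            return recipient
    return None

heights = list(range(1000))                 # stand-in for map data
key = b"owner-secret"
leaked = mark_copy(heights, key, "bob")
print(identify_source(heights, leaked, key, ["alice", "bob", "carol"]))
```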
No one has absolute privacy. There are many ways in which your privacy may be intentionally violated. For instance private detectives can be employed to follow you, public records mined for information or bribery used to obtain sensitive information. We shouldn't try to get any absolute guarantees of privacy because we know it to be impossible. In practice maintaining privacy is a matter of raising the cost of violating privacy to the extent that it is not worth the effort for the eavesdropper.
What matters is the cost of access to private data, the people who can access it, how easy it is to trace them and how susceptible the data is to abuse. The problem with storing masses of credit card details in centralized databases is not that the information needs to be private but that the cost of stealing each record is lowered by a kind of mass-stealing economy of scale. If only 2 or 3 people have access to a confidential file and anonymous blackmail threats are made then there is already a ready-made shortlist of suspects available. If furthermore there are telltale mistakes in the blackmailer's threat (because canaries have been used) then the perpetrator may be identifiable. Finally, the ease with which data can be abused matters. There are sometimes alternative methods to store information and some may be less prone to abuse than others.
So taking all this into account we should worry when:
1) Data is stored in central databases
The more data in one place the cheaper it is to illegally access that data (per record)
2) The data is in a computer-readable format
Working through masses of data by hand takes many more resources than if that process can be automated. As an example a supermarket's credit card database is computer readable but Google Street View is not (in any useful way)
3) Data is in an abusable format
Records of transactions are necessary but these records can be stored in a manner which doesn't expose people's bank accounts to fraud.
4) The data is sensitive
Data about what music you like is not as open to abuse as data identifying which whores you've been visiting.
5) Many people have access to the data
It stands to reason that the more people have access to records the easier it is to trick/bribe them and the more likely it is that there are bad apples.
6) It's hard to trace the source of a leak
Clearly the easier it is to identify abuse the easier it is to discourage it.
7) The value of the data is high
A database containing movie preferences is much less valuable than one containing details of police cautions. The second one needs much better protection than the first.
I will now give some examples of what cryptography could be used for. Firstly it is possible to have electronic voting systems which are private but for which everyone involved in the process can count the votes themselves. Unfortunately no system that is currently in operation uses the necessary technology. Hence I am not against electronic voting in principle but I oppose all systems currently used.
Secondly any interaction that can be thought of as a sort of game with hidden information (such as a game of poker or a financial transaction) can be implemented using cryptography in such a way that the information can be hidden (such as the face of the card) and yet when it is revealed the information is still known to be correct (revealing your hand).
Thirdly identity cards are possible which allow you to prove that you are a member of some group (such as non-terrorists or over 18s) without identifying who you actually are. It is possible to do this without making the system more prone to abuse by terrorists or underage drinkers! Please note that the UK government's proposals for ID cards, unbelievably, do not use such technology.
The benefits of modern cryptography then are (A) that pseudo-anonymity is possible and can be used to prove facts such as your age or your criminal status without revealing any other information, (B) that signature schemes allow proofs of transactions without increasing the risk of fraud, and (C) that cryptographic protocols are stricter instruments of public policy than laws in that they can (subject to our assumptions) be mathematically proven to prevent abuses. One of the many failings of modern liberal democracies is a failure to put our understanding of cryptography to work to provide these benefits and a failure to recognize the need for cryptographic solutions to provide privacy for the public, data for the government and intelligence for the police.
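The simplest building block for this kind of hidden-then-revealed information is a commitment. Here is a minimal sketch using a hash function, with names of my own choosing (a full card game protocol needs considerably more than this):

```python
import hashlib
import hmac
import secrets

def commit(value: str):
    """Return (commitment, nonce). Publish the commitment, keep the nonce."""
    nonce = secrets.token_bytes(32)   # random, so the value cannot be guessed
    commitment = hashlib.sha256(nonce + value.encode()).hexdigest()
    return commitment, nonce

def reveal_is_valid(commitment: str, value: str, nonce: bytes) -> bool:
    """Check that a revealed value matches the earlier commitment."""
    expected = hashlib.sha256(nonce + value.encode()).hexdigest()
    return hmac.compare_digest(expected, commitment)

# The player commits to a card face down...
commitment, nonce = commit("ace of spades")
# ...and later turns it over by revealing the value and the nonce.
print(reveal_is_valid(commitment, "ace of spades", nonce))    # True
print(reveal_is_valid(commitment, "king of hearts", nonce))   # False
```

The commitment reveals nothing useful about the card while it is face down, and the player cannot later claim to have held a different card.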
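One ingredient that such credentials can be built from is a blind signature, in which an issuer signs a token without ever seeing it. The sketch below is a toy version of Chaum's RSA blind signature under my own assumptions: the key is far too small for real use, there is no padding, and every name is illustrative. It is not a description of any deployed ID scheme:

```python
import hashlib
import math
import secrets

# Toy RSA key for the issuer. These two Mersenne primes are much too small
# and too structured for real use; they only keep the example self-contained.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))     # issuer's private exponent

def token_to_int(token: bytes) -> int:
    return int.from_bytes(hashlib.sha256(token).digest(), "big") % N

def blind(token: bytes):
    """Holder hides the token behind a random blinding factor r."""
    m = token_to_int(token)
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            break
    return (m * pow(r, E, N)) % N, r

def issuer_sign(blinded: int) -> int:
    """Issuer checks eligibility (say, age) by other means, then signs blindly."""
    return pow(blinded, D, N)

def unblind(blind_signature: int, r: int) -> int:
    """Holder removes the blinding factor, leaving a signature on the token."""
    return (blind_signature * pow(r, -1, N)) % N

def verify(token: bytes, signature: int) -> bool:
    """Anyone with the issuer's public key (N, E) can check the credential."""
    return pow(signature, E, N) == token_to_int(token)

token = secrets.token_bytes(16)        # random, carries no identity
blinded, r = blind(token)
credential = unblind(issuer_sign(blinded), r)
print(verify(token, credential))       # True
```

The issuer only ever sees the blinded value, so when the token and credential are later shown at a door, the issuer cannot link them back to the person who asked for the signature.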
There are problems with cryptography too though. Cryptographic protocols take time to perform but as computers get faster this objection becomes weaker and weaker. Cryptographic protocols are brittle and not easy to adapt to new usage patterns. I think that it is better to live with this than to risk the massive privacy violations that will occur without it.
Sorry there was no post last weekend. This week's post should be up soon and is likely to be about the false dichotomy between anonymity and transparency. Otherwise there are further posts on vegetarianism, utilitarianism and corruption as a universal sociological/biological law.