Suppose you're working on some problem that requires a few trillion computations for a few hours. So you dialup your local supercomputer and have them run your program for you. Unfortunately for you, the local supercomputer company is owned by an industrial spy who sees what you're doing and steals all your data. What are you gonna do? The damage is already done, it's not like the law is going to fix anything.
This problem wouldn't occur if you needed a few terabytes for a few hours. You'd just encrypt all the data going in, and decrypt it as it comes out. Same thing if you just needed a few terabits of networking capacity (which is just a special case of storage).
What we need is an encrypted machine language. We need a way to encrypt a program so that the machine that executes it can never know the purpose of what it was executing. The program could never be decompiled without the secret key.
I think programmable gate arrays might conceivably enable something like encrypted computation.
The idea here is that you have a block of input
[ k i l l _ ]
[ t h i s _ ]
[ d u m b _ ]
[ u s e r _ ]
along with a function P from input to output
[ KABANG ]
which would yield
[ d e a d _ ]
[ d u m b _ ]
[ u s e r _ ]
[ h e r e _ ]
but you employ one
asymmetric function F which transforms the input into
[ 0 1 20 02 4 ]
[ 8 84 9 29 2 ]
[ 1 29 4 49 1 ]
[22 20 0 1 0 ]
and a related asymmetric function F' which transforms the function P (ie, "program") into
[ GTTAS934 ]
which when applied to the input yields
[ 0 a 9 2 x ]
[ 8 8 i 5 2 ]
[ 1 23 4 18 3 ]
[78 33 3 7 a ]
which only the holder of F^-1 can decrypt. Of course, possession of F^-1 lets you create F'^-1 which lets you crack the program P.
What the people working in DRM want and which is pretty much known impossible is a function F' independent of F, or better yet a function F2 which decrypts the output but not the input. This is ridiculous. See http://bitconjurer.org/wild_cryptographic_conjectures.html
. The actual references:
One of the weaknesses of the paper is that they don't consider relaxing the Functional condition. They're focused on DRM schemes (where F' must be independent of F) so they can't afford to relax it but it can be relaxed.
Having established that, for encrypted computation to work, F'(P) has to yield a program P' which does something
when the right kind of seeming gibberish is fed to it. It also has to look like gibberish and a singular change in P must change half of P'. Traditional CPUs obviously will never cut it but logic arrays might.
There's something called Homomorphic Encryption which is related to what you describe here. http://www.tcs.hut.fi/~helger/crypto/link/public/homomorphic.html
Unfortunately the above link is "Access denied! You have just tried to access a document that is for internal use only..." (at least in September 2009).
This sounds like an idea I had some time ago about using public search engines to index private documents: The content would be encrypted on a word-by-word basis. After indexing, you could simply use the public search engine interface to search for encrypted keywords. But even if the encryption scheme is known to be secure, I wonder if an adversary could deduce some information by statistical means. What do you think? -- AntonioTaylor?
Ummm, I can't see the correspondence. You're describing a common storage encryption scheme subject to a common type of attack, just in a novel setting.
As for attacks? Well, most encryption schemes have problems with short strings like individual words. And if each block is known to be one of the few thousand words in the English language, that causes a big problem. But if one word maps to any number of blocks (due to the injection of noise) then you should be completely safe.
I regard it as a possibly computation(a database query) that is not uncommon to outsource, secured against the party computing the result: a form of EncryptedComputation
. If I was only concerned about encryption, I would encrypt blocks of words rather than individual words, but then the other party would not be able to compute the desired result. -- AntonioTaylor?
I wish you were right, but the relation
decrypt(encrypt(operation(A, B)) == de(op(en(A), en(B))
isn't preserved. In order to make the encryption scheme secure, and to prevent building a dictionary, it's necessary to inject noise into the blocks so that encrypt(A) ~= encrypt(A). If you preserve this relation, and you limit your plaintext to short strings, and you limit them to
words, well then it becomes possible to use statistical attacks to build up a dictionary of words to their encrypted counterparts. Which doesn't break the encryption scheme but it does completely bypass it.
That's exactly my point. If I'm encrypting word-by-word, the relation holds true for the database queries, but an adversary could deduce information. But you reminded me of the possibility of inserting noise words between the actual data. That would still allow simple queries on the encrypted data, while phrase queries would have to be transformed or might become impossible, depending on the existence of wildcards in the query language. It would also make statistical analysis more difficult, but not impossible. I suppose bypassing the encryption scheme this way is not only possible in this example, but is the crucial thing that makes EncryptedComputation
theoretically worthless. -- AntonioTaylor?
Re: "is the crucial thing that makes EncryptedComputation
theoretically worthless" -- no, it's not, but it's certainly true that word-at-a-time encryption is not very strong, and can always be broken given sufficient encrypted text.
On the other hand, the strongest transformational block cryptography schemes have every bit of the output depend equiprobably on every bit of the input (adding noise, so that the output block size is larger than the input block size, usually helps, but is secondary, not primary). EncryptedComputation
lines is not disproven by above arguments. -- DougMerritt
See also BlackBoxComputation
(especially the last part)