Sovereign Joins
arch Center
The University of Texas at Dallas
University of California, Berkeley
Abstract
We present a secure network service for sovereign infor-
mation sharing whose only trusted component is an off-the-
shelf secure coprocessor. The participating data providers
send encrypted relations to the service that sends the en-
crypted results to the recipients. The technical challenge in
implementing such a service arises from the limited capa-
bility of the secure coprocessors: they have small memory,
no attached disk, and no facility for communicating directly
with other machines in the network. The internal state of an
ongoing computation within the secure coprocessor cannot
be seen from outside, but its interactions with the server can
be exploited by an adversary.
We formulate the problem of computing join in this
setting where the goal is to prevent information leakage
through patterns in I/O while maximizing performance. We
specify criteria for proving the security of a join algorithm
and provide provably safe algorithms. These algorithms
can be used to compute general joins involving arbitrary
predicates and multiple sovereign databases. We thus en-
able a new class of applications requiring query processing
across sovereign entities such that nothing apart from the
result is revealed to the recipients.
1
Introduction
Conventional information integration approaches, as ex-
emplied by centralized data warehouses and mediator-
based data federations, assume that the data in each
database can be revealed completely to the other databases.
Consequently, information sharing across autonomous enti-
ties is inhibited due to condentiality and privacy concerns.
The goal of sovereign information sharing [2, 3, 8] is to en-
able such sharing by allowing queries to be computed across
sovereign databases such that nothing apart from the result
is revealed. The computation of join of sovereign databases
in such a manner is referred to as sovereign join. We cite
below two motivating applications of sovereign joins [3]:
Security
For national security, it might be necessary to
check if any of the airline passengers is on the watch
list of a federal agency [21]. Sovereign join may be
used to nd only those passengers who are on the list,
without obtaining information about all the passengers
from the airline or revealing the watch list.
Healthcare
In epidemiological research, it might be of
interest to ascertain whether there is a correlation be-
tween a reaction to a drug and some DNA sequence,
which may require joining DNA information from a
gene bank with patient records from various hospi-
tals. However, a hospital disclosing patient informa-
tion could be in violation of privacy protection laws,
and it may be desirable to access only the matching
sequences from the gene bank.
1.1
Desiderata
A system offering sovereign join service has the follow-
ing desirable attributes:
The system should be able to handle general joins in-
volving arbitrary predicates. The national security ap-
plication cited above requires a fuzzy match on pro-
les. Similarly, the patient records spread across hos-
pitals may require complex matching in the healthcare
application.
The system should be able to handle multi-party joins.
The recipient of the join result can be a party different
from one of the data providers.
The recipient should only be able to learn the result of
the join computation. No other party should be able to
learn the result values or the data values in someone
elses input.
The system should be provably secure. The trusted
component should be small, simple, and isolated [4].
1.2
Problem Addressed
We present a secure network service for sovereign in-
formation sharing whose only trusted component is a se-
cure coprocessor [15, 26, 32]. IBM 4758 cryptographic co-
processor [17] is an example of a commercially available,
tamper-responding secure coprocessor.
The technical challenge in implementing such a service
arises from the following:
Secure coprocessors have limited capabilities. They
rely on the server to which they are attached for disk
storage or communication with other machines. They
also have small memory (e.g. 4MB in IBM 4758).
The factors constraining the memory size are cost and
heat dispensation. The trend towards consolidating the
secure coprocessor functionality on a single chip also
constrains the amount of memory as larger memories
reduce the yield.
While the internal state of a computation within the
secure coprocessor cannot be seen from outside, the
interactions between the server and the secure copro-
cessor can be observed.
Simply encrypting communication between the data
providers and the secure processor is, therefore, insuf-
cient. The join computation needs to be carefully orches-
trated such that the read and write accesses made by the
secure coprocessor cannot be exploited to make unwanted
inferences.
Careful orchestration of join computation in the face of
limited memory has been a staple of database research for a
long time. The goal in the past, however, has been the mini-
mization of I/O to maximize performance. While the I/O
minimization is still important, avoiding leakage through
patterns in I/O accesses now becomes paramount.
1.3
Related Work
In principle, sovereign information sharing can be imple-
mented by using techniques for secure function evaluation
(SFE) [13, 31]. Given two parties with inputs
and
re-
spectively, SFE computes a function
such that the
parties learn only the result. SFE techniques are considered
to have mostly theoretic signicance and have been rarely
applied in practice, although some effort is afoot to change
the situation [22].
To avoid the high cost of SFE, the approach taken in [3]
was to develop specialized protocols for intersection, inter-
section size, equijoin, and equijoin size. Similar protocols
for intersection have been proposed in [8, 16]. A new inter-
section protocol has been recently proposed in [10]. How-
ever, the protocols provided in [3] have the following short-
comings: (1) It is not clear how to extend them to operations
involving general predicates as they are hash-based. (2) It
is not obvious how to extend them to efciently handle a
large number of parties. (3) They leak information. For ex-
ample, the equijoin size protocol leaks the distribution of
duplicates; if no two values have the same number of dupli-
cates, it can also leak the intersection.
Secure coprocessors have been earlier used in a vari-
ety of applications, including secure e-commerce [33], au-
ditable digital time stamping [30], secure ne-grained ac-
cess control [12], secure data mining [1], and private in-
formation retrieval [5, 28]. See [27] for a taxonomy of se-
cure coprocessing applications. The techniques developed
therein though are quite different. Note that the capabilities
provided in the architectures such as Trusted Computing
Groups trusted platform module [29], while complemen-
tary, do not solve our problem.
1.4
Paper Layout
The rest of the paper is organized as follows. In Sec-
tion 2, we specify the adversarial model and give the sim-
plifying assumptions and notations. In Section 3, we illus-
trate using classical nested loop some of the subtleties of the
problem. This investigation enables us to distill the design
principles underlying the proposed algorithms. We also de-
ne the correctness criteria for proving the safety of the join
algorithms.
In Section 4, we provide two provably safe algorithms
for general join in which the matching predicate can be an
arbitrary function. They offer a range of performance trade-
offs under different operating parameters.
Section 5 is devoted to the study of equijoins. Surpris-
ingly, adaptations of classical sort-merge join or hash join
turn out to be unsafe. We then provide a safe algorithm.
In Section 6, we analyze the performance characteristics
of the proposed algorithms. We conclude with a summary
and directions for future work in Section 7.
2
Preliminaries
This section species the adversarial model and our sim-
plifying assumptions and notations.
2.1
Adversarial Model
Our computing model admits any number of data
providers and result recipients. Without loss of generality,
we will consider the case where two parties
and
that have private relations
and
are participating in the
sovereign join operation and the result
!
is sent to the party
#"
, which is not
or
. We assume that the join algo-
rithms and the join predicates are known to the parties.
The server
$
, offering sovereign information sharing, is
a general purpose computer. A secure coprocessor
%
is
attached to
$
. The only trusted component is the secure
coprocessor. All other components, including
$
, are un-
trusted. We assume that no party (including
$
) can observe
the state of the computation inside
%
or tamper with the
code loaded into it.
Communication between
%
and
,
, or
"
is en-
crypted. Similarly, any temporary value output by
%
to
$
is
also encrypted.
2.2
Authenticated Computation
Given that nothing but
%
is trusted, we have the chal-
lenge of validating the authenticity and protecting the se-
crecy of the computation done by
%
.
We use the remote attestation mechanism provided by
the secure coprocessor to ensure that it is indeed executing
a known, trusted version of the application code, running
under a known, trusted version of the OS, and loaded by a
known, trusted version of the bootstrap code [12].
We assume that
and
have signed a digital con-
tract [12] prescribing w