Lattice duality: The origin of probability and entropy
lack height=1 colspan=2>
Lattice duality: The origin of probability and entropy
Neurocomputing 67 (2005) 245274
Lattice duality: The origin of probability
and entropy
Kevin H. Knuth
NASA Ames Research Center, Mail Stop 269-3, Moffett Field, CA 94035-1000, USA
Received 6 April 2004; received in revised form 25 August 2004; accepted 18 November 2004
Available online 13 June 2005
Communicated by S. Fiori
Abstract
Bayesian probability theory is an inference calculus, which originates from a generalization
of inclusion on the Boolean lattice of logical assertions to a degree of inclusion represented by
a real number. Dual to this lattice is the distributive lattice of questions constructed from the
ordered set of down-sets of assertions, which forms the foundation of the calculus of inquiry
a generalization of information theory. In this paper we introduce this novel perspective on
these spaces in which machine learning is performed and discuss the relationship between these
results and several proposed generalizations of information theory in the literature.
Published by Elsevier B.V.
Keywords: Probability; Entropy; Lattice; Information theory; Bayesian inference; Inquiry
1. Introduction
It has been known for some time that probability theory can be derived as a
generalization of Boolean implication to degrees of implication represented by real
numbers
[13,14]
. Straightforward consistency requirements dictate the form of the
sum and product rules of probability, and Bayes theorem
[13,14,49,48,22,36]
, which
forms the basis of the inferential calculus, also known as inductive inference.
ARTICLE IN PRESS
www.elsevier.com/locate/neucom
0925-2312/$ - see front matter Published by Elsevier B.V.
doi:10.1016/j.neucom.2004.11.039
肨el.: +1 650 604 4279; fax: +1 650 604 4036.
E-mail address: kevin.h.knuth@nasa.gov.
However, in machine learning applications it is often times more useful to rely on
information theory
[47]
in the design of an algorithm. On the surface, the connection
between information theory and probability theory seems clearinformation
depends on entropy and entropy is a logarithmically transformed version of
probability. However, as I will describe, there is a great deal more structure lying
below this seemingly placid surface.
Great insight is gained by considering a set of logical statements as a Boolean lattice.
I will show how this lattice of logical statements gives rise to a dual lattice of possible
questions that can be asked. The lattice of questions has a measure, analogous to
probability, which I will demonstrate is a generalized entropy. This generalized
entropy not only encompasses information theory, but also allows for new quantities
and relationships, several of which already have been suggested in the literature.
A problem can be solved in either the space of logical statements or in the space of
questions. By better understanding the fundamental structures of these spaces, their
relationships to one another, and their associated calculi we can expect to be able to
use them more effectively to perform automated inference and inquiry.
In Section 2, I provide an overview of order theory, specically partially ordered
sets and lattices. I will introduce the notion of extending inclusion on a nite lattice
to degrees of inclusion effectively extending the algebra to a calculus, the rules of
which are derived in the appendix. These ideas are used to recast the Boolean algebra
of logical statements and to derive the rules of the inferential calculus (probability
theory) in Section 3. I will focus on nite spaces of statements rather than continuous
spaces. In Section 4, I will use order theory to generate the lattice of questions from
the lattice of logical statements. I will discuss how consistency requirements lead to a
generalized entropy and the inquiry calculus, which encompasses information
theory. In Section 5 I discuss the use of these calculi and their relationships to several
proposed generalizations of information theory.
2. Partially ordered sets and lattices
2.1. Order theory and posets
In this section, I introduce some basic concepts of order theory that are necessary
in this development to understand the spaces of logical statements and questions.
Order theory works to capture the notion of ordering elements of a set. The central
idea is that one associates a set with a binary ordering relation to form what is called a
partially ordered set, or a poset for short. The ordering relation, generically written
p, satises reexivity, antisymmetry, and transitivity, so that for elements a, b, and c
we have
P1:
For all a;
a
pa 餜eflexivity,
P2:
If a
pb and bpa; then a b 餉ntisymmetry,
P3:
If a
pb and bpc; then apc 餞ransitivity.
ARTICLE IN PRESS
K.H. Knuth / Neurocomputing 67 (2005) 245274
246
The ordering a
pb is generally read b includes a. In cases where apb and a
ab, we
write a
ob. If it is true that aob, but there does not exist an element x in the set such
that a
oxob, then we write a
b, read b covers a, indicating that b is a direct
successor to a in the hierarchy induced by the ordering relation.
This concept of covering can be used to construct diagrams of posets. If an
element b includes an element a then it is drawn higher in the diagram. If b covers a
then they are connected by a line. These poset diagrams (or Hasse diagrams) are
useful in visualizing the order induced on a set by an ordering relation.
Fig. 1
shows
three posets. The rst is the natural numbers ordered by the usual is less than or
equal to. The second is P
3
the lattice of partitions of three elements. A partition y
includes a partition x, x
py, when every cell of x is contained in a cell of y. The third
poset, denoted 2
3
, is the powerset of the set of three elements P餱a; b; cg, ordered by
set inclusion
. The orderings in
Figs. 1
b and c are called partial orders since some
elements are incomparable with respect to the ordering relation. For example, since
it is neither true that fag
pfbg or that fbgpfag, the elements fag and fbg are
incomparable, written fagjjfbg. In contrast, the ordering in
Fig. 1
a is a total order,
since all pairs of elements are comparable with respect to the ordering relation.
A poset P possesses a greatest element if there exists an element > 2 P, called the
top, where x
p> for all x 2 P. Dually, the least element ?2 P, called the bottom,
exists when ?
px for all x 2 P. For example, the top of P
3
is the partition 123 where
all elements are in the same cell. The bottom of P
3
is the partition 1j2j3 where each
element is in its own cell. The elements that cover the bottom are called atoms. For
example, in 2
3
the atoms are the singleton sets fag, fbg, and fcg.
Given a pair of elements x and y, their upper bound is dened as the set of all z 2 P
such that x
pz and ypz. In the event that a unique least upper bound exists, it is
called the join, written x _ y. Dually, we can dene the lower bound and the greatest
lower bound, which if it exists, is called the meet, x ^ y. Graphically the join of two
elements can be found by following the lines upward until they rst converge on a
single element. The meet can be found by following the lines downward. In the lattice
of subsets of the powerset 2
3
, the join _, corresponds to the set union [, and the meet
ARTICLE IN PRESS
Fig. 1. Diagrams of posets described in the text. A. Natural numbers ordered by less than or equal to, B.
P
3
the lattice of partitions of three elements ordered by is contained by, C. 2
3
the lattice of all subsets of
three elements fa; b; cg ordered by is a subset of .
K.H. Knuth / Neurocomputing 67 (2005) 245274
247
^
corresponds to the set intersection \. Elements that cannot be expressed as a join
of two elements are called join-irreducible elements. In the lattice 2
3
, these elements
are the atoms.
Last, the dual of a poset P, written P
q
can be formed by reversing the ordering
relation, which can be visualized by ipping the poset diagram upside-down. This
action exchanges joins and meets and is the reason that their relations come in pairs,
as we will see below. There are different notions of duality and the notion after which
this paper is titled will be discussed later.
2.2. Lattices
A lattice L is a poset where the join and meet exist for every pair of elements. We
can view the lattice as a set of objects ordered by an ordering relation, with the join _
and meet ^ describing the hierarchical structure of the lattice. This is a structural
viewpoint. However, we can also view the lattice from an operational viewpoint as
an algebra on the space of elements. The algebra is dened by the operations _ and
^
along with any other relations induced by the structure of the lattice. Dually, the
operations of the algebra uniquely determine the ordering relation, and hence the
lattice structure. Viewed as operations, the join and meet obey the following
properties for all x; y; z 2 L:
L1:
x _ x x;
x ^ x x 餓dempotency,
L2:
x _ y y _ x;
x ^ y y ^ x 餋ommutativity,
L3:
x _ 饄 _ z 饃 _ y _ z;
x ^ 饄 ^ z 饃 ^ y ^ z 餉ssociativity,
L4:
x _ 饃 ^ y x ^ 饃 _ y x 餉bsorption.
The fact that lattices are algebras can be seen by considering the consistency
relations, which express the relationship between the ordering relation and the join
and meet operations.
x
py
3 x ^ y x
x _ y y 餋onsistency Relations.
Lattices that obey the distributivity relation
D1:
x ^ 饄 _ z 饃 ^ y _ 饃 ^ z 餌istributivity of ^ over _,
and its dual
D2:
x _ 饄 ^ z 饃 _ y ^ 饃 _ z 餌istributivity of _ over ^
are called distributive lattices. All distributive lattices can be expressed