Friday, December 13, 2002
Learning via Occam's Razor
I have been asked to expand upon the spam example from yesterday's
Foundations of Complexity lesson. To do this let me go into a
little background on computational learning theory.
The basic model for learning theory has examples given as strings and
labelled positive or negative. From these labelled examples, one comes
up with a hypothesis that hopefully will classify future examples
In PAC (Probably Approximately Correct) learning you want an
algorithm that takes labelled examples generated by some distribution
and will output some hypothesis that will high confidence will
usually classify future examples drawn from the same distribution.
One simple approach is the Occam algorithm based on the Occam's
Razor principle "when you have two competing theories which make
exactly the same predictions, the one that is simpler is the better."
The Occam algorithm works by finding the smallest representation of a
hypothesis consistent with the labelled examples and using that
hypothesis to predict future examples. There is a theorem in learning
theory that says this algorithm works well with the number of samples
roughly the size of the smallest representation.
Let us focus on the problem of identifying spam using general circuits,
a collection of AND, OR and NOT gates that capture computation. We'll
talk more about circuits in a future lesson but just think of the
running time of an algorithm as roughly the size of the equivalent
Given a collection of emails each labelled either as SPAM or NOT SPAM,
the Occam algorithm requires us to find the smallest circuit that
correctly labels all of these emails. In general finding such a
circuit could be hard but under the assumption that P=NP this is an
easy task. By the result mentioned above this circuit will likely
classify SPAM correctly most of the time in the future.
Some caveats about this method.
For more information and details on computational learning theory check out the
web site learningtheory.org
and the resources mentioned on that site.
- Why should there be a small circuit that characterizes spam? Well,
I can look at an email and determine whether it is spam and my brain
is just not that complex.
- The theorem only holds if the distribution we learn on is the same
as the distribution we apply our circuit to. Spammers might change
their spam to fool our new circuit. The P=NP assumption will make it
easier for them as well.
- The P=NP assumption is probably not true.
Thursday, December 12, 2002
Foundations of Complexity
Lesson 10: The P versus NP Problem
Previous Lesson |
8 we looked at the class P, the set of efficiently computable
languages. In Lesson
9 we studied the class NP, the set of languages with efficiently
Does P=NP, i.e., is every language with efficiently verifiable
proofs computable efficiently?
The P versus NP question is the most important question in
all of theoretical computer science and in mathematics in general. The
Clay Mathematics Institute lists the P versus NP question as one of
their seven Millennium
Prize Problems. Determine whether P=NP and collect a million
To understand the importance of the P versus NP, let us imagine a
world where P = NP.
This is just the beginning. Of course the general consensus is that
P≠NP but we have no idea on how to prove it.
- A large class of interesting search problems in NP, thought to be
hard to solve, would have efficient solutions. These include
Factoring, Map Coloring, Traveling Salesman, Job
Scheduling and thousands of others. Half of Garey
and Johnson is just a listing of NP-complete problems.
- Public key cryptography would be impossible.
- Learning via Occam's razor would be easy. For example if you
wanted an algorithm for separating spam from useful email, just search
for the smallest circuit that correctly identifies a large set of
samples. You can do this if P=NP.
Many of the future lessons will deal directly and indirectly with the
P versus NP problem. With these lessons, I hope to give you a real
feel of the importance and difficultly of this most important
Wednesday, December 11, 2002
FYI: The AIP Bulletin of Science Policy News
The American Institute of Physics produces a free electronic newsletter, FYI, covering science policy in the US.
Although it has a physics bent, FYI does an amazing job educating the scientific community on how US science policy is set and what the current issues are. For example, check out this bulletin on the new NSF authorization bill.
Anyone interested or involved in science funding should subscribe to this newsletter. There is also a
FYI This Month that gives a single monthly email highlighting the important FYI bulletins.
Tuesday, December 10, 2002
CAPTCHA in the Times
Today's New York Times carries a lengthy article about how Yahoo is using Manuel Blum's CAPTCHA project to prevent automated registrations. You
saw it here first.
Monday, December 09, 2002
The December SIGACT News is out, the first edited by David Haglin. Of interest to complexity theorists
- Lane Hemaspaandra's Complexity Theory Column has Part 2 of the Schaefer-Umans survey on Completeness in the Polynomial-Time Hierarchy.
- A book review of A New Kind of Science by Stephen Wolfram. "Its not new, its not science and its not kind."
- The Education Forum discusses a proposal for a Dagstuhl-like facility in the US and a preview of the Homer-Selman book Computability and Complexity Theory.
- Calls for nominations for the Gödel Prize and Knuth Prize. Nominate your
favorite complexity papers and theorists.
Complexity Class of the Week: MA
MA gets its name from the Arthur-Merlin games developed by Babai. MA
is an interactive proof system where the all-powerful Merlin sends a
message that is verified by a probabilistic Arthur. Here is a formal
A language L is in MA if there is a probabilistic polynomial-time
machine M and a polynomial p such that for all strings x,
As with many other interactive proof classes, the 1/3 can be replaced with a
value exponentially small in |x| and the 2/3 can be replaced by 1.
- If x is in L then there is a w, |w|=p(|x|) and M(x,w) accepts with
probability at least 2/3.
- If x is not in L then for all w, |w|=p(|x|), M(x,w) accepts with
probability at most 1/3.
The w is a proof that Merlin can write down and Arthur can verify
years later without further interaction from Merlin. Sometimes MA
is called the class of languages with publishable proofs.
Despite the naturalness of the class, there are no known natural
problems in MA not known to be in NP∪BPP.
MA contains NP and BPP and also NPBPP. MA is
contained in its cousin class AM and thus BPPNP. MA is also
contained in many of the same classes BPP is contained in, including
Σ2∩Π2 and PP. There are oracles
where AM is not contained in these classes. Also none of these
containments are tight in all relativized worlds.
If L is checkable and has polynomial-size circuit then L is in
MA. I will leave the definition of checkable to another post but this
If EXP is in P/poly then EXP is in MA.
We can replace EXP in the above statement with PSPACE or PP. Using the
above statement one can show that MAEXP does not have polynomial-size circuits.
MAEXP is like MA with polynomials replaced with exponentials.
A strong pseudorandom generator that derandomizes probabilistic
circuits will also derandomize MA to NP. Using the result of
Impagliazzo-Wigderson we get: If E require 2ε n
size circuits for some ε>0 then MA = NP.
The quantum version of MA, QMA has gotten some recent attention. Here
Merlin sends entangled quantum bits and Arthur does quantum
transformations and measurements. We will discuss QMA another day when
it has its turn as complexity class of the week.