# Computational Complexity ### Friday, December 13, 2002

Learning via Occam's Razor

I have been asked to expand upon the spam example from yesterday's Foundations of Complexity lesson. To do this let me go into a little background on computational learning theory.

The basic model for learning theory has examples given as strings and labelled positive or negative. From these labelled examples, one comes up with a hypothesis that hopefully will classify future examples well.

In PAC (Probably Approximately Correct) learning you want an algorithm that takes labelled examples generated by some distribution and will output some hypothesis that will high confidence will usually classify future examples drawn from the same distribution.

One simple approach is the Occam algorithm based on the Occam's Razor principle "when you have two competing theories which make exactly the same predictions, the one that is simpler is the better."

The Occam algorithm works by finding the smallest representation of a hypothesis consistent with the labelled examples and using that hypothesis to predict future examples. There is a theorem in learning theory that says this algorithm works well with the number of samples roughly the size of the smallest representation.

Let us focus on the problem of identifying spam using general circuits, a collection of AND, OR and NOT gates that capture computation. We'll talk more about circuits in a future lesson but just think of the running time of an algorithm as roughly the size of the equivalent circuit.

Given a collection of emails each labelled either as SPAM or NOT SPAM, the Occam algorithm requires us to find the smallest circuit that correctly labels all of these emails. In general finding such a circuit could be hard but under the assumption that P=NP this is an easy task. By the result mentioned above this circuit will likely classify SPAM correctly most of the time in the future.

• Why should there be a small circuit that characterizes spam? Well, I can look at an email and determine whether it is spam and my brain is just not that complex.
• The theorem only holds if the distribution we learn on is the same as the distribution we apply our circuit to. Spammers might change their spam to fool our new circuit. The P=NP assumption will make it easier for them as well.
• The P=NP assumption is probably not true.
For more information and details on computational learning theory check out the web site learningtheory.org and the resources mentioned on that site.

### Thursday, December 12, 2002

Foundations of Complexity
Lesson 10: The P versus NP Problem

Previous Lesson | Next Lesson

In Lesson 8 we looked at the class P, the set of efficiently computable languages. In Lesson 9 we studied the class NP, the set of languages with efficiently verifiable proofs. Does P=NP, i.e., is every language with efficiently verifiable proofs computable efficiently?

The P versus NP question is the most important question in all of theoretical computer science and in mathematics in general. The Clay Mathematics Institute lists the P versus NP question as one of their seven Millennium Prize Problems. Determine whether P=NP and collect a million dollars.

To understand the importance of the P versus NP, let us imagine a world where P = NP.

• A large class of interesting search problems in NP, thought to be hard to solve, would have efficient solutions. These include Factoring, Map Coloring, Traveling Salesman, Job Scheduling and thousands of others. Half of Garey and Johnson is just a listing of NP-complete problems.
• Public key cryptography would be impossible.
• Learning via Occam's razor would be easy. For example if you wanted an algorithm for separating spam from useful email, just search for the smallest circuit that correctly identifies a large set of samples. You can do this if P=NP.
This is just the beginning. Of course the general consensus is that P≠NP but we have no idea on how to prove it.

Many of the future lessons will deal directly and indirectly with the P versus NP problem. With these lessons, I hope to give you a real feel of the importance and difficultly of this most important question.

### Wednesday, December 11, 2002

FYI: The AIP Bulletin of Science Policy News

The American Institute of Physics produces a free electronic newsletter, FYI, covering science policy in the US. Although it has a physics bent, FYI does an amazing job educating the scientific community on how US science policy is set and what the current issues are. For example, check out this bulletin on the new NSF authorization bill.

Anyone interested or involved in science funding should subscribe to this newsletter. There is also a FYI This Month that gives a single monthly email highlighting the important FYI bulletins.

### Tuesday, December 10, 2002

Today's New York Times carries a lengthy article about how Yahoo is using Manuel Blum's CAPTCHA project to prevent automated registrations. You saw it here first.

### Monday, December 09, 2002

SIGACT News

The December SIGACT News is out, the first edited by David Haglin. Of interest to complexity theorists
• Lane Hemaspaandra's Complexity Theory Column has Part 2 of the Schaefer-Umans survey on Completeness in the Polynomial-Time Hierarchy.
• A book review of A New Kind of Science by Stephen Wolfram. "Its not new, its not science and its not kind."
• The Education Forum discusses a proposal for a Dagstuhl-like facility in the US and a preview of the Homer-Selman book Computability and Complexity Theory.
• Calls for nominations for the Gödel Prize and Knuth Prize. Nominate your favorite complexity papers and theorists.

Complexity Class of the Week: MA

Previous CCW

MA gets its name from the Arthur-Merlin games developed by Babai. MA is an interactive proof system where the all-powerful Merlin sends a message that is verified by a probabilistic Arthur. Here is a formal definition.

A language L is in MA if there is a probabilistic polynomial-time machine M and a polynomial p such that for all strings x,

1. If x is in L then there is a w, |w|=p(|x|) and M(x,w) accepts with probability at least 2/3.
2. If x is not in L then for all w, |w|=p(|x|), M(x,w) accepts with probability at most 1/3.
As with many other interactive proof classes, the 1/3 can be replaced with a value exponentially small in |x| and the 2/3 can be replaced by 1.

The w is a proof that Merlin can write down and Arthur can verify years later without further interaction from Merlin. Sometimes MA is called the class of languages with publishable proofs.

Despite the naturalness of the class, there are no known natural problems in MA not known to be in NP∪BPP.

MA contains NP and BPP and also NPBPP. MA is contained in its cousin class AM and thus BPPNP. MA is also contained in many of the same classes BPP is contained in, including S2, ZPPNP, Σ2∩Π2 and PP. There are oracles where AM is not contained in these classes. Also none of these containments are tight in all relativized worlds.

If L is checkable and has polynomial-size circuit then L is in MA. I will leave the definition of checkable to another post but this implies

If EXP is in P/poly then EXP is in MA.
We can replace EXP in the above statement with PSPACE or PP. Using the above statement one can show that MAEXP does not have polynomial-size circuits. MAEXP is like MA with polynomials replaced with exponentials.

A strong pseudorandom generator that derandomizes probabilistic circuits will also derandomize MA to NP. Using the result of Impagliazzo-Wigderson we get: If E require 2ε n size circuits for some ε>0 then MA = NP.

The quantum version of MA, QMA has gotten some recent attention. Here Merlin sends entangled quantum bits and Arthur does quantum transformations and measurements. We will discuss QMA another day when it has its turn as complexity class of the week.