Tuesday, December 24, 2013

My 2013 Post

The end of 2013 is approaching, and I realized I hadn't posted a single time this year. Starting a tenure-track job and having a baby apparently leave little time for blogging!  However, I do want to keep my blog somewhat active, so I figured I better make at least one post for this year.

But what to write about? It's hard to choose just one topic to make a "post of the year".  So, I'll just write about some work-related highlights of 2013, big and small, in no particular order:
  • The newly founded Simons Institute for the Theory of Computing seems to have had a great year. I haven't had a chance to visit, but I have watched some videos of their excellent talks online! And I've been keeping up with Moritz Hardt's interesting new blog.
  • I've for the first time gotten involved in running a conference, or rather part of it: ISAIM 2014 will feature a session on the "Theory of Machine Learning," and I'm quite happy with the program. (It's not too late to attend!)
  • My friend Nikhil Srivastava (who has appeared on this blog before), together with Adam Marcus and Dan Spielman, proved the Kadison-Singer conjecture.  It's a result that touches on many areas of mathematics -- its resolution is a big deal, and it's especially exciting that it came from the computer science community.
  • With the opening of SCDA, Facebook Research Labs, MSR-NYC, and others, machine learning seems to be quite in demand, both in academia and industry. (And, not quite unrelatedly, deep learning is continuing to make a comeback.) It's a great time to be a machine learning researcher or practitioner.
  • Near the beginning of the year, my co-authors and I finished and published a paper extending statistical queries to optimization algorithms and proving a lower bound for planted clique. I'm particularly happy about this result, which had been in the works for a while. 
  • I published my first paper with students at UIC. One of them, Jeremy Kun, who is working with me, has a great blog, which you should all read!  This also happens to be my first work primarily in game theory, which seems to be an area many learning researchers delve into. 
  • While last year was 100 years since Alan Turing's birth, this year was Erdős's centenary.  One could say that Erdős was the Turing of combinatorics, and he had quite a bit of impact in computer science as well.
  • It looks like the NSA hasn't made any huge mathematical breakthroughs, but that hasn't kept them from breaking crypto-systems in the real world, both in ways we'd expect and in ways we wouldn't.
  • Leslie Valiant published Probably Approximately Correct. I'm sold, but can it bring learning theory to the masses?
I'm sure I missed some important things, but that's all I have for now.  I resolve to post more next year. It's a modest, but achievable, goal.  To a happy and productive 2014!

Friday, December 28, 2012

Turing's Revolution

In my post Alan Turing Year, I discussed how it's important for people to be aware of Turing's intellectual contributions to math and computer science, as well as his war efforts and tragic death.  It's time to live up to my own call and do my part.  In this post I'll talk about Turing's impact, from his best known contributions to his lesser known, but important ones.

I should note that the content of this post is largely stolen from answers to a question I asked on the StackExchange (SE) cstheory site, where I sought to gather a compendium of Turing's lesser known results.  The answers formed a collection of impressive contributions, some of which I wasn't familiar with.  Some of his contributions turned out to be surprisingly modern, and together they introduced us to viewing the world through an algorithmic lens.

Without further ado, here they are:
1. Defining Turing Machines and Showing the Existence of UTMs (1936): A Turing machine is an abstract computing device that forms the basis of how we think about computation.  A universal Turing machine is a single fixed machine that can simulate any other Turing machine -- the concept most relevant to our modern computers. (SE link)
a Lego Turing Machine, photo from wikipedia 
2. The Church-Turing Thesis (1939): The idea that the Turing machine (and the lambda calculus) captures all of computation, and is therefore worth studying.  This hypothesis cannot be proven, but everyone believes it nonetheless. (wiki link)
3. Proving the Undecidability of the Halting Problem (1936): A version of this problem was posed by Hilbert.  Turing showed that telling whether a given program will halt is undecidable.  (wiki link)
4. The Turing Test and Introducing AI (1950): Turing gave an objective test for deciding whether a computer exhibits intelligence.  The idea is to say that a computer is intelligent if it can fool humans into thinking it is also human.  Accepting this idea is a way to avoid many philosophical arguments about the nature of intelligence.  It was part of the paper "Computing machinery and intelligence" that launched AI. (wiki link)
5. Introducing Oracles and Relativization into Computability Theory (1939): Oracles are incredibly useful for all sorts of analysis -- for example, the famous Baker-Gill-Solovay result says that P vs NP cannot be resolved by any technique that relativizes (i.e., holds relative to any oracle), e.g. diagonalization. (SE link)
6. The Bombe (1939): Also called the Turing-Welchman Bombe, designed for breaking the German Enigma machine in WW2 and introducing tools to cryptanalysis. (SE link)  
 the Bombe, from http://www.uh.edu/engines/epi2666.htm
7. Randomized Algorithms (1950): Turing observed the potential advantage an algorithm can gain from having access to randomness. (SE link)
8. The First Artificial Neural Network (1948): This idea is an important one in AI and machine learning.  Turing's design predated Rosenblatt's perceptron. (SE link)
9. The Chemical Basis of Morphogenesis (1952): In his most cited paper, he began to tackle the question of how a symmetric embryo develops into an asymmetric organism using symmetry-preserving chemical reactions.  This started to introduce algorithmic thinking into biology. (SE link)
10. Good-Turing Estimator (1953): For estimating the unseen fraction of a population, given past observation data. (SE link)
11. One of the First Chess Algorithms (1952): Before we had computers, we had paper chess algorithms. (SE link)
  Deep Blue, photo from Wikipedia
12. Checking a Large Routine (1949): The idea of using invariants to prove properties of programs and "well-foundedness" to prove termination. (SE link)
13. LU-Decomposition (1947): A method for factorizing a matrix as the product of a lower triangular matrix and an upper triangular matrix.  Now commonly studied in Linear Algebra. (SE link)
14. Turing's Method for Computational Analysis of a Function (1943): Invented for studying the zeros of the Riemann zeta-function, but more widely applied. (SE link)
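As a rough illustration of #10 above: in its simplest form, the Good-Turing estimator assigns the total probability of all unseen species the value N1/N, where N1 is the number of species observed exactly once and N is the sample size.  Here's a toy sketch (my own illustration; the full Good-Turing method also smooths the counts, which this skips):

```python
from collections import Counter

def good_turing_unseen_mass(sample):
    """Estimate the total probability of unseen species as N1/N,
    where N1 is the number of species observed exactly once."""
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(sample)

# Seven observations; "c" and "d" are singletons, so N1 = 2, N = 7.
sample = ["a", "a", "a", "b", "b", "c", "d"]
print(good_turing_unseen_mass(sample))  # -> 2/7, about 0.286
```

The intuition is that species you've seen only once are your best evidence for how much of the population you haven't seen at all.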

I'm sure there are more.  Any one of these is an impressive achievement for a scientist -- the first couple are revolutionary.  That Turing did all of these and more shows his remarkable contributions to our field.

Update (12/31/12): Alan Richmond points out another important work of Turing's, which I added as # 14.  Also, my wording for the Turing test proved a bit controversial, so I've changed it to better reflect reality.

Friday, December 07, 2012

UIC's MCS Program Seeking Graduate Students

As I'd posted before, UIC has two departments interested in computer science.  We have a computer science department in the engineering school.  And the math department in the school of liberal arts and sciences, where I have an appointment, has a smaller program in "mathematical computer science" (MCS).

The MCS program is more focused on the mathematical aspects of computing and less on systems, databases, etc.  It is similar to the ACO programs at CMU and Georgia Tech.

We recently updated our website, where you can learn more about the program and the affiliated faculty.  More importantly, if you're a student looking to get a Ph.D. in a mathematical area of computing, consider applying to our program -- we're looking for strong applicants.

a screenshot of the new MCS website

Tuesday, August 07, 2012

Diamond Forever

Recently, the academic community started to fight back against traditional academic publishers, who earn large profits while no longer providing many clear services.  The internet has been a great tool for eliminating such inefficiencies, and it's quickly disrupting the publishing market as well.

Having thought a bit about these issues myself, I wanted to summarize my views of some of the various publishing models (as described by Tim Gowers) under consideration, as well as the pros and cons of each.  It's possible, even likely, that I am missing something, so feel free to point it out in the comments.

The Systems


screenshot, after I tried to access a paper from nature.com

Closed Access

system: Researchers send manuscripts to publishers. Publishers have volunteer editorial boards consisting of researchers who decide whether to accept/reject manuscripts. Accepted manuscripts are published but cannot be accessed without a paid subscription to the given journal. Authors lose most (or all) rights to their own work.

who wins: publishers, by charging for access

who loses: university libraries, and therefore the scientists whose overhead pays for subscriptions; the public, who don't get access to the research their taxes likely funded; authors, who cannot easily have their research read and used.

opinion: This system made some sense before the internet era, when there were few other good ways to disseminate research. The publishers used to have to work hard for their profits by doing serious formatting/printing/publishing work. With the internet, all of this is now done by the authors, and the closed-access system no longer makes sense.


Green Access

system: Same as closed-access, except that authors are allowed to put their accepted manuscripts on their websites and onto online repositories such as the arXiv.

who wins: publishers, by charging for access; researchers who put their papers online

who loses: basically same as closed-access, except for the authors who put their papers online

opinion: This is the system most journals are now on, or at least they usually don't go after scientists who keep their papers online. This is strictly better than closed-access, but doesn't go far enough. Too many scientists do not put their papers on their websites or arXiv. There is still no guarantee that science will be public, as it should be.


a screenshot of the new gold-access math journal

Gold Access

system: Researchers send manuscripts to publishers. Publishers have volunteer editorial boards consisting of researchers who decide whether to accept/reject manuscripts. Authors of accepted manuscripts pay to have their work published, but published manuscripts are freely accessible to everyone online.

who wins: the public, who can access research for free. the publishers, who still get paid (but not as much)

who loses: the authors, who have to pay (usually large sums) to have their work published. universities, which might need to subsidize the fees. quality (see below).

opinion: I think adopting this system would be a mistake. First, scientists are the ones doing the work, the editing, and the main contributions -- it seems unfair to make them pay as well. Researchers without grants or university support would have to use their personal funds. Moreover, I think this would seriously impact paper quality.  In this system, journals would make money by accepting more papers as opposed to getting more readers, and some authors may have incentives to pad their CVs. I cannot see how this wouldn't go wrong.


the journal of machine learning research, image from their website

Diamond Access

system: Researchers send manuscripts to publishers. Publishers have volunteer editorial boards consisting of researchers who decide whether to accept/reject manuscripts. Authors of accepted manuscripts have their work published for free, and published manuscripts are freely accessible to everyone online.

who wins: scientists, who can publish for free and have their papers read. the public, who can access research freely. universities, who do not have to pay for journal access. research quality.

who loses: publishers, who can no longer charge for anything

opinion: This is the best of all worlds. Everyone seems to win, and research quality especially wins because nobody makes a profit from publishing bad results.  The only problem is that the publishers lose, and why would anyone run a journal without making a profit?

A Solution

Diamond-access is the best system for everyone, except for the publishers.  However, publishers need incentives to run a journal, which they wouldn't have in a diamond access system.  Now, becoming a publisher would involve operating at a loss.  So who should take the hit?

One obvious solution is for universities to publish (or pay for publishing) journals.  They would make a sacrifice in maintaining the journals for free, but what would they win? Quite a lot, actually.  Were we to switch to this system, university libraries would no longer have to pay subscription fees to journals, and university professors wouldn't have to pay to have their work published.  And the quality of publications wouldn't suffer.   Some of this cost could even be handled by overhead.

Yale's library (from wiki), where I never once went to do research during my Ph.D.

But what is the cost of running a diamond access publication?  Well, it turns out to be very small.  With pdfs online, there's really no need to publish paper proceedings.  I don't think I've ever gone to a library to find an article on paper -- pretty much nobody does these days.  So, all you need to do is register a domain and pay for some maintenance and hosting.  The rest of the work is done by volunteers.  Looking at the financial statement (which I learned of from Google+) of the Journal of Computational Geometry, it could take as little as $10 per year.  Heck, we don't really need universities to pay for it.

Yes, it really costs very little to run a journal these days if you do it right.  There's no need for readers, or authors, to pay huge sums to middle-men.

Monday, May 07, 2012

Atlanta, Northward

When you accept a postdoctoral position, you know you probably can't stay indefinitely.  Psychologically, it means you have to prepare yourself for having to switch jobs, coworkers, offices, etc., and I, as a postdoc at Georgia Tech, have been ready for this foreseeable consequence.  I've enjoyed being part of a strong and lively department these last two years.

Unfortunately, when you take a postdoc in a city like Atlanta, where there isn't as high a concentration of universities and research labs as in some other places, you not only have to leave your job, but likely also the city.  And even though it was in the back of my mind that I'd probably have to move, I didn't expect to love Atlanta as much as I have come to.
Atlanta's botanical garden

Atlanta has sunny, warm weather, really friendly people, lots of trees, amazing restaurants, nice parks, museums, a symphony, theaters, and everything you could want in a city (except, perhaps, a waterfront).  It's also very livable -- the population density is fairly low, so you're not elbow-to-elbow every time you leave your house, and you can live in really nice suburbs right in the middle of the city.  Because of the large influx of new residents, it's mostly new and clean, yet the cost of living is quite low.  And you can reach great hiking within a 20-minute drive and a beautiful national park within 3 hours.  It's certainly a place I wouldn't mind living in again at some point in my life.
Smoky Mountains National Park

While I'm sad to have to leave Atlanta, I am also excited to begin a new phase in my career.  In a couple months, I'll be joining the University of Illinois at Chicago (UIC) as an Assistant Professor in the Math department, actually called "Mathematics, Statistics, and Computer Science."  The department is quite strong and well-regarded, but unlike most math departments, UIC's is expressly concerned with Computer Science (so, yes, I'm remaining a computer scientist). Additionally, UIC has another entire standard Computer Science department, where I'll also have a courtesy appointment.  Between the two departments, there are quite a few great faculty members working on various aspects of machine learning -- I'm looking forward to joining them. And with Northwestern, UChicago, and TTI-C nearby, Chicago should be quite an exciting place to work.
Chicago, reflected in "the bean"

But because this post started with Atlanta, I should say something about the city of Chicago.  While I don't yet know Chicago well, it also looks like a beautiful, livable place, on the shores of a (literally) great lake.  The people seem friendly, and there is a lot to do.  I just hope that, after being spoiled by Atlanta, I can get used to Chicago's cold winters.

Friday, April 27, 2012

Computing Conceptions

From Georgia Tech, many of us have been closely and concernedly watching our southern neighbors at the University of Florida, where budget cuts and a seemingly hostile dean conspired in an attempt to decimate their well-respected computer science department.  Dismantling the department, at a time when computer science research is so critical, would be a laughably bad decision on the part of the university.

I don't claim to fully understand the rationale behind this plan.  However, I have a feeling that part of the reason such an idea was even being seriously considered has to do with a couple misconceptions of what computer science research is, and I fear that such misconceptions extend beyond the state of Florida.  And I don't just mean that the average person doesn't understand computer science; that's to be expected. I mean that many academics, even scientists, don't understand the basics of what computer science is about and therefore tend to devalue it, especially as an academic discipline.

First, many people seem to assume that computer science is just programming or fixing computers.  As a graduate student at Yale, where computer science is a small department, I was often asked by other Ph.D. students why computer science even has a Ph.D. program.  They didn't view it as an academic pursuit, but more as a trade skill.  I fear that many scientists view computer science as limited to programming or getting computers to work, probably because that's the way most, say, physicists use computers.  They have little understanding of the beautiful, deep results and insights that computer science has brought the world.  Viewing it as an instrumental non-academic field, people think it would be okay to kill off computer science research and leave the professors to teach programming (which, admittedly, is an important part of a computer science department's job).

(clip art from here)
something computer scientists do not normally do

The other, very related, misconception, one that was clearly in play at the University of Florida, is that the computer science department is somewhat redundant because the electrical and computer engineering department already has the word "computer" in its name.  Their reasoning sounded more sophisticated than that, but only superficially.  Yet computer science and electrical engineering are very far apart in their central concerns.  Computer science, for the most part, is divorced from concerns about electricity, physical media, or anything of that sort.  Whether you work on operating systems, machine learning, or the theory of computation, you mostly don't really care about the underlying hardware, whereas electrical engineers do.  Greg Kuperberg, writing on Scott Aaronson's great blog post on this issue, puts it better than I could:
"Apparently from (Florida engineering dean) Abernathy’s Stanford interview, and from her actions, she simply takes computer science to be a special case of electrical engineering. Ultimately, it’s a rejection of the fundamental concept of Turing universality. In this world view, there is no such thing as an abstract computer, or at best who really cares if there is one; all that really exists is electronic devices.
[...] Yes, in practice modern computers are electronic. However, if someone does research in compilers, much less CS theory, then really nothing at all is said about electricity. To most people in computer science, it’s completely peripheral that computers are electronic. Nor is this just a matter of theoretical vs applied computer science. CS theory may be theoretical, but compiler research isn’t, much less other topics such as user interfaces or digital libraries. Abernathy herself works in materials engineering and has a PhD from Stanford. I’m left wondering at what point she failed to understand, or began to misunderstand or dismiss, the abstract concept of a computer."
(image from here)
something usually not of research interest to computer scientists 

It looks like a disaster in Florida has so far been avoided. And with each passing year, more scientists will have, at the very least, taken some basic computer science in college -- it is part of our job to teach the important concepts in our introductory courses.  I'm hoping this will improve the perceptions of our field.  But in the meanwhile, it has become apparent that we have much more PR to do.

(image from here)
now we're talking!

Sunday, January 22, 2012

Alan Turing Year

I am excited that 2012, the centenary of Alan Turing’s birth, will see many celebrations of his life and legacy.  It is hard to think of a scientist who has had more impact both on science and on our lives, but who still remains as unknown to the public, so I am glad Turing will be getting some much-deserved attention this year.

Alan Turing, photo copyright held by National Portrait Gallery in London

Alan Turing laid out the foundations of the study of computer science (in the process solving Hilbert’s Entscheidungsproblem – no small feat on its own), contributed fundamentally to our understanding of computation in nature, envisioned the future of intelligent machines, and saved millions of lives by helping shorten World War 2.

Yet upon discovering Turing was gay (and convicting him for "indecency"), the British government took away his job, kept many of his contributions secret, and chemically castrated him, driving him to suicide.

After some long-overdue outrage, Gordon Brown issued an apology on Britain's behalf, albeit decades too late.  Now there are new petitions asking the British Government to grant Turing a pardon.  I have nothing against these efforts, but how Turing was treated is to our collective shame, and we should remember that these efforts are to make us, not him, feel better.  Turing knew he wasn't doing anything wrong and never needed condescending “pardoning” from politicians, nor does he need it now, after his death.  If the pardon is meant to send an apology to the homosexual community for their mistreatment, I’m sure the British government can think of something a little more direct.

I think a better way of honoring Turing is to make sure people know about his work – like we do of Kepler, Galileo, Newton, Darwin, Einstein, and many of the other great scientists who revolutionized our world-views. Unfortunately there’s a lot of work to be done: even well-meaning articles trying to popularize Turing’s work mainly describe him as an accomplished code-breaker and World War 2 hero.  Turing did break codes, most notably the German Enigma machine, but calling Turing a code-breaker without mentioning his scientific impact is akin to calling Isaac Newton a treasurer, given his years as England's Master of the Mint.

So, this year, we academics will celebrate Turing’s great accomplishments by holding conferences, awards, and lectures in his honor.  There will be documentaries released about his life.  I’m sure an op-ed or two will be written in the newspapers.  But I also hope that reporters, lecturers, and especially teachers, will help the public at large learn about this pioneer of the information age.

Thursday, October 27, 2011

ALT 2011

Earlier this month, I traveled to Finland, where I attended ALT 2011 and presented a paper on learning sparse parity functions in the presence of noise. 

a picture I took in Helsinki

COLT (Conference on Learning Theory) and ALT (Algorithmic Learning Theory) are the two main learning theory conferences.  Though COLT is both the larger and better known of the two, it seems that ALT is growing stronger with every passing year, and this year was no exception.  Not only was the program good, but Sébastien Bubeck gave a timely tutorial on bandits, which deservedly got their own session this year. (I, unfortunately, arrived too late to see the tutorial, but I read the nice slides afterwards.)

I wouldn't be the first to note that the other thing that differentiates ALT from its big brother is that ALT seems to have a broader definition of learning theory, and the conference is therefore more diverse in its topics than a typical COLT.  This has clear benefits, but it also means that there are entire sessions where I am pretty lost.  Of course, as I've previously noted, not feeling like you have to follow everything can also be a benefit when you're attending a conference.

I want to mention one paper I particularly enjoyed, especially because of the model, which illustrates the importance of unlabeled data in the agnostic setting:
"Learning a Classifier when the Labeling is Known" by Shai Ben-David and Shalev Ben-David.  This paper works in an agnostic learning model where the learner already knows the labeling function, but wants to find a good succinct representation (i.e. proper learning).  How good a particular representation is depends on the distribution the data is coming from, and this is what the algorithm needs to learn by seeing examples.  It turns out that this, more generous, model is subject to essentially the same sample complexity lower bound as the common agnostic PAC model.

Monday, August 22, 2011

Happenings on CSTheory Q&A

In addition to machine learning, one of my main research interests is theoretical computer science.  Following the success of MathOverflow and other Q&A sites, cstheory.stackexchange.com ("cstheory" for short) was born about a year ago, and I've been an active participant since its inception.  In the last year, we've worked hard to make cstheory a valuable resource and to promote it.  I'd say the efforts paid off -- the site is doing quite well, with over 5000 members.

Moreover, cstheory recently added a blog (Aaron Sterling and Joe Fitzsimons are editors), whose posts are on topics related to theoretical computer science and to the site itself.  In one of the first posts, Suresh Venkatasubramanian, a site moderator and one of the people most responsible for its success, wrote a nice summary of cstheory's first year.

I've also volunteered to occasionally blog about the learning theory side of things.  My first entry, on learning regular languages, was just posted.  It touches on many interesting issues learning theorists regularly encounter. Enjoy!

Saturday, August 13, 2011

Reyzin-Srivastava Trees

When I was a grad student at Yale, Ravi Kannan taught a great sequence of graduate theory courses on approximation algorithms, randomized algorithms, and streaming algorithms, and I was lucky enough to get to take all three.  The first one, approximation algorithms, Ravi taught during my first semester in grad school.  We were allowed to collaborate on assignments, and I discussed many of them with my friend Nikhil Srivastava.

I remember a problem on one of the first assignments was particularly tricky.  The question was about s-t cuts in a graph.  The hard part about this problem was that to solve it, it seemed we needed to efficiently keep track of all pairs minimum s-t cuts.  We started discussing the problem in the afternoon, and though it took us a while, by about 5am we had designed a data structure to do just that.  We realized you can maintain all the cuts in just one tree.  It was perfect.  We jokingly called the data structure the Reyzin-Srivastava tree.

I spent the remainder of the night writing up the assignment, describing this thing in detail, proud of our cleverness.  About a week later, we got the graded assignments back.  Mine had one note on it, next to that particular question of course.  It said "Next time, just reference Gomory-Hu trees when you need to use them."

I learned a couple lessons from this.  The first is that I should remember to do the required reading before starting the assignment (we used Vijay Vazirani's approximation algorithms textbook, which really is worth reading).  Had I done the reading, I would have learned about Gomory-Hu trees and realized the question asked us to trivially apply them to some problem, instead of reinventing them.   The second, and perhaps more important, lesson was that I had a shot at doing research, at inventing something new.  Yes, these trees had been invented decades before, and yes, we'd been exposed to more data structures than Gomory and Hu had been in 1961.  But somehow knowing that we figured out something non-trivial on our own really helped our confidence as first-year grad students, and as I've written before, confidence is quite important for research.

Nikhil and I went on to do some real research together, producing two joint papers the following year.  Actually, both of those papers also began as classwork, for Dana Angluin's machine learning class.  But that's a story better left for another post.

Friday, July 15, 2011

On ICML 2011

I've heard from some people that they liked my summary of NIPS 2010, so I'm posting my thoughts on ICML 2011, where I went a couple weeks ago.  ICML is one of the two main yearly conferences in machine learning, so there are always interesting new results to see, and this year saw a record of over 700 attendees.  Unfortunately, I didn't get to attend the tutorials or the workshops, so my post is about the main conference.  The conference took place in Bellevue, a city near Seattle, Washington.
Bellevue downtown at night, from Wikipedia

Invited Talks

One of the main highlights of ICML 2011 was the invited talks.  Usually, I like one invited talk or so, but this time all of them were great.   I've heard a number of people remark that the organizers did a great job of choosing the speakers.  I should also note that the videos of all the talks, invited and not, will soon appear on TechTalks, and I recommend you watch the invited talks especially.

The first talk was by Chris Bishop, of the Microsoft Research (MSR) Cambridge, UK lab, who talked about applied machine learning coming of age.  He addressed a variety of topics, perhaps a bit loosely tied together, but the real highlight of his talk was the first half-hour, where he addressed the technology behind Kinect.  Apparently, much of the technology behind it was developed at MSR (this should be good for research funding at Microsoft, as Kinect was a huge commercial success), and he went into some depth about both the machine learning and hardware challenges in building Kinect. For example, machine learning was used for the body tracking system in Kinect, but with some interesting twists, e.g. 1) the body tracking problem isn't treated as a tracking problem at all: each individual frame is independently classified, 2) lots of the training data is synthetic.  On the hardware end, I learned that Kinect could track you just fine in the dark.  To find out more, you should watch the entire talk.
Kinect sensor, image from Wikipedia

The second invited talk was by a mathematical biologist, Martin Nowak, who talked about the evolutionary dynamics of competition and cooperation.  He covered all sorts of interesting things.  What stuck with me the most were some results he presented on the repeated prisoner's dilemma.  In the famous classic prisoner's dilemma, each of two prisoners must independently decide whether to cooperate or defect (see figure below for a possible payoff matrix).  The dominant strategy for both prisoners is to defect, but if both prisoners cooperate they are better off than if both defect, hence the dilemma.  In the real world, we often play variants of this game, but over and over again.  The repeated prisoner's dilemma setting captures this aspect of repeated play, and a while ago, Robert Axelrod held a famous repeated prisoner's dilemma tournament where he showed tit-for-tat was a really good strategy: basically, cooperate at first and then, on each future round, do what your opponent did on the previous one.  I have a lot more to say on this topic, but I'll skip to Martin's talk.  Martin presented some interesting results showing that if noise is introduced into this repeated game (as in real life), and one runs an evolutionary tournament that starts with random strategies, tit-for-tat will evolve first.  But then, tit-for-tat will be taken over by more forgiving versions of tit-for-tat, which will subsequently be dominated by completely cooperative strategies, which will be exploited by completely defecting strategies, which will then be taken over by tit-for-tats again.  And this cycle repeats!  Martin pessimistically suggested that this cycle is inevitable in the real world as well.  Of course, his talk had a lot more, so if you want to know the details of this and more, I recommend you watch it!
possible prisoner's dilemma payoff matrix, image from "beyond intractability"
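As an aside, the dynamics Martin described are fun to play with in code. Below is a minimal sketch of the noisy repeated game; the payoff values and the strategies are my own illustrative choices, not taken from the talk:

```python
import random

# A standard prisoner's dilemma payoff table (values are illustrative):
# PAYOFF[(my_move, opp_move)] = (my_payoff, opp_payoff)
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(my_history, opp_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opp_history else opp_history[-1]

def always_defect(my_history, opp_history):
    return "D"

def play(strategy_a, strategy_b, rounds=100, noise=0.0, seed=0):
    """Repeated play; with probability `noise`, a move is flipped."""
    rng = random.Random(seed)
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        if rng.random() < noise:
            a = "D" if a == "C" else "C"
        if rng.random() < noise:
            b = "D" if b == "C" else "C"
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # mutual cooperation throughout
print(play(tit_for_tat, always_defect))  # tit-for-tat loses only round one
```

With no noise, two tit-for-tat players cooperate forever, and against an always-defector, tit-for-tat is exploited only on the first round. Running an evolutionary tournament of many such strategies with noise turned on is what produces the cycles Martin described.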

Then Hartmut Neven of Google talked about the machine learning behind "Google Goggles," the reverse image search where you take a picture of an object, send it to Google as a query, and Google will hopefully return some relevant results.  I hadn't thought much about this reverse-search problem, so a lot of the talk was interesting to me on a basic level.  For example, if you send a picture of a fancy chair, it's useless for Google to tell you it's a chair; you probably want to know what type of chair it is so you can buy it, which makes the problem very hard.  But that doesn't mean Hartmut and his team haven't made a scary amount of progress: Hartmut says that, given a picture of an arbitrary person who has about a dozen photos in Google's databases (i.e. online), Google can correctly place them in the top ten results.  Google hasn't activated this feature for privacy reasons, but if (or when?) it does, you could pretty much see someone on the street, take a picture of them, and find out who they are (and they don't even have to be a famous actress).  Hartmut thinks that in ten years we'll be able to carry a camera that, without any special querying, will know more about the world around us than we do: it will know what species of trees are around us, the origin of the name of the street we are on, the history of the author of the book we're reading, and so on. Exciting stuff.  Finally, Hartmut talked about how quantum computing might help with all this, and he gave a much more positive endorsement of D-Wave than I'd read elsewhere.
D-Wave's 128-qubit chip, image from Wikipedia

The last talk was by David Ferrucci about the IBM Watson system that beat the top two human players on Jeopardy (see my post on Watson).  He talked about the immense challenge of building Watson, the collaboration among many different types of researchers, the scientific problems involved, and the various obstacles his team had to overcome.  The task of building Watson was so monumental that many (non-computer-scientist) people couldn't get their heads around it.  David described an incident when he went on a radio show to talk about Watson and was asked by the host, "So, you're going to give the computer all the questions and answers, and when a question is asked, the computer will pull up the right answer?"  David said, surely rolling his eyes, "No. We don't know any of the answers in advance, let alone the questions."  The host responded, "I don't understand. How are you going to do it then?" "Exactly!" said David.  I trust I'm not spoiling anything if I tell you machine learning was a big part of the solution.
Watson's avatar, image from Wikipedia

Neural Networks 

John Langford (who hosted my postdoc at Yahoo! Research) already blogged that neural networks were making a comeback, so to speak, at ICML 2011.  In addition to the sheer number of "deep learning" papers, lots of people excitedly talked about what great things they could train their neural nets to do.  I would add to John's post, after talking to Bill and Ilya, that another reason this area is becoming "hot" is that we finally have the computing power to train neural networks for real-world problems, something that was previously much more difficult.

I must admit that I don't know enough about neural networks, and the reason is that it's not easy to see how they fit into the larger picture from a learning theory perspective.  For example, we know that a single-layer neural network cannot represent even the XOR function, that recurrent neural nets can simulate Turing machines, and that certain classes of neural nets are not properly PAC learnable.  But from what I can see, the theory of neural nets has a very different flavor from, say, the theory of decision trees.  If someone who knows these connections better wrote up a survey, I'm sure I'm not the only one who would be grateful. (Even better, maybe such a survey exists and I'm not aware of it.)
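The XOR fact is easy to demonstrate concretely: the four XOR points aren't linearly separable, so no single linear threshold unit can compute the function, while a two-layer network with hand-picked weights can. A quick sketch (the particular weights below are just one choice that works):

```python
def step(x):
    """Threshold activation."""
    return 1 if x > 0 else 0

def unit(w1, w2, b, x1, x2):
    """A single linear threshold unit: step(w1*x1 + w2*x2 + b)."""
    return step(w1 * x1 + w2 * x2 + b)

def two_layer_xor(x1, x2):
    """XOR = AND(OR(x1, x2), NAND(x1, x2)), built from three units."""
    h1 = unit(1, 1, -0.5, x1, x2)    # hidden unit computing OR
    h2 = unit(-1, -1, 1.5, x1, x2)   # hidden unit computing NAND
    return unit(1, 1, -1.5, h1, h2)  # output unit computing AND

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", two_layer_xor(x1, x2))
```

A single `unit`, by contrast, can only carve the plane with one line, and no line puts (0,1) and (1,0) on one side with (0,0) and (1,1) on the other.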

Papers I Liked

Given that ICML has four or five parallel tracks, I didn't get to see even 20% of the talks, but I tried to catch what I could at the poster sessions.  Here are a few non-neural-network papers I liked (for the neural network ones, see John's post), besides my own, of course.

"On the Necessity of Irrelevant Variables" by David Helmbold and Phil Long.  The main gist of this paper is that irrelevant variables don't hurt a predictor nearly as much as relevant variables help.  So if you're trying to do feature selection and aren't sure whether to add a feature or not, add it.

"Clustering Partially Observed Graphs via Convex Optimization" by Ali Jalali, Yudong Chen, Sujay Sanghavi, and Huan Xu.  Here, they do correlation clustering with partial observations, which is NP-hard, so instead of the usual approach of designing an approximation algorithm, they give an algorithm that succeeds if the data satisfies certain properties.  They're not the first to do this type of analysis, but I rather liked their kind of data-dependent assumption.

"Optimal Distributed Online Prediction" by Ofer Dekel, Ran Gilad-Bachrach, Ohad Shamir, and Lin Xiao.  This paper considers online prediction, but in a distributed model, where one can get speedups, possibly at some cost in the regret.  They derive optimal regret bounds for convex losses and i.i.d. examples, and show that for many parameter settings,  you can actually get the same asymptotic regret as without parallelizing.

"Doubly Robust Policy Evaluation and Learning" by Miroslav Dudik, John Langford, and Lihong Li.  I'm a bit biased about this paper because I was at Yahoo! while they were working on it. They show a technique for evaluating a new bandit policy on historical data, where you get an unbiased evaluation as long as you have either a good model of rewards or a good model of the past policy. Hence, doubly robust.
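To make the "doubly robust" idea concrete, here is a toy rendering of the estimator as I understand it; the data layout and names are my own, not the paper's:

```python
def doubly_robust_value(logs, new_policy, reward_model):
    """
    Estimate the value of `new_policy` from logged bandit data.

    Each log entry is (context, action_taken, observed_reward, propensity),
    where `propensity` is the probability the old policy assigned to the
    logged action.  `reward_model(context, action)` predicts rewards.
    The estimate is unbiased if EITHER the reward model OR the logged
    propensities are correct -- hence "doubly robust".
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        target = new_policy(context)
        estimate = reward_model(context, target)  # model-based term
        if action == target:
            # importance-weighted correction on matching actions
            estimate += (reward - reward_model(context, action)) / propensity
        total += estimate
    return total / len(logs)

# Toy usage: a reward model that happens to be exact.
logs = [("user1", "ad_a", 1.0, 0.5), ("user1", "ad_b", 0.0, 0.5)]
model = lambda ctx, act: 1.0 if act == "ad_a" else 0.0
print(doubly_robust_value(logs, lambda ctx: "ad_a", model))
```

When the reward model is exact, the correction term is zero; when it is biased, the importance-weighted term repairs the bias on average, provided the propensities are right.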


ICML 2011 was great.  I'm looking forward to ICML 2012! One extra bonus is that ICML 2012 will be co-located in Edinburgh with COLT 2012, the main learning theory conference.

Update (7/26/11): Videos of all the talks from ICML 2011 have now been posted here.

If you're interested in machine learning, you should follow John Langford's blog, hunch.net.  Also, John will co-chair ICML 2012.

Thursday, May 19, 2011

Three Tweets

It is well-known that it's not always easy to explain scientific ideas to the public.  Scientists are often blamed for being bad communicators, but I don't think that's fair.  Most people simply aren't interested enough to read detailed explanations of science (or of anything for that matter).  And scientists are hesitant to give oversimplified explanations because, among other reasons, oversimplified explanations are by definition not correct.  The problem surely exists in all sorts of disciplines, but is probably exacerbated in the sciences/math, where one often needs years of postgraduate study to truly grasp what's going on.

Sometimes, though, it doesn't hurt to give simplified explanations of complicated phenomena.  Our universe is a cool and interesting place, and making some of what we've learned accessible to more people isn't a bad idea.  Maybe it will make more people excited about science and help with funding in the long run.  It's also fun to try to explain what you're doing to others, even if it's on a high level.

Unfortunately, there's still temptation for scientists to go into too much detail.  We can't help but bore everyone around us.  This is where twitter comes to the rescue.  Its 140-character limit forces us to be concise, so if we're going to talk about science at all, we have to choose our words carefully.  So, when Sean Carroll, a physicist at Caltech, entertained a request to explain M-theory on twitter, attempting to do it in only 3 tweets, he opened a floodgate of other scientists trying to explain the major ideas in their fields in just 3 tweets.  Check out the #3tweets hashtag, and you'll see all sorts of interesting things posted.

Sticking to three tweets strikes a balance between a blog post (which won't get a large readership) and just 1 tweet (in which one cannot explain anything).  And if you have followers who are reading your twitter stream, they won't be able to avoid reading some science.

I, too, got tempted and did my own three tweets on the Church-Turing thesis (to be read bottom-up):

If you have a twitter account (and if you don't, get one), try explaining something about what you do, whether it's science or not, in just three tweets (and don't forget to use the #3tweets hashtag).  It's harder than it seems.

Sean Carroll also blogged about this.

Wednesday, March 30, 2011

Paperless Problems

While listening to an academic talk at a conference, I like to read the corresponding paper being presented.  I follow along by taking notes right in the paper proceedings and marking things that will be of interest to me later.  Then on the flight home, I leisurely browse through the proceedings, looking at my notes.

Apparently, I am also one of only a few people who do this, because conferences, almost uniformly, have been getting rid of paper proceedings.  Instead, we get the proceedings on a CD or USB stick, or the proceedings are put online, and we are (in theory) supposed to follow along on our laptops or to print the papers interesting to us in advance of the talks.

Unfortunately, this has never really worked for me.  My laptop 1) doesn't have enough battery life to last the entire conference,* 2) doesn't let me take notes on the paper, and 3) is too distracting to be used effectively.  These problems are solvable in theory, but only in theory, it seems. I could (and sometimes do) fight for the occasional outlet. Technically, there's software that lets me write on PDFs, but it never works properly, and I'm no good at "writing" with my mouse. And I could try not to get distracted on my laptop, but it's a losing battle.

For all our technical sophistication, we can't yet beat the simplicity of paper.  It's easy to write on and even has great battery life and resolution! But the solution of printing interesting papers in advance is no good because I don't know which papers are interesting to me until I actually go to the talk -- I think of talks as advertisements for papers.  If a talk is good, I'll read the paper afterwards.  If a talk is boring, I probably won't, unless I need to directly for research.

So, desperate for a solution, I got a Kindle so that I could load the papers in advance and read them during the conference.  The latest Kindle even advertised the ability to mark up PDF documents.  However, I quickly discovered that this was no solution at all.  The Kindle interface is so awful that I managed to do none of these things. The only thing one can realistically do with a Kindle is read novels, preloaded in Amazon's format.  Actually, the Kindle did improve my conference travel experience in that I no longer have to lug around books to read for fun.  Unfortunately, my original problem remains.

Now, I do realize that printing costs money, uses paper, and produces big proceedings that are no fun to lug around.  For big conferences like NIPS, I agree that printing all the papers would probably be too much.  And I'm confident that technology will be good enough in a couple years that this will no longer be a problem, making this inconvenience only temporary.  Perhaps going paperless is even the right solution at this time, though I tend to think we abandoned paper too early.

But I don't understand what everyone does in the meantime.  I've never seen anyone follow a paper on a laptop screen, nor have I ever witnessed anyone actually printing papers in advance of talks.  So what should I do?  Should I get an iPad -- will it solve any of my problems?  Is there something everyone is doing that I'm missing?  I don't want to have to bring up the old paper/electronic proceedings debate at the next business meeting!**

* My new laptop's battery might just last long enough, but the other issues remain with using a laptop.
** Okay, I'm bluffing.

Tuesday, February 15, 2011

Elementary, My Dear Watson?

As many of you know, this week IBM's computer system, Watson, is competing on Jeopardy against its two strongest performers, Ken Jennings and Brad Rutter. At the time of this post, their first match has finished, and Watson is ahead by a large margin.  Watson's lead is so great that it would be quite surprising if Ken or Brad were to catch up.  Given I do machine learning research (though I'd more call the Jeopardy task AI than ML), I couldn't resist posting about this match.

Ever since humanity's line of defense fell when Deep Blue defeated Kasparov over a decade ago, we humans have become accustomed to computers outperforming us at various tasks.  Computers were first built precisely to do computations more quickly and accurately than we could hope to.  And even though winning at chess takes a lot more than brute-force computing power (if a computer really tried to calculate all possible chess move sequences, it would take longer than the current age of the universe to finish), chess seems like one of those activities computers should be good at.  Actually, the best humans can still easily defeat the best computer systems at Go, but few of us will be surprised when, in a couple of years, this ceases to be the case.

Jeopardy is another thing altogether.  One might at first ask what possible hope there is of beating a computer at trivia.  Watson can download pretty much the entire internet into its memory (and has), but the problem is what to do with all this information.  And equally difficult is understanding the Jeopardy questions (or "answers," as they call them) in the first place.  Yesterday, Watson had a lot of trouble with the "name the decade" category precisely because it didn't know the answer had to be a decade.  Or in naming the murderer of Snape and others, Watson couldn't properly rank Voldemort over Harry Potter because, even though it had the entire book in its memory, it had no idea of the concept of murder -- only of word associations, and Harry and murder appear rather frequently together (you-know-who's fault).**

So the difficult part for Watson is exactly the easy part for humans (and vice versa).  Watson can easily store the name of every general, country, and battle, but has a lot more difficulty figuring out what is being asked.  Sure, there are keywords like "he" or "this date," but when there's any ambiguity, it's quite a challenge.  And even if Watson figures out the answer is a date, it still does massive lookups, searches for correlations, runs machine learning algorithms, etc.  At any point, something can go wrong because Watson doesn't "understand" the way we do.  (Interestingly enough, as Luis von Ahn observes on twitter, the final Jeopardy question about city airport names was pretty hard to answer even for us humans using a search engine, though it wasn't so hard for Ken or Brad, but no human would answer Toronto for "U.S. cities.")  Ken and Brad need no help interpreting the questions, but to them, remembering the ridiculous amount of information is the hard part.

That being said, the IBM team has done a great job with Watson.  Watson's performance thus far already heralds other incredibly useful applications, not so unlike the Star Trek TNG computer systems -- only they'll come much before the 24th century.  Don't get me wrong; there's still quite a way to go.  Real-life speech doesn't come in the form of nicely formatted Jeopardy riddles (typed to Watson, which still doesn't have speech recognition), but this demonstration shows we've passed a major hurdle.

Finally, I should say that even though I believe that these technological changes are ultimately for the better, I am rooting for Ken and Brad*.  Once Watson can beat us, there's no going back.  In 10 more years, your typical personal computer, in whatever phone/laptop/terminal/watch shape it is, will run circles around Watson.

Yet, whatever discomfort I have will probably soon go away, and I'll become accustomed to computers being better than we are at yet another thing.  I'll take comfort in my personal better-than-Watson computer.  I'll enjoy the efficiency this brings.  I'll celebrate the new scientific and medical breakthroughs we'll make with the help of (and closer interaction with) ever more powerful computers.  I'll, for one, eventually welcome our trivia overlords.

Update (2/16/11): Watson won today, as expected. And Daniel Reeves has some interesting ideas on how to change the rules of the game.

*I also felt that the game was a bit rigged against Ken and Brad -- the computer can always buzz in faster, so Ken and Brad have to compete for who wins the more human questions.  Were the game 1 on 1, it would have been a lot closer.  And if it were two Watsons vs just Ken or Brad, I'm sure the humans would win.

**Showing, once more that humans are decidedly more human, as we can easily tell the difference between Harry and Voldemort.

The image is of IBM's Watson Avatar and taken from Wikipedia.  It is posted for commentary under "fair use."

Thursday, December 30, 2010

Academic Resolutions

I don't normally make New Year's resolutions, and I don't plan on making any explicitly this year, but the coming of a new year feels like a good time to reflect on what we can do better.

So in the spirit of the tradition, I'll list some things many of us researchers (especially computer scientists) could work on in the coming year.
  • Finish writing papers at least a week before the (conference) deadline. That way, you can spend the last week improving the writing and rechecking the results.
  • Write up results immediately.  This will make writing papers much easier, and you can make sure your work is correct before moving on to the next thing.
  • Do reviewing in advance. We often have months to do our reviews, but often end up leaving them for the last week.  Even just reading a paper long before a review is due lets the ideas sink in a lot better.
  • Make journal versions (especially of conference papers that don't include full proofs). Until you've given a complete proof, people are justified in questioning your theorems.
  • Read more. It never hurts.
  • Enjoy work but also leave time to enjoy life.
Feel free, of course, to add to the list in the comments.

And have a happy New Year!

If you are serious about making resolutions and following through on them, check out beeminder. They seem to have good ideas on how to make your future self behave.

Monday, December 13, 2010

My Thoughts on NIPS 2010

This year, I attended my first NIPS conference -- NIPS stands for "Neural Information Processing Systems," but it has become one of the main venues for machine learning research in general. I've attended lots of machine learning conferences before, but somehow never managed to go to NIPS. This year, I had run out of excuses.

NIPS is broader than ICML, the other general machine learning conference, and it is certainly much broader than COLT and ALT, the learning theory conferences. I thought this would be a bad thing, as I expected that I wouldn't understand many of the talks and posters, but it turned out to be one of my favorite (if not my favorite) conference experiences.

First, because most people at NIPS are in the position of not being familiar with everything, there wasn't the expectation that I would immediately know the model or the problem being tackled when I talked to someone. This strangely made things more, not less, accessible. Also, because lots of topics were outside my areas of research interests, I didn't feel internal pressure to attend many of the talks, and it allowed me to relax and have more time to talk to other researchers. And because NIPS draws a huge attendance, I got to reconnect with many people I hadn't seen for a while, and I had a chance to meet people whom I know only through their research or through correspondence.

On top of all that, within my areas of research, there were quite a few really good contributions, and I wanted to point out some of the papers that I found especially interesting. This is, of course, a list biased not only toward my interests, but also toward the papers whose presentations I happened to catch.
  • A Theory of Multiclass Boosting by I. Mukherjee, R.E. Schapire. This paper characterizes necessary and sufficient conditions for multiclass boosting and gives a new multiclass algorithm not based on reductions to the binary case. A nice and elegant paper, it won one of the best student paper awards.
  • Online Learning: Random Averages, Combinatorial Parameters, and Learnability by A. Rakhlin, K. Sridharan, A. Tewari. Building on the work of [BPS '09], this paper extends various notions of complexity to the online setting (Rademacher complexity, covering numbers, fat-shattering dimension) and proves general online learnability guarantees based on these notions. This is similar in flavor to the already-known general offline guarantees.
  • Trading off Mistakes and Don’t-Know Predictions by A. Sayedi, M. Zadimoghaddam, A. Blum. This paper analyzes a model where the learner is allowed a limited number of prediction mistakes but is also allowed to answer "I don't know," generalizing the KWIK model [LLW '08]. They prove some nice trade-offs, and there seem to be many potentially interesting extensions.
  • Learning from Logged Implicit Exploration Data by A. Strehl, J. Langford, L. Li, S. Kakade. This paper discusses how to analyze the performance of a new contextual bandit algorithm by running it on previously recorded data. It's pretty intuitive that this should be possible, and this paper goes through the math carefully. This is a problem confronted by many content-serving companies, and I imagine this analysis will be quite useful.
Additionally, I also enjoyed David Parkes's talk on the interactions between machine learning and mechanism design -- this seems to be a very interesting area of research.

Finally, I also attended the NIPS workshops, which were rumored to be quite good, and they certainly lived up to the rumors. I especially enjoyed the workshop on Computational Social Science and the Wisdom of Crowds -- I decided to attend a workshop in an area about which I know very little, and all the talks were really good. In fact, from Yiling Chen's and Jake Abernethy's talks, I even learned about connections between prediction markets and online learning, so this workshop ended up being more closely related to my research than I expected.

Clearly, I've been missing out by not having attended the previous NIPS meetings. I'm planning to make up for it in future years.

Tuesday, November 30, 2010

To Boldly Research

In science, at least in computer science, in order for your research to have impact, you roughly have to do one of two things.
  1. Solve a known open problem that people care about.
  2. Solve a problem you made up and convince others that it’s interesting / important / impactful.
Often, a paper will do a combination of the two. Of course these aren’t the only ways to have impact – you might find a shorter proof of a known theorem, find a connection between fields others haven’t, etc. But discovering something interesting that nobody else has is a common theme.

So research requires some confidence, if not arrogance, to attempt. Successful researchers clearly need to find this confidence, and it’s not something that comes immediately. So I thought it might be helpful, especially to beginning graduate students (assuming any are reading), for me to spell out why I think succeeding in research is not as hopeless as it might first seem.

Lots of people reason along the following chain, especially regarding known open problems: 1) famous researcher X attempted this problem and didn’t get anywhere; 2) X is clearly better / more experienced than I am; 3) hence, I won’t be able to solve it.

The worst part of thinking this way is that it’s self-fulfilling – if you never make a serious attempt at a problem believing you can solve it, you probably never will. The second worst part is that this reasoning is flawed.

Let’s assume that experienced researcher X attempted (but failed to solve) the problem you’re working on; it doesn’t mean that you can’t. There are a couple of reasons for this.
  1. Famous researcher X is probably working on lots of problems and doesn’t have nearly as much time to devote to your problem as you do. By seriously applying yourself, you might succeed where X hasn’t.
  2. The process of doing research is partly random. You might just have an idea that X hadn’t.
  3. Some advances may have come out since X seriously attempted your problem. The average graduate student can solve lots of problems Gauss couldn’t solve, and it’s because science has made progress since then. This type of reasoning holds true on much smaller time scales.
There are also things you can do to increase the chances of being successful, or at least I’ve found that these things have helped me.
  1. Work on a problem that you’re actually excited about. This will make it much easier to put in the work needed to be successful. If research feels more like work than fun, then you might be doing it wrong.
  2. Find a problem that you think you have a chance in tackling. You should be able to feel in your gut if the problem “feels right.” Many problems are indeed too hard to attempt for beginning researchers.
  3. Don’t forget you have access to experienced person Y (your advisor) who might give you some insight that experienced person X didn’t have.
Of course I am also still a relatively young researcher, and getting research confidence is an ongoing process -- I still find working on hard open problems (rightfully) daunting. But if you can fool yourself into thinking you can succeed, even where others have failed, you might just turn out to be right.

Thursday, October 21, 2010

A Study in Purple

Yahoo! Research has a lab in New York City, right near Times Square. It is where I spent last year as a "computing innovation fellow," and, as promised, I want to blog about my experience.

Yahoo!'s New York lab isn't large. It has about 15 scientists, whose research spans several areas of computer science — machine learning, algorithms, economics and computation, storage systems, and computational social science. Most of the scientists are permanent researchers, but there also are a few postdocs. The lab often had talks, visitors, and interns to keep the atmosphere lively and interesting.

I mainly worked with John Langford on machine learning problems in computational advertising. A particular problem most search engines need to solve is what advertisements to show to their users, given some contextual information like a user's IP address, shared browser settings, and search query. The goal is to show users ads that they are likely to click, in order for the user to have a good experience and for the search engine to make a profit. However, how to do that is not clear because whenever an ad is shown, no explicit information is received about how well a different ad would have performed in the same situation. Effective algorithms need to balance exploiting strategies they have already learned to be good and exploring new strategies, possibly at some cost. In my time at Yahoo! we made good progress in understanding what sort of guarantees are possible for this problem, which is called the "contextual bandit problem" in the machine learning literature.
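For readers unfamiliar with the setting, the basic loop can be sketched with a simple epsilon-greedy baseline. This is only an illustration of the explore/exploit trade-off, not the algorithms we actually studied; real contextual bandit algorithms generalize across contexts rather than treating each one independently, as this toy version does:

```python
import random
from collections import defaultdict

class EpsilonGreedyBandit:
    """Toy contextual bandit: explore randomly with probability epsilon,
    otherwise pick the action with the best observed mean reward."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, action) -> times shown
        self.values = defaultdict(float)  # (context, action) -> mean reward

    def choose(self, context):
        if self.rng.random() < self.epsilon:  # explore
            return self.rng.choice(self.actions)
        # exploit: best observed mean reward for this context
        return max(self.actions, key=lambda a: self.values[(context, a)])

    def update(self, context, action, reward):
        """Record the observed (clicked / not clicked) reward."""
        key = (context, action)
        self.counts[key] += 1
        # incremental running-mean update
        self.values[key] += (reward - self.values[key]) / self.counts[key]

# Toy usage: after seeing that ad_a gets clicks for this user, exploit it.
bandit = EpsilonGreedyBandit(["ad_a", "ad_b"], epsilon=0.0)
for _ in range(5):
    bandit.update("user1", "ad_a", 1.0)
    bandit.update("user1", "ad_b", 0.0)
print(bandit.choose("user1"))
```

The key difficulty the paragraph above describes shows up even here: `update` only ever records the reward of the ad actually shown, so the learner must deliberately explore to find out how the other ads would have performed.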

It was quite exciting to get to work on real-world internet scale machine learning problems, and have the results of my work have practical consequences. I was also very lucky to have a chance to work with John, who, in addition to being one of the foremost experts on this problem (and in machine learning in general), was a kind and patient host, from whom I learned a lot. I also got to work with Rob Schapire (my undergraduate mentor and collaborator thereafter), who was visiting the lab on his sabbatical from Princeton.

Overall, I was impressed by the flexibility scientists at Yahoo! have in choosing their problems, in having significant time to do research, and in having access to large amounts of real-world data at scales available at only a few other places. We computer scientists are quite lucky to have industrial research jobs as a serious option for research careers.

Now, I have started on a new adventure as a postdoc at Georgia Tech's Algorithms and Randomness Center. I have never been at such a big and "happening" department, and I am excited to get to work with many great researchers on interesting and important problems.