Thursday, December 31, 2015

An Eventful 2015

This post continues my tradition of reviewing the year.  This time my commentary is a bit longer than in previous years, so without further ado, here is my take on 2015:
  • This year, we again have two important computing anniversaries: the bicentenaries of Ada Lovelace and of George Boole.  Ada Lovelace, daughter of Lord Byron, was the world's first programmer -- her algorithm computed the Bernoulli numbers on the Babbage engine, a machine that had only existed on paper at the time.  Incidentally, my Ph.D. advisor, Dana Angluin, wrote a nice piece on Lovelace.  George Boole, of course, established the foundations of 0-1 valued Boolean algebra, which is essential for computer science.
    left: George Boole (photo from Wikipedia), right: Ada Lovelace (photo from Eventbrite)
  • This year also marks the 25th anniversary of the World Wide Web.  Congress decided to celebrate by ... passing CISPA.  Sigh.
  • The results the latest neural nets are producing are moving quickly from the realm of impressive to the realm of scary.  On that note, I think it's time for us to start seriously thinking about the potential dangers of AI.  If you believe there is nothing special about our wetware, and that we'll keep making scientific progress, it's pretty straightforward to reach the conclusion that we will likely end up building computers that are in every way smarter and more creative than humans.  Once this happens, the cat will be out of the bag, so to speak, and the only question, really, is when will this happen?  This isn't necessarily a bad thing, but it is something we'll probably have only one chance to get right.
A "reimagined" version of "The Scream" by Google Brain scares me more than the original.
  • On a related note, a few high-tech luminaries founded OpenAI, a new AI research center with a billion dollar (yes, you read that right!) endowment.  Just think what impact the 60 million dollar Simons centers have had, and you'll quickly see how big a deal this could be if the money is spent well.  While I think the people involved are fantastic (congrats to Ilya!), I'm surprised by the center's initial narrow focus on deep learning.  As Seb pointed out, we will need entirely new mathematical frameworks to come close to "solving AI," so such a narrow focus seems shortsighted.  Luckily, the folks at OpenAI have plenty of resources for any future course-corrections.
    Sam Altman and Elok Musk, two of the founders of OpenAI, photo from medium/backchannel
  • Assuming the result checks out, the field of theoretical computer science has had a rare breakthrough.  Namely, Laci Babai gave a quasipolynomial-time algorithm for the graph isomorphism (GI) problem.  I was lucky enough to attend the lecture where the result was announced,  My grad student, Jeremy Kun, wrote up what is probably the best account of the lecture that still serves as a nice introduction to the paper, which is now up on arXiv.  One interesting thing about this result is that it neither makes real practical progress on GI (we already have fast heuristics), nor does it change our view of complexity (since the GI result is "in the direction" we expected).  It's just that GI is a such long-standing hard problem that progress on it is a very big deal.    
Babai, presenting his proof.  Photo by Jeremy Kun.
  • In combinatorics, Terry Tao, building on a polymath online collaboration, solved the Erdos Discrepancy problem, by proving the conjecture true.  I like that these large online collaborations have now led to solutions, or helped lead to solutions, to multiple important open problems.
  • Paul Erdos and Terry Tao, photo from Wikipedia
  • This isn't primarily about computer science or math, but if you haven't heard of CRISPR, go read about it.  I don't think it's an exaggeration to say that CRISPR, or its offshoots, are probably going to change our world rather fast.  While this technique was discovered a couple years ago, it was named "breakthrough of the year" for 2015 by Science magazine.

Monday, December 21, 2015

Why Does the World Care about Our Math?

One day while working on bandit problems at Yahoo!, I had this strange realization that its search engine, nay the entire world, seems have a particular remarkable property: we can sit around doing math, and as a result, better advertisements will get served to users.  Of course, this applies not just to computational advertising, but to pretty much anything -- we can know where the planets will be, when certain epidemics will spread, how fast planes need to fly to stay airborne, and a plethora of other things just by thinking abstractly and solving some equations.

I immediately and eagerly shared my newfound realization with others, and it impressed absolutely nobody.  I was told "How else would the world work?" and "There is lots of math that's not useful, but we choose to work on and formalize the things are are relevant to the real world."  These are, of course, perfectly good objections, and I couldn't explain why I found my realization at all remarkable, but I'd had a nagging feeling that I was onto something.

Forward 6 years, and I'm at Market Fresh Books, a bookstore near UIC.  As an aside, this bookstore is really interesting -- it sells used books by the pound or for small flat fees. I even once picked up a copy of baby Rudin for just 99¢ (plus tax) to add to my library.  Anyhow, I stumbled upon a copy of "Disturbing the Universe," Freeman Dyson's autobiography from 1979, and it looked interesting enough to buy.  That evening, while reading it, I came upon the following passage by Dyson:
"Here was I ... doing the most elaborate and sophisticated calculations to figure out how an electron should behave.  And here was the electron ... knowing quite well how to behave without waiting for the result of my calculation.  How could one seriously believe that the electron really cared about my calculation one way or the other?  And yet the experiments ... showed it did care.  Somehow or other, all this complicated mathematics that I was scribbling established rules that the electron ... was bound to follow.  We know that this is so.  Why it is so, why the electron pays attention to our mathematics, is a mystery that even Einstein could not fathom."
I still don't know the answer, and I can't even state the question without it seeming silly, but at least I now know I'm in good company.

Freeman Dyson
image credit,

Wednesday, December 31, 2014

2014 in Review

As 2014 comes to an end, I decided I want to continue my newfound tradition of summarizing my thoughts in a "year in review" post.  So here are some thoughts on academia, machine learning, and theory, again in no particular order.
  • Every company seems to have its own superstar leading a neural nets effort.  And deep learning keeps making impressive advances.  My hope that a nicer learning theory gets developed around this topic.
  • Computer science enrollments continue to soar, and the "sea change" may be here to stay. It's becoming a better and better time to study computer science.
  • On the other hand, research labs have continued to be vulnerable.  Perhaps we'll see a reverse-trend, with academic jobs temporarily making up for losses in the research job market.
  • It's an interesting time for online education, which has had some setbacks recently.  Yet, it seems even Yale would rather stream Harvard CS50 than hire enough faculty to teach its introductory computer science course.
  • With the release of The Imitation Game, more people than ever will know about Alan Turing.  But will their impressions be accurate?
  • My favorite "popular" AI article this year was on computer Go.  My guess is that by 2020, computers will be able to beat the best humans.
  • After teaching a learning theory course last semester, next semester I'll be teaching a graduate-level "Foundations of Data Science" course, loosely following Hopcroft and Kannan's new book.  I'll have to make some tough choices about what to material include and what to skip.  Any thoughts?

Thursday, July 31, 2014

On AAAI 2014

I've just returned from AAAI 2014, which was held in Quebec City, Canada. It was my first time submitting to, attending, or presenting at a AAAI, and I was very pleasantly surprised on many levels -- the strength of the program, the breadth of research topics covered, and the conference organization.  I also really enjoyed spending some time in Quebec City, which was a wonderful location.

a photo of Quebec City that I took from the conference hotel


AAAI is quite a large conference with around a thousand attendees and a couple hundred presenters, so it involves multiple parallel tracks, posters, etc.  Carla Brodley and Peter Stone, the co-chairs, did a wonderful job organizing everything to work smoothly.  Moreover, they introduced two innovations that I particularly liked: 
  1. Authors whose papers were selected for talks were able to choose to give a 2-minute plenary talk instead of a regular session talk.  This worked really well for papers that carried a strong conceptual message, which might appeal to a wider audience. (I chose this option: you can see my slides here.)  I was never bored at the plenary sessions -- if I found one of these talks or topics uninteresting, it was over in two minutes.  On the other hand, if I liked a talk, I could find the presenter later at the poster session (see next innovation).  The slides for these talks were all pre-loaded on one machine, and Carla and Peter somehow got everyone to stick to just two minutes, so, much to my surprise, these sessions went very smoothly.
  2. Authors who didn't give talks and authors who chose the 2-minute plenary talk option had scheduled 45 minute slots to "present" their posters during one of the multiple poster sessions.  At any given time during a poster session only about 30 posters were officially being presented (you could also show people your poster and work at any time). This was great because I didn't feel like I had to stand by my poster for hours.  Moreover, during my 45 minute presentation, I had a much larger audience.  I really liked both innovations and hope they are incorporated into future AAAIs (and ICMLs, NIPSs, etc.). 
I also want to discuss some of the content that I enjoyed.  I should note that I left the conference day early, attended a somewhat arbitrary selection of sessions, and pseudorandomly meandered through the poster sessions, so what follows is not nearly complete.  I know for sure that I am leaving out many good talks and papers.

Plenary Talks

I only saw two of the invited talks, but both were really great.

image from mashable

In the first talk, Adam Cheyer, one of the main inventors of the iPhone "digital assistant" Siri, discussed the history of the project, as well as some of the technology behind it.  One thing I learned from the talk was how much Siri is able to do that I didn't know about.  For instance, if you're reading an email and want to call the email's sender, all you need to do is say something like "Siri, call him" if the contact info is in your phone.  If you think about it, this is incredibly easier than having to leave email, go to phone contacts, search, etc.  In fact, Adam pointed out that while your hands are the best interface for getting at items on a screen, your voice is better for accessing content you can't see.  If one uses Siri with this in mind, it can be much more effective.  Adam also pointed out how immensely challenging building the system was.  For instance, a user who wants two tickets for the movie "Nine" at "8pm" can say: "two for nine at eight" and Siri has to figure out what to do after hearing what seems like 4 different numbers.  Just thinking about the difficulty of this problem should impress us how for AI has come!

As for Siri's history, the entire recounting was interesting, but one facet stuck out the most:  Immediately after the iPhone was announced in 2007, Adam and two of his colleagues met to discuss how to take advantage of the new technology.  Just three months later, they were making a pitch to VCs to fund their new idea.  By 2010 Siri was an app, and by 2011 it was bought out and integrated into iOS, making billions of dollars for Apple and impacting millions of users.  Adam clearly had a lot of experience working on these types of systems and in launching new and exciting products (CALO, IRIS, etc.), but it amazed me how much one can accomplish in a short time if one is talented, experienced, and motivated enough.

an example UI from the coloring game described below, taken from this paper

The second invited talk was by Michael Kearns, who is known for a lot of things including fundamental contributions to learning theory.  But in this talk, he discussed some of the user studies in network science research he and his group at UPenn have been doing.  In these experiments, subjects come to a room and sit at computers and perform various tasks like distributedly coloring a graph, coming to consensus, etc.  In their typical setup, players play on an underlying graph (whose topology they don't know), with players controlling one vertex each and only seeing what their neighbors are doing.  

In a coloring game, the players have to choose one of a few colors and will only get paid if their final color is different from the colors chosen by all their neighbors.  They can asynchronously switch their own color as much as they want, and the goal is to see how long it takes various graphs to get colored.  In a consensus game, the goal is for the entire graph to come to consensus, so the players need to choose colors that are the same as their neighbors' choices.  From the computational complexity standpoint, coloring is NP-hard (in the worst case) and consensus is trivial, but Kearns's group observed that the hardness of performing these tasks in the real world depends on the interplay of the graph structure and the task.  For instance, on many graphs, consensus appears much harder than coloring.

The result I found the most interesting pertained to an experiment they ran on coming to consensus in a game where different nodes have different preferences.  The way this setting works is that the players only get paid if at the end of a set amount of time, all have agreed on a single color.  However, different players will be paid different amounts depending on which color they agree on.  (This was made to be a simplified model the Obama vs. Hillary primary, where democrats had strong reasons to come to agreement, but had different preferences about whom to agree on.)  Kearns's group showed that in such a game, a small minority of well connected players with one set of preferences in all their trials override a much larger set of worse connected players with different preferences, even though the minority in question had no information about the structure of the graph (or even any knowledge that they were more connected than others).

This seems an interesting line of research and could hope to explain various real-world phenomena.


Finally, I also enjoyed quite a few of the papers at the conference.  One thing that surprised me was the large number of machine learning and also game theory papers.  The learning / game theory papers were on average more application driven than what one usually finds at ICML/NIPS or EC, but it seems that these fields are becoming increasingly incorporated into AAAI.

So, here's a very incomplete list of some papers I found interesting:

  • Active Learning with Model Selection by Alnur Ali, Rich Caruana, and Ashish Kapoor -- this paper considers active learning for  simultaneously choosing the best model and its parameters.  This is done by training the model on a training set and evaluating its performance on a validation set.  Both sets require labeled data.  So the active learning algorithm, in sampling a label for a new point, must also consider whether to add the point into the training set or into the unbiased validation set.  The resulting performance of their algorithm shows that it roughly first "explores" by placing into valuation and then "explores" by placing into training.  I found this idea and interpretation pretty neat.
  • Fixing a Balanced Knockout Tournament by Haris Aziz, Serge Gaspers, Simon Mackenzie, Nicholas Mattei, Paul Stursberg, and Toby Walsh -- this paper considers a natural, previously open, question.  Given you know the probabilities with which each team will beat each other team, can you design a knockout tournament (balanced binary tree) that maximizes the probability of a given winning?  This turns out to be NP-hard, but they also give some positive results in certain cases.
  • Preference Elicitation and Interview Minimization in Stable Matchings by Joanna Drummond and Craig Boutilier -- consider medical residents being matched to schools.  For residents and schools to choose their rankings, they need to interview with all the schools.  This is really costly.  This paper assumes that when one of two options is much better than the other, an interview probably doesn't need to be scheduled just to tell them apart.  Then, given their modeling assumptions, they try to find stable matchings without having to do too many expensive "interviews."  What I like about this paper is that it considers the sort of practical concerns that give a new twist to what seems like an already solved problem.
  • Simultaneous Cake Cutting by Eric Balkanski, Simina Brânzei, David Kurokawa, and Ariel D. Procaccia -- this paper looks at "cake cutting" in the simultaneous model, where user preferences are elicited simultaneously.  Cake cutting problems involve dividing a resource, say a yummy continuous cake, such that everyone is happy with the division (or at least thinks the division fair under various notions).  They show that in a simultaneous model, one can satisfy "proportionality" but not "envy-freeness", and also give some approximation results.  I don't know exactly why I like this model, but I have a soft spot for cake cutting.
  • Bagging by Design (on the Suboptimality of Bagging) by Periklis Papakonstantinou, Jia Xu, and Zhu Cao -- this paper gives a subsampling method that outperforms bagging, in the case, roughly, when bagging works, e.g. by being more stable to noise than a single predictor.  Their method comes from the observation that bagging is done over subsampling a fixed dataset, which makes bagging behave differently than if it had access to an unlimited supply of independent samples.
  • Recovering from Selection Bias in Causal and Statistical Inference by Elias Bareinboim, Jin Tian, and Judea Pearl -- this paper won the best paper award.  It gives complete conditions on when one can recover from selection biased data. They also consider settings where some unbiased data is available and other extensions.  I plan on reading it more carefully, as the results seem important.
In sum, I enjoyed AAAI and look forward to attending it again in future years.

Tuesday, July 01, 2014

Should You Get a Ph.D.?

I am occasionally asked by students whether they should go to grad school to get a Ph.D.  So much has already been written on this subject that I hadn't felt it necessary to put my opinion in writing, nor did I have any concrete advice.  But reading Duncan Watts's excellent recent article on the subject crystalized my thoughts.

So here, succinctly, is my advice:
Enroll in a Ph.D. program only if you think you'll enjoy graduate school.  Don't enroll just because you think a Ph.D. will help you in your future career.  In other words, don't go to graduate school as a means to an end.
Note: my advice is meant for US students seeking a Ph.D. in the sciences.  It probably applies more broadly.

Yale's computer science building, Arthur K. Watson Hall, where I spent a lot of time in grad school.  Image from here.

My advice may seem obvious, but it's not.  For instance, this would be bad advice for students considering an M.D. -- almost nobody enjoys medical school, but I think few graduates regret having gone.

But there are three main reasons why this is the right advice for grad school.
  1. In Ph.D. programs, unlike in many of the professional schools, you're supposed to be doing pretty much what your professors do -- research.  Sure you have to take some classes and get paid less, but chances are if you don't enjoy grad school, you won't enjoy a research career either.  (Going to grad school to pursue a finance career or something unrelated later is doing it the hard way.)
  2. Jobs in research, and especially in academia, are scarce, and going to grad school just because you might like a research job later is a very unsafe bet.  So, going to grad school only makes sense if you'd enjoy the research career, and also if you'd enjoy the experience anyway if a research career doesn't work out.  Four to six years of your life is a long time to be miserable.
  3. Grad school, unlike professional school, has no end-date.  You finish when you've done enough.  And it's hard to do enough if you're not enjoying the work.  Forcing yourself to sit through anatomy is one thing; forcing yourself to be creative is quite another.
Also, notice that my advice is an "only if."  So even if you might enjoy grad school, you should explore other possibilities.  Research is incredibly rewarding, but also has downsides which need to be considered.

That being said, I don't mean to be negative.  I'm very happy I went to grad school -- I had a great time there and am lucky to keep on getting to do research.  So, if you want to get a Ph.D., go for it.  Just remember to enjoy the journey.

Tuesday, December 24, 2013

My 2013 Post

The end of 2013 is approaching, and I realized I hadn't posted a single time this year. Starting a tenure-track job and having a baby apparently leave little time for blogging!  However, I do want to keep my blog somewhat active, so I figured I better make at least one post for this year.

But what to write about? It's hard to choose just one topic to make a "post of the year".  So, I'll just write about some work-related highlights of 2013, big and small, in no particular order:
  • The newly founded Simons Institute for the Theory of Computing seems to have had a great year. I haven't had a chance to visit, but I have watched some videos of their excellent talks online! And I've been keeping up with Moritz Hardt's interesting new blog.
  • I've for the first time gotten involved in running a conference, or rather part of it: ISAIM 2014 will feature a session on the "Theory of Machine Learning," and I'm quite happy with the program. (It's not too late to attend!)
  • My friend Nikhil Srivastava (who appeared on this blog before), together with Adam Marcus and Dan Spielman, proved the Kadison-Singer conjecture.  It's a result that touches on many areas of mathematics -- its resolution a big deal, and especially exciting it came from the computer science community.
  • With the opening of SCDA, Facebook Research Labs, MSR-NYC, and others, machine learning seems to be quite in demand, both in academia and industry. (And, not quite unrelatedly, deep learning is continuing to make a comeback.) It's a great time to be a machine learning researcher or practitioner.
  • Near the beginning of the year, my co-authors and I finished and published a paper extending statistical queries to optimization algorithms and proved a lower bound for planted clique. I'm particularly happy about this result, which had been in the works for a while. 
  • I published my first paper with students at UIC. One of them, Jeremy Kun, who is working with me, has a great blog, which you should all read!  This also happens to be my first work primarily in game theory, which seems to be an area many learning researchers delve into. 
  • While last year was 100 years since Alan Turing's birth, this year was Erdős's centenary.  One could say that Erdős was the Turing of combinatorics, and he had quite a bit of impact in computer science as well.
  • It looks like the NSA hasn't made any huge mathematical breakthroughs, but that hasn't kept them from breaking crypto-systems in the real world, both in ways we'd expect and in ways we wouldn't.
  • Leslie Valiant published Probably Approximately Correct. I'm sold, but can it bring learning theory to the masses?
I'm sure I missed some important things, but that's all I have for now.  I resolve to post more next year. It's a modest, but achievable, goal.  To a happy and productive 2014!

Friday, December 28, 2012

Turing's Revolution

In my post Alan Turing Year, I discussed how it's important for people to be aware of Turing's intellectual contributions to math and computer science, as well as his war efforts and tragic death.  It's time to live up to my own call and do my part.  In this post I'll talk about Turing's impact, from his best known contributions to his lesser known, but important ones.

I should note that the content of this post is largely stolen from answers to a question I asked on the StackExchange (SE) cstheory site, where I sought to gather a compendium of Turing's less known results.  The answers made a collection of impressive contributions, some of which I wasn't familiar with.  Some of his contributions turned out to be surprisingly modern.  They also introduced us to viewing the world through an algorithmic lens.

Without further ado, here they are:
1. Defining Turing Machines and Showing the Existence of UTMs (1936): A Turing machine is a universal computing device that forms the basis of how we think about computing.  A Universal Turing machine is a fixed machine that can simulate any other Turing machine -- most relevant to our current computers. (SE link)
a Lego Turing Machine, photo from wikipedia 
2. The Church-Turing Thesis (1939): The idea that the Turing Machine (and Lambda calculus) captures all of computation, and is therefore worth studying.  This hypothesis cannot be proven, but everyone believes it nonetheless. (wiki link
3. Solving the Halting problem (1936): A version of this was posed by Hilbert.  Turing showed that telling whether a program will Halt is undecidable.  (wiki link
4. The Turing Test and Introducing AI (1950): Turing gave an objective test that decides whether a computer exhibits intelligence.  The idea is to say that a computer is intelligent if it can fool humans into thinking it is also human.  Accepting this idea is a way to avoid many philosophical arguments about the nature of intelligence. (wiki link
5. Introducing Oracles and Relativization into Computability Theory (1939): Oracles are incredibly useful for all sorts of analysis -- for example the famous Baker-Gill-Solovay result that says P vs NP cannot be separated any technique that relativizes (holds for relative to any oracle), eg diagnolization.  It was part of the paper "Computing machinery and intelligence" that launched AI. (SE link
6. The Bombe (1950): Also called the Turing-Welchman Bombe, for breaking the German enigma machine in WW2 and introducing tools to cryptanalysis. (SE link)  
 the Bombe, from
7. Randomized Algorithms (1950): Turing observed the potential advantage an algorithm can gain from having access to randomness. (SE link)
8. The First Artificial Neural Network (1948): This idea is an important one in AI and machine learning.  Turing's design predated Rosenblatt's preceptron. (SE link
9. The Chemical Basis of Morphogenisis (1952): In his most cited paper, he began to tackle the question of how a symmetric embryo develops into an assymetric organism using symmetry perserving chemical reactions.  This started to introduce algorithmic thinking into biology. (SE link
10. Good-Turing Estimator (1953): For estimating an unseen fraction on a population when given past data. (SE link
11. One of the First Chess Algorithms (1952): Before we had computers, we had paper chess algorithms. (SE link
  Deep Blue, photo form wikipedia
12. Checking a Large Routine (1949): The idea of using invariants to prove properties of programs and "well-foundedness" to prove termination. (SE link
13. LU-Decomposition (1947): A method for factorizing a matrix as the product of a lower triangular matrix and an upper triangular matrix.  Now commonly studied in Linear Algebra. (SE link)
14. Turing's Method for Computational Analysis of a Function (1943): Invented for studying the of the Riemann zeta-function but is more widely applied. (SE link)

I'm sure there are more.  Any one of these is an impressive achievement for a scientist -- the first couple are revolutionary.  That Turing did all of these and more shows his remarkable contributions to our field.

Update (12/31/12): Alan Richmond points out another important work of Turing's, which I added as # 14.  Also, my wording for the Turing test proved a bit controversial, so I've changed it to better reflect reality.

Friday, December 07, 2012

UIC's MCS Program Seeking Graduate Students

As I'd posted before, UIC has two departments interested in computer science.  We have a computer science department in the engineering school.  And the math department in the school of liberal arts and sciences, where I have an appointment, has a smaller program in "mathematical computer science" (MCS).

The MCS program is more focused on the mathematical aspects of computing and less on systems, databases, etc.  It is similar to the ACO programs at CMU and Georgia Tech.

We recently updated our website, where you can learn more about the program and the affiliated faculty.  More importantly, if you're a student looking to get a Ph.D. in a mathematical area of computing, consider applying to our program -- we're looking for strong applicants.

a screenshot of the new MCS website

Tuesday, August 07, 2012

Diamond Forever

Recently, the academic community started to fight back against traditional academic publishers, who earn large profits without any longer providing many clear services.  The internet has been a great tool for taking care of such inefficiencies, and it's quickly disrupting the publishing market as well.

Having thought a bit about these issues myself, I wanted to summarize my views of some of the various publishing models (as described by Tim Gowers) under consideration, as well as the pros and cons of each.  It's possible, even likely, that I am missing something, so feel free to point it out in the comments.

The Systems


screenshot, after I tried to access a paper from

system: Researchers send manuscripts to publishers. Publishers have volunteer editorial boards consisting of researchers who decide whether to accept/reject manuscripts. Accepted manuscripts are published but cannot be accessed without a paid subscription to the given journal. Authors lose most (or all) rights to their own work.

who wins: publishers, by charging for access

who loses: university libraries and therefore scientists who pay for it with overhead; the public, who doesn't get access to the research their taxes likely funded; authors, who cannot easily have their research read and used.

opinion: This system made some sense before the internet era, where there were few other good ways to disseminate research. The publishers used to have to work hard for their profits by doing serious formatting/printing/publishing work. With the internet, all of this is now done by the authors, and the closed-access system no longer makes sense.


system: Same as closed-access, except that authors are allowed to put their accepted manuscripts on their websites and onto online repositories such as the arXiv.

who wins: publishers, by charging for access; researchers who put their papers online

who loses: basically same as closed-access, except for the authors who put their papers online

opinion: This is the system most journals are now on, or at least they usually don't go after scientists who keep their papers online. This is strictly better than closed-access, but doesn't go far enough. Too many scientists do not put their papers on their websites or arXiv. There is still no guarantee that science will be public, as it should be.


a screenshot of the new gold-access math journal

system: Researchers send manuscripts to publishers. Publishers have volunteer editorial boards consisting of researchers who decide whether to accept/reject manuscripts. Authors of accepted manuscripts pay to have their work published but published manuscripts are freely accessible to everyone online.

who wins: the public, who can access research for free. the publishers, who still get paid (but not as much)

who loses: the authors, who have to pay (usually large sums) to have their work published. universities, which might need to subsidize the fees. quality (see below).

opinion: I think adopting this system would be a mistake. First, scientists are the ones doing the work, the editing, and the main contributions -- it seems unfair to make them pay as well. Researchers without grants or university support would have to use their personal funds. Moreover, I think this would seriously impact paper quality.  In this system, journals would make money by accepting more papers as opposed to getting more readers, and some authors may have incentives to pad their CVs. I cannot see how this wouldn't go wrong.


the journal of machine learning research, image from their website

system: Researchers send manuscripts to publishers. Publishers have volunteer editorial boards consisting of researchers who decide whether to accept/reject manuscripts. Authors of accepted manuscripts have their work published for free and published manuscripts are freely accessible to everyone online.

who wins: scientists, who can publish for free and have their papers read. the public, who can access research freely. universities, who do not have to pay for journal access. research quality.

who loses: publishers, who cannot any longer charge for anything

opinion: This is the best of all worlds. Everyone seems to win, and research quality especially wins because nobody makes a profit from publishing bad results.  The only problem is that the publishers lose, and why would anyone run a journal without making a profit?

A Solution

Diamond-access is the best system for everyone, except for the publishers.  However, publishers need incentives to run a journal, which they wouldn't have in a diamond access system.  Now, becoming a publisher would involve operating at a loss.  So who should take the hit?

One obvious solution is for universities to publish (or pay for publishing) journals.  They would make a sacrifice in maintaining the journals for free, but what would they win? Quite a lot actually.  Were we do switch to this system, university libraries would no longer have to pay subscription fees to journals, and the university professors wouldn't have to pay to have their work published.  And the quality of publications wouldn't suffer.   Some of this cost could even be handled by overhead.

Yale's library (from wiki), where I not once went to do research during my Ph.D.

But what is the cost of running a diamond access publication?  Well, it turns out to be very small.  With pdfs online, there's really no need to publish paper proceedings.  I don't think I've ever gone to a library to find an article on paper -- pretty much nobody does these days.  So, all you need to do is register a domain and pay for some maintenance and hosting.  The rest of the work is done by volunteers.  Looking at the financial statement (of which I learned from Google+) of the Journal of Computational Geometry, it could take as little as $10 per year.  Heck, we don't really need universities to pay for it.

Yes, it really costs very little to run a journal these days if you do it right.  There's no need for readers, or authors, to pay huge sums to middle-men.

Monday, May 07, 2012

Atlanta, Northward

When you accept a postdoctoral position, you know you probably can't stay indefinitely.  Psychologically, it means you have to prepare yourself for having to switch jobs, coworkers, offices, etc., and I, as a postdoc, have been ready for this foreseeable consequence with Georgia Tech, where I've enjoyed being part of a strong and lively department the last two years.

Unfortunately, when you take a postdoc in a city like Atlanta, where there isn't as high a concentration of universities and research labs as some other places, you not only have to leave your job, but likely also the city.  And even though it was in the back of my mind that I'd probably have to move, I didn't expect to love Atlanta as much as I have come to.
Atlanta's botanical garden

Atlanta has sunny, warm weather, really friendly people, lots of trees, amazing restaurants, nice parks, museums, a symphony, theaters, and everything you could want in a city (except, perhaps, a waterfront).  It's also very livable -- the population density is fairly low, so you're not elbow-to-elbow every time you leave your house, and you can live in really nice suburbs right in the middle of the city.  Because of the large influx of new residents, it's mostly new and clean, yet the cost of living is quite low.  And you can reach great hiking with a 20 minute drive and reach a beautiful national park in 3 hours.  It's certainly a place I wouldn't mind living in again at some point in my life.
Smoky Mountains National Park

But while I'm sad to have to leave Atlanta, but I am also excited to begin a new phase in my career.  In a couple months, I'll be joining the University of Illinois at Chicago (UIC) as an Assistant Professor in the Math department, actually called "Mathematics, Statistics, and Computer Science."  The department is quite strong, but unlike most math departments, UIC's is expressly concerned with Computer Science (so, yes, I'm remaining a computer scientist). Additionally, UIC has another entire standard Computer Science department, where I'll also have a courtesy appointment.  Between the two departments, there are quite a few great faculty members working on various aspects of machine learning -- I'm looking forward to joining them. And with Northwestern, UChicago, and TTI-C nearby, Chicago should be quite an exciting place to work.
Chicago, reflected in "the bean"

But because this post started with Atlanta, I should say something about the city of Chicago.  While I don't yet know Chicago well, it also looks like a beautiful, livable place, on the shores of a (literally) great lake.  The people seem friendly, and there is a lot to do.  I just hope that, after being spoiled by Atlanta, I can get used to Chicago's cold winters.

Friday, April 27, 2012

Computing Conceptions

From Georgia Tech, many of us have been closely and concernedly watching our southern neighbors at the University of Florida, where budget cuts and a seemingly hostile dean conspired in an attempt to decimate their well-respected computer science department.  Dismantling the department, at a time when computer science research is so critical, would be a laughably bad decision on the part of the university.

I don't claim to fully understand the rationale behind this plan.  However, I have a feeling that part of the reason such an idea was even being seriously considered has to do with a couple misconceptions of what computer science research is, and I fear that such misconceptions extend beyond the state of Florida.  And I don't just mean that the average person doesn't understand computer science; that's to be expected. I mean that many academics, even scientists, don't understand the basics of what computer science is about and therefore tend to devalue it, especially as an academic discipline.

First, many people seem to assume that computer science is just programming or fixing computers.  As a graduate student at Yale, where computer science is a small department, I was often asked by other Ph.D. students why computer science even has a Ph.D. program.  They didn't view it as an academic pursuit, but more as a trade skill.  I fear that many scientists view computer science as limited to programming or getting computers to work, probably because that's the way most, say, physicists use computers.  They have little understanding of the beautiful, deep results and insights that computer science has brought the world.  Viewing it as an instrumental non-academic field, people think it would be okay to kill-off computer science research and leave the professors to teach programming (which, admittedly, is an important part of a computer science department's job).

(clip art from here)
something computer scientists do not normally do

The other, very related, misconception, one that was clearly in play at the University of Florida, is that the computer science department was somewhat redundant because the electrical and computer engineering department already has the word "computer" in it.  Their reasoning sounded more sophisticated than that, but only superficially.  But computer science and electrical engineering are very far in their central concerns.  Computer science, for the most part, is divorced from concerns about electricity, physical media, or anything of that sort.  Whether you work on operating systems, machine learning, or the theory of computation, you mostly don't really care about the underlying hardware, whereas electrical engineers do.  Greg Kuperberg, writing on Scott Aaronson's great blog post on this issue, puts it better than I could:
"Apparently from (Florida engineering dean) Abernathy’s Stanford interview, and from her actions, she simply takes computer science to be a special case of electrical engineering. Ultimately, it’s a rejection of the fundamental concept of Turing universality. In this world view, there is no such thing as an abstract computer, or at best who really cares if there is one; all that really exists is electronic devices.
[...] Yes, in practice modern computers are electronic. However, if someone does research in compilers, much less CS theory, then really nothing at all is said about electricity. To most people in computer science, it’s completely peripheral that computers are electronic. Nor is this just a matter of theoretical vs applied computer science. CS theory may be theoretical, but compiler research isn’t, much less other topics such as user interfaces or digital libraries. Abernathy herself works in materials engineering and has a PhD from Stanford. I’m left wondering at what point she failed to understand, or began to misunderstand or dismiss, the abstract concept of a computer."
(image from here)
something usually not of research interest to computer scientists 

It looks like a disaster in Florida has so far been avoided. And with each passing year, more scientists will have, at the very least, taken some basic computer science in college -- it is part of our job to teach the important concepts in our introductory courses.  I'm hoping this will improve the perceptions of our field.  But in the meanwhile, it has become apparent that we have much more PR to do to.

(image from here)
now we're talking!

Sunday, January 22, 2012

Alan Turing Year

I am excited that 2012, the centenary of Alan Turing’s birth, will see many celebrations of his life and legacy.  It is hard to think of a scientist who has had more impact both on science and on our lives, but who still remains as unknown to the public, so I am glad Turing will be getting some much-deserved attention this year.

Alan Turing, photo copyright held by National Portrait Gallery in London

Alan Turing laid out the foundations of the study of computer science (in the process of solving Hilbert’s Entscheidungsproblem – no small feat on its own), contributed fundamentally to our understanding computation in nature, envisioned the future of intelligent machines, and saved millions of lives by helping shorten the length of World War 2.

Yet upon discovering Turing was gay (and convicting him for "indecency"), the British government took away his job, kept many of his contributions secret, and chemically castrated him, driving him to suicide.

After some long-overdue outrage, Gordon Brown issued an apology on Britain's behalf, albeit decades too late.  Now there are new petitions, asking for the British Government to grant Turing a pardon.  I have nothing against these efforts, but how Turing was treated is to our collective shame, and we should remember that these efforts are to make us, not him, feel better.  Turing knew he wasn't doing anything wrong and never needed condescending “pardoning” from politicians, nor does he need them any more after his death.  If the pardon is to send an apology to the homosexual community for their mistreatment, I’m sure the British government can think of something a little more direct.

I think a better way of honoring Turing is to make sure people know about his work – like we do of KeplerGalileoNewtonDarwinEinstein, and many of the other great scientists who revolutionized our world-views. Unfortunately there’s a lot of work to be done: even well-meaning articles trying to popularize Turing’s work mainly describe him as an accomplished code-breaker and World War 2 hero.  Turing did break codes, most notably the German Enigma machine, but calling Turing a code-breaker without mentioning his scientific impact is akin to calling Isaac Newton a treasurer, given his years as England's Master of the Mint.

So, this year, we academics will celebrate Turing’s great accomplishments by holding conferencesawards, and lectures in his honor.  There will be documentaries released about his life.  I’m sure an op-ed or two will be written in the newspapers.  But I also hope that reporters, lecturers, and especially teachers, will help the the public at large learn about this pioneer of the information age.

Thursday, October 27, 2011

ALT 2011

Earlier this month, I traveled to Finland, where I attended ALT 2011 and presented a paper on learning sparse parity functions in the presence of noise. 

a picture I took in Helsinki

COLT (Conference on Learning Theory) and ALT (Algorithmic Learning Theory) are the two main learning theory conferences.  Though COLT is both the larger and better known of the two, it seems that ALT is growing stronger with every passing year, and this year was no exception.  Not only was the program good, but Sébastien Bubeck gave a timely tutorial on bandits, which deservedly got their own session this year. (I, unfortunately, arrived too late to see the tutorial, but I read the nice slides afterwards.)

I wouldn't be the first to note that the other thing that differentiates ALT from its big brother is that ALT seems to have a broader definition of learning theory, and the conference is therefore more diverse in its topics than a typical COLT.  This has clear benefits, but it also means that there are entire sessions where I am pretty lost.  Of course, as I've previously noted, not feeling like you have to following everything could also be a benefit when you're attending a conference.

I want to mention one paper I particularly enjoyed, especially because of the model, which illustrates the importance of unlabeled data in the agnostic setting:
"Learning a Classifier when the Labeling is Known" by Shai Ben-David and Shalev Ben-David.  This paper works in an agnostic learning model where the learner already knows the labeling function, but wants to find a good succinct representation (i.e. proper learning).  How good a particular representation is depends on the distribution the data is coming from, and this is what the algorithm needs to learn by seeing examples.  It turns out that this, more generous, model is subject to essentially the same sample complexity lower bound as the common agnostic PAC model.

Monday, August 22, 2011

Happenings on CSTheory Q&A

In addition to machine learning, one of my main research interests is theoretical computer science.  Following the success of MathOverflow and other Q&A sites, ("cstheory" for short) was born about a year ago, and I've been an active participant since its inception.  In the last year, we've worked hard to make cstheory a valuable resource and to promote it.  I'd say the efforts paid off -- the site is doing quite well, with over 5000 members.

Moreover, cstheory recently added a blog (Aaron Sterling and Joe Fitzsimons are editors), whose posts are on topics related to theoretical computer science and to the site itself.  In one of the first posts, Suresh Venkatasubramanian, a site moderator and one of the people most responsible for its success, wrote a nice summary of cstheory's first year.

I've also volunteered to occasionally blog about the learning theory side of things.  My first entry, on learning regular languages, was just posted.  It touches on many interesting issues learning theorists regularly encounter. Enjoy!

Saturday, August 13, 2011

Reyzin-Srivastava Trees

When I was a grad student at Yale, Ravi Kannan taught a great sequence of graduate theory courses on approximation algorithms, randomized algorithms, and streaming algorithms, and I was lucky enough to get to take all three.  The first one, approximation algorithms, Ravi taught during my first semester in grad school.  We were allowed to collaborate on assignments, and I discussed many of them with my friend Nikhil Srivastava.

I remember a problem on one of the first assignments was particularly tricky.  The question was about s-t cuts in a graph.  The hard part about this problem was that to solve it, it seemed we needed to efficiently keep track of all pairs minimum s-t cuts.  We started discussing the problem in the afternoon, and though it took us a while, by about 5am we had designed a data structure to do just that.  We realized you can maintain all the cuts in just one tree.  It was perfect.  We jokingly called the data structure the Reyzin-Srivastava tree.

I spent the remainder of the night writing up the assignment, describing this thing in detail, proud of our cleverness.  About a week later, we got the graded assignments back.  Mine had one note on it, next to that particular question of course.  It said "Next time, just reference Gomory-Hu trees when you need to use them."

I learned a couple lessons from this.  The first is that I should remember to do the required reading before starting the assignment (we used Vijay Vazirani's approximation algorithms textbook, which really is worth reading).  Had I done the reading, I would have learned about Gomory-Hu trees and realized the question asked us to trivially apply them to some problem, instead of reinventing them.   The second, and perhaps, more important lesson was that I had a shot at doing research, at inventing something new.  Yes, these trees had been invented decades before, and yes, we'd been exposed to more data structures than Gomory and Hu had been in 1961.  But somehow knowing that we figured out something on our own that wasn't trivial really helped our confidence as first year grad students, and as I've written before, confidence is quite important for research.

Nikhil and I went on to do some real research together, producing two joint papers the following year.  Actually, both of those papers also began as classwork, for Dana Angluin's machine learning class.  But that's a story better left for another post.

Friday, July 15, 2011

On ICML 2011

I've heard from some people that they liked my summary of NIPS 2010, so I'm posting my thoughts on ICML 2011, where I went a couple weeks ago.  ICML is one of the two main yearly conferences in machine learning, so there are always interesting new results to see, and this year saw a record of over 700 attendees.  Unfortunately, I didn't get to attend the tutorials or the workshops, so my post is about the main conference.  The conference took place in Bellevue, a city near Seattle, Washington.
Bellevue downtown at night, from Wikipedia

Invited Talks

One of the main highlights of ICML 2011 were the invited talks.  Usually, I like one invited talk or so, but this time all of them were great.   I've heard a number of people remark that the organizers did a great job of choosing the speakers.  I should also note that the videos of all the talks, invited and not, will soon appear on TechTalks, and I recommend you watch the invited talks especially.

The first talk was by Chris Bishop, of the Microsoft Research (MSR) Cambridge, UK lab, who talked about applied machine learning coming of age.  He addressed a variety of topics, perhaps a bit loosely tied together, but the real highlight of his talk was the first half-hour, where he addressed the technology behind Kinect.  Apparently, much of the technology behind it was developed at MSR (this should be good for research funding at Microsoft, as Kinect was a huge commercial success), and he went into some depth about both the machine learning and hardware challenges in building Kinect. For example, machine learning was used for the body tracking system in Kinect, but with some interesting twists, e.g. 1) the body tracking problem isn't treated as a tracking problem at all: each individual frame is independently classified, 2) lots of the training data is synthetic.  On the hardware end, I learned that Kinect could track you just fine in the dark.  To find out more, you should watch the entire talk.
Kinect sensor, image from Wikipedia

The second invited talk was by a mathematical biologist, Martin Nowak, who talked about the evolutionary dynamics of competition and cooperation.  He talked about all sorts of interesting things.  What stuck with me the most were some results he presented on the repeated prisoner's dilemma.  In the famous classic prisoner's dilemma, each of two prisoners must independently decide whether to cooperate or defect (see figure below for a possible payoff matrix).  The dominant strategy for both prisoners is to defect, but if both prisoners cooperate they are better off than both defecting, hence the dilemma.  In the real world, we often play variants of this game, but over and over again.  The repeated prisoners dilemma setting captures this aspect of repeated play, and a while ago, Robert Axelrod held a famous repeated prisoners dilemma tournament where he showed tit-for-tat was a really good strategy: basically, to cooperate at first and then, on each future round, to do what your opponent did on the previous one.  I have a lot more to say on this topic, but I'll skip to Martin's talk.  Martin presented some interesting results where he showed us that if noise is introduced into this repeated game (like in real life), and one runs an evolutionary tournament that starts with random strategies, first tit-for-tat will evolve.  But then, tit-for-tat will be taken over by a more forgiving versions tit-for-tat, which will subsequently be dominated by completely cooperative strategies, which will be exploited by completely defecting strategies, which will then be taken over by tit-for-tats again.  And this cycle repeats!  Martin pessimistically suggested this this cycle is inevitable in the real world as well.  Of course, his talk had a lot more, so if you want to know the details of this and more, I recommend you watch it!
possible prisoner's dilemma payoff matrix, image from "beyond intractability"

Then, Hatmut Neven, of Google talked about the machine learning behind "Google Goggles," the reverse-image search where you take a picture of an object, send it to Google as a query, and Google will hopefully return some relevant results.  I hadn't thought much about this reverse-search problem, so a lot of the talk was interesting to me on a basic level.  For example, if you send a picture of a fancy chair, it's useless for Google to tell you it's a chair, you're probably looking to know what type of chair it is so you can buy it -- this makes the problem very hard.  But it doesn't mean Hatmut and his team haven't made a scary amount of progress: Hatmut says that Google can, given a picture of an arbitrary person who has about a dozen photos of them in Google's databases (i.e. online), correctly classify them in the top ten results.  Google hasn't activated this feature for privacy reasons, but if (or when?) Google does, this means that you could pretty much see someone on the street, take a picture of them, and find out who they are (and they don't even have to be a famous actress).  Hatmut thinks that in 10 years, we'll be able to carry a camera with us, which will, without any special querying, know more about the world around us than we do -- it will know what species of trees are around us, the origin of the name of the street we are on, the history of the author of the book we're reading, etc. Exciting stuff.  Finally, Hatmut talked about how quantum computing might help with all this, and he gave a much more positive endorsement of D-Wave than I'd read elsewhere.
D-Wave's 128-bit chip, image from Wikipedia

The last talk was by David Ferrucci about the IBM Watson system that beat the top two humans on Jeopardy (see my post on Watson).  He talked about the immense challenge of building Watson, the collaboration among many different types of researchers, the scientific problems involved, and the various problems his team had to overcome.  The task of building Watson was so monumental that many (non computer scientist) people couldn't get their heads around it.  David described an incident when he went on a radio show to talk about Watson and was asked by the radio host "So, you're going to give the computer all the questions and answers, and when a question is asked, the computer will pull up the right answer?"  David said, surely rolling his eyes, "No. We don't know any of the answers in advance, let alone the questions."  Then the radio show host responded "I don't understand. How are you going to do it then?" "Exactly!" said David.  I trust I'm not spoiling anything if I tell you machine learning was a big part of the solution.
Watson's avatar, image from Wikipedia

Neural Networks 

John Langford (who hosted my postdoc at Yahoo! Research) already blogged that neural networks were making a comeback, so to speak, at ICML 2011.  In addition to the sheer number of "deep learning" papers, lots of people excitedly talked about what great things they could train their neural nets to do.  I would add to John's post, after talking to Bill and Ilya, that another reason this area is becoming "hot" is that we finally have the computing power to train neural networks for real-world problems, something that was previously much more difficult.

I must admit that I don't know enough about neural networks, and the reason is that it's not easy to see how the fit into the larger world from a learning theory perspective.  For example, we know that a single-layer neural network cannot learn even the XOR function, that recurrent neural nets can simulate Turing Machines, and that certain classes of neural nets are not properly PAC learnable.  But from what I can see, the theory of neural nets is of a very different flavor than, say, the theory of decision trees.  If someone who knows these connections better wrote up a survey, I'm sure I'm not the only one who would be grateful. (Even better, maybe such a survey exists and I'm not aware of it.)

Papers I Liked

Given that ICML has 4 or 5 parallel tracks, I didn't get to see even 20% of the talks, but I tried to catch what I could at the poster sessions.  Here are a few non neural-networks (for the neural networks ones, see John's post) papers I liked, besides my own, of course.

"On the Necessity of Irrelevant Variables" by David Helmbold and Phil Long.  The main gist of this paper is that irrelevant variables don't hurt a predictor nearly as much as relevant variables help.  So if you're trying to do feature selection and aren't sure whether to add a feature or not, add it.

"Clustering Partially Observed Graphs via Covex Optimization" by Ali Jalali, Yudong Chen, Sujay Sanghavi, and Huan Xu.  Here, they do correlation clustering with partial observations, which is NP-Hard, so instead of the usual approach of making an approximation algorithm, they give an algorithm which succeeds of the data satisfies certain properties.  They're not the first to do this type of analysis, but I rather liked their type of data-dependent assumption.

"Optimal Distributed Online Prediction" by Ofer Dekel, Ran Gilad-Bachrach, Ohad Shamir, and Lin Xiao.  This paper considers online prediction, but in a distributed model, where one can get speedups, possibly at some cost in the regret.  They derive optimal regret bounds for convex losses and i.i.d. examples, and show that for many parameter settings,  you can actually get the same asymptotic regret as without parallelizing.

"Doubly Robust Policy Evaluation and Learning" by Miroslav Dudik, John Langford, and Lihong Li.  I'm a bit biased about this paper because I was at Yahoo! when they were woking on it. They basically show a technique for evaluating a new bandit policy on historical data, where you can get an unbiased evaluation as long as you have either a good model of rewards or a good model of the past policy. Hence, doubly robust.


ICML 2011 was great.  I'm looking forward to ICML 2012! One extra bonus is that ICML 2012 will be co-located in Edinburgh with COLT 2012, the main learning theory conference.

Update (7/26/11): Videos of all the talks from ICML 2011 have now been posted here.

If you're interested in machine learning, you should follow John Langford's blog,  Also, John will co-chair ICML 2012.