Friday, December 30, 2016

2016 is Finally Ending

2016 was, to say the least, a tumultuous year, marked by numerous conflicts across the world, the British exit from the E.U., an exhausting U.S. political campaign culminating in the election of The Donald, a sharp ascent of Putin's menacing role in the world, and too many other things to even list.  I mention this to note that I haven't missed any of these events, or their importance, but I'll skip most of these in this year's summary in lieu of some more personal, or at least scientific, happenings.  (I do occasionally tweet some political opinions, and if you want to see those, you should follow me there.)

So, in no particular order, here are some things I do want to note as 2016 comes to a close.
  • It's reassuring to remember that while it may not feel like it, the world continues to become a better place as a whole.  If you don't believe me, take a look at the data.  And even if you do believe me, read this book by Steven Pinker.
  • I have been developing my amateur interest in architectural photography.  Below is a recent photo of mine of UIC's University Hall, looming over its surroundings.  While not particularly pleasing to look at, I think it captures "socialist utopian" ideology of the brutalist architecture on display all over campus.
    University Hall, photo by me
  • It has been 25 years since Pretty Good Privacy (PGP) was developed.  Given the recent political happenings, I'd say it's a pretty good time to start using it for sensitive emails.  I installed Mailvelope, and here is my public key; its fingerprint is AC5E DCA0 76A1 F55A 4819 94A9 2FAC ADDD C766 7CB9.
  • Reports of the Russian government influencing our election highlight once again the importance of good security practices, which are horribly lacking throughout most of our companies and the government.  Until we fix this, we will continue to be at the mercy of foreign adversaries, hackers, and (mis)fortune.
    photo from glitch news
  • This year, I graduated my first Ph.D. student, Jeremy Kun.  Jeremy finished in 5 years, though he only started working with me at the end of his second year.  Before graduating, he had the option to do a postdoc in academia, work at Google, or join a startup, and he decided to go the startup route and is now at 21 Inc.  He wrote an interesting blog post about his journey through grad school that I recommend everyone considering math or cs theory grad school to read. (Full disclosure, I think he makes UIC MCS seem a rather nice place, which I agree with, but it's also in my interest to promote it as such to prospective students.)  I also expect to graduate some more students in 2017.
    Jeremy Kun defending his thesis
  • The Man Who Knew Infinity, a movie about Ramanujan, was released in the U.S. this year.  Even though the movie got some biographical details wrong, and even though I found some parts a bit annoying, the mathematical parts were quite accurate.  In particular, I think this movie, more than any other that I've seen, does a pretty decent job of showing to a general audience what it that mathematicians do all day.
  • Two years ago I predicted that a computer program will be able to beat the best human Go players by the year 2020.  AlphaGo reached this milestone this year, and while this technically met my prediction, the speed at which it arrived hasn't helped allay my fears of A.I. posing an existential risk to humanity.  Those of you who haven't given this issue much thought should watch Sam Harris's TED talk on this topic.  Also, I recommend watching Westworld, which I liked both as a show and for some of the nontrivial philosophy that it presents on this topic.
    computers become better than humans at one more thing, image from quartz
  • This year Elon Musk declared that the odds are a billion to 1 that we are living in a simulation. The argument goes like this: eventually we will become advanced enough to simulate worlds ourselves, and the simulated beings won't know they're being simulated (and perhaps eventually make their own simulations), and the number of simulated worlds will vastly outnumber real ones.  My prior doesn't allow for such odds, and I think there are quite a few hidden and probably false assumptions in his argument, but if he's right, it would reveal that we're fundamentally mathematical beings, and that would at the very least make Max Tegmark happy.

Tuesday, December 20, 2016

Counting Our Losses After the Election

After Trump's victory this election, I've seen a number of posts criticizing the "data scientists," all of whom predicted a Clinton victory.  If they all got it wrong, how can they claim to be engaging in science if they won't now change their methods?  And should they change their methods?

the electoral vote outcome as of 12/20/16, image from Wikipedia

I'm not a fan of the hordes of "data scientists" running regressions pretending they're doing "science."  In fact, I'm skeptical of any field doing actual science that has science in its name, computer science included. But I want to defend the some of the forecasts themselves and suggest a way of going forward.

(Also, while I wouldn't blame the pollsters, who have an increasingly hard job these days, one group I have no problem blaming are the political "scientists," who have all these theories about what candidates should and shouldn't do, where advertising helps and where it doesn't, and Trump did none of the things he was "supposed" to do and still won.)

Blame the forecasters?

I don't think there was an honest way to look at the polling, or really, most other publicly available data, and claim that Trump was actually more likely to win than Clinton. The truth is simply that unlikely events occasionally occur, and this was one of them.

While the forecasts all agreed Clinton is the favorite, they assigned her different win probabilities.  Sam Wang (whose forecast I repeatedly dismissed before the election) assigned something like a 99% chance to a Clinton victory.  Nate Silver predicted something like a 2/3 chance to a Clinton victory.  Does that mean that Nate is a better predictor?

Well, still not necessarily.  Unless someone assigned a 100% probability to a Clinton win, we can't know for sure.  Sam Wang could have been closer to the truth, but simply gotten unlucky.  Moreover, people should be rewarded for predicting close to 0% or 100% because those predictions are much more informative.  Nate Silver's prediction might have been well calibrated, but still quite useless.

Consider the following prediction.  I can predict that for the next 10 elections, the candidates of the two major parties have roughly a 50-50 chance of winning.  Since the Democrats and the Republicans roughly win half the time, I'll probably be well calibrated, but my prediction will remain useless.

Count your log-loss

So, ought we throw out hands up in the air and trust everyone equally next time?  No!  Statistics and machine learning have ways of evaluating precisely these things.  We can use something called a loss function (for reasons I won't go into here, I will use the "log-loss" function, but you can use others), where we assign penalties, or losses, for inaccurate predictions.  Whoever accumulates the least loss over time can be thought of as the better predictor.

The binary version of the log-loss function works as follows:
L(y,p) = -(y log(p) + (1-y)log(1-p))

So let y=1 in the event where Trump wins and p be the probability assigned to that event.  Someone assigning this event a probability of .01 will suffer loss = -(1*log(.01)+(1-1)log(1-.01)) = 2.  Whereas someone assigning this event a probability of .33 will suffer loss of approximately 0.5.  Note that had Trump lost, the losses would have been approximately .005 and .2, respectively, rewarding the confident prediction.

So, according to this metric, Sam Wang gets penalized a lot more than Nate Silver for predicting an event that didn't occur.  If he keeps doing this over time, he will be discovered to be a bad predictor.  Note that this function indeed assigns a loss of 0 for predicting a 100% probability to an event that occurs and infinite loss to assigning 0% to an event that occurs.  Big risks yield big rewards.  Also note that my scheme of assigning a 50-50 chance to each future election will simply yield a loss of about .3 each time, which shouldn't be too hard to beat.

So, I suggest we start keeping track of the cumulative log-losses of the various people in this game to keep them honest.

Thursday, December 31, 2015

An Eventful 2015

This post continues my tradition of reviewing the year.  This time my commentary is a bit longer than in previous years, so without further ado, here is my take on 2015:
  • This year, we again have two important computing anniversaries: the bicentenaries of Ada Lovelace and of George Boole.  Ada Lovelace, daughter of Lord Byron, was the world's first programmer -- her algorithm computed the Bernoulli numbers on the Babbage engine, a machine that had only existed on paper at the time.  Incidentally, my Ph.D. advisor, Dana Angluin, wrote a nice piece on Lovelace.  George Boole, of course, established the foundations of 0-1 valued Boolean algebra, which is essential for computer science.
    left: George Boole (photo from Wikipedia), right: Ada Lovelace (photo from Eventbrite)
  • This year also marks the 25th anniversary of the World Wide Web.  Congress decided to celebrate by ... passing CISPA.  Sigh.
  • The results the latest neural nets are producing are moving quickly from the realm of impressive to the realm of scary.  On that note, I think it's time for us to start seriously thinking about the potential dangers of AI.  If you believe there is nothing special about our wetware, and that we'll keep making scientific progress, it's pretty straightforward to reach the conclusion that we will likely end up building computers that are in every way smarter and more creative than humans.  Once this happens, the cat will be out of the bag, so to speak, and the only question, really, is when will this happen?  This isn't necessarily a bad thing, but it is something we'll probably have only one chance to get right.
A "reimagined" version of "The Scream" by Google Brain scares me more than the original.
  • On a related note, a few high-tech luminaries founded OpenAI, a new AI research center with a billion dollar (yes, you read that right!) endowment.  Just think what impact the 60 million dollar Simons centers have had, and you'll quickly see how big a deal this could be if the money is spent well.  While I think the people involved are fantastic (congrats to Ilya!), I'm surprised by the center's initial narrow focus on deep learning.  As Seb pointed out, we will need entirely new mathematical frameworks to come close to "solving AI," so such a narrow focus seems shortsighted.  Luckily, the folks at OpenAI have plenty of resources for any future course-corrections.
    Sam Altman and Elok Musk, two of the founders of OpenAI, photo from medium/backchannel
  • Assuming the result checks out, the field of theoretical computer science has had a rare breakthrough.  Namely, Laci Babai gave a quasipolynomial-time algorithm for the graph isomorphism (GI) problem.  I was lucky enough to attend the lecture where the result was announced,  My grad student, Jeremy Kun, wrote up what is probably the best account of the lecture that still serves as a nice introduction to the paper, which is now up on arXiv.  One interesting thing about this result is that it neither makes real practical progress on GI (we already have fast heuristics), nor does it change our view of complexity (since the GI result is "in the direction" we expected).  It's just that GI is a such long-standing hard problem that progress on it is a very big deal.    
Babai, presenting his proof.  Photo by Jeremy Kun.
  • In combinatorics, Terry Tao, building on a polymath online collaboration, solved the Erdos Discrepancy problem, by proving the conjecture true.  I like that these large online collaborations have now led to solutions, or helped lead to solutions, to multiple important open problems.
  • Paul Erdos and Terry Tao, photo from Wikipedia
  • This isn't primarily about computer science or math, but if you haven't heard of CRISPR, go read about it.  I don't think it's an exaggeration to say that CRISPR, or its offshoots, are probably going to change our world rather fast.  While this technique was discovered a couple years ago, it was named "breakthrough of the year" for 2015 by Science magazine.

Monday, December 21, 2015

Why Does the World Care about Our Math?

One day while working on bandit problems at Yahoo!, I had this strange realization that its search engine, nay the entire world, seems have a particular remarkable property: we can sit around doing math, and as a result, better advertisements will get served to users.  Of course, this applies not just to computational advertising, but to pretty much anything -- we can know where the planets will be, when certain epidemics will spread, how fast planes need to fly to stay airborne, and a plethora of other things just by thinking abstractly and solving some equations.

I immediately and eagerly shared my newfound realization with others, and it impressed absolutely nobody.  I was told "How else would the world work?" and "There is lots of math that's not useful, but we choose to work on and formalize the things are are relevant to the real world."  These are, of course, perfectly good objections, and I couldn't explain why I found my realization at all remarkable, but I'd had a nagging feeling that I was onto something.

Forward 6 years, and I'm at Market Fresh Books, a bookstore near UIC.  As an aside, this bookstore is really interesting -- it sells used books by the pound or for small flat fees. I even once picked up a copy of baby Rudin for just 99¢ (plus tax) to add to my library.  Anyhow, I stumbled upon a copy of "Disturbing the Universe," Freeman Dyson's autobiography from 1979, and it looked interesting enough to buy.  That evening, while reading it, I came upon the following passage by Dyson:
"Here was I ... doing the most elaborate and sophisticated calculations to figure out how an electron should behave.  And here was the electron ... knowing quite well how to behave without waiting for the result of my calculation.  How could one seriously believe that the electron really cared about my calculation one way or the other?  And yet the experiments ... showed it did care.  Somehow or other, all this complicated mathematics that I was scribbling established rules that the electron ... was bound to follow.  We know that this is so.  Why it is so, why the electron pays attention to our mathematics, is a mystery that even Einstein could not fathom."
I still don't know the answer, and I can't even state the question without it seeming silly, but at least I now know I'm in good company.

Freeman Dyson
image credit,

Wednesday, December 31, 2014

2014 in Review

As 2014 comes to an end, I decided I want to continue my newfound tradition of summarizing my thoughts in a "year in review" post.  So here are some thoughts on academia, machine learning, and theory, again in no particular order.
  • Every company seems to have its own superstar leading a neural nets effort.  And deep learning keeps making impressive advances.  My hope that a nicer learning theory gets developed around this topic.
  • Computer science enrollments continue to soar, and the "sea change" may be here to stay. It's becoming a better and better time to study computer science.
  • On the other hand, research labs have continued to be vulnerable.  Perhaps we'll see a reverse-trend, with academic jobs temporarily making up for losses in the research job market.
  • It's an interesting time for online education, which has had some setbacks recently.  Yet, it seems even Yale would rather stream Harvard CS50 than hire enough faculty to teach its introductory computer science course.
  • With the release of The Imitation Game, more people than ever will know about Alan Turing.  But will their impressions be accurate?
  • My favorite "popular" AI article this year was on computer Go.  My guess is that by 2020, computers will be able to beat the best humans.
  • After teaching a learning theory course last semester, next semester I'll be teaching a graduate-level "Foundations of Data Science" course, loosely following Hopcroft and Kannan's new book.  I'll have to make some tough choices about what to material include and what to skip.  Any thoughts?

Thursday, July 31, 2014

On AAAI 2014

I've just returned from AAAI 2014, which was held in Quebec City, Canada. It was my first time submitting to, attending, or presenting at a AAAI, and I was very pleasantly surprised on many levels -- the strength of the program, the breadth of research topics covered, and the conference organization.  I also really enjoyed spending some time in Quebec City, which was a wonderful location.

a photo of Quebec City that I took from the conference hotel


AAAI is quite a large conference with around a thousand attendees and a couple hundred presenters, so it involves multiple parallel tracks, posters, etc.  Carla Brodley and Peter Stone, the co-chairs, did a wonderful job organizing everything to work smoothly.  Moreover, they introduced two innovations that I particularly liked: 
  1. Authors whose papers were selected for talks were able to choose to give a 2-minute plenary talk instead of a regular session talk.  This worked really well for papers that carried a strong conceptual message, which might appeal to a wider audience. (I chose this option: you can see my slides here.)  I was never bored at the plenary sessions -- if I found one of these talks or topics uninteresting, it was over in two minutes.  On the other hand, if I liked a talk, I could find the presenter later at the poster session (see next innovation).  The slides for these talks were all pre-loaded on one machine, and Carla and Peter somehow got everyone to stick to just two minutes, so, much to my surprise, these sessions went very smoothly.
  2. Authors who didn't give talks and authors who chose the 2-minute plenary talk option had scheduled 45 minute slots to "present" their posters during one of the multiple poster sessions.  At any given time during a poster session only about 30 posters were officially being presented (you could also show people your poster and work at any time). This was great because I didn't feel like I had to stand by my poster for hours.  Moreover, during my 45 minute presentation, I had a much larger audience.  I really liked both innovations and hope they are incorporated into future AAAIs (and ICMLs, NIPSs, etc.). 
I also want to discuss some of the content that I enjoyed.  I should note that I left the conference day early, attended a somewhat arbitrary selection of sessions, and pseudorandomly meandered through the poster sessions, so what follows is not nearly complete.  I know for sure that I am leaving out many good talks and papers.

Plenary Talks

I only saw two of the invited talks, but both were really great.

image from mashable

In the first talk, Adam Cheyer, one of the main inventors of the iPhone "digital assistant" Siri, discussed the history of the project, as well as some of the technology behind it.  One thing I learned from the talk was how much Siri is able to do that I didn't know about.  For instance, if you're reading an email and want to call the email's sender, all you need to do is say something like "Siri, call him" if the contact info is in your phone.  If you think about it, this is incredibly easier than having to leave email, go to phone contacts, search, etc.  In fact, Adam pointed out that while your hands are the best interface for getting at items on a screen, your voice is better for accessing content you can't see.  If one uses Siri with this in mind, it can be much more effective.  Adam also pointed out how immensely challenging building the system was.  For instance, a user who wants two tickets for the movie "Nine" at "8pm" can say: "two for nine at eight" and Siri has to figure out what to do after hearing what seems like 4 different numbers.  Just thinking about the difficulty of this problem should impress us how for AI has come!

As for Siri's history, the entire recounting was interesting, but one facet stuck out the most:  Immediately after the iPhone was announced in 2007, Adam and two of his colleagues met to discuss how to take advantage of the new technology.  Just three months later, they were making a pitch to VCs to fund their new idea.  By 2010 Siri was an app, and by 2011 it was bought out and integrated into iOS, making billions of dollars for Apple and impacting millions of users.  Adam clearly had a lot of experience working on these types of systems and in launching new and exciting products (CALO, IRIS, etc.), but it amazed me how much one can accomplish in a short time if one is talented, experienced, and motivated enough.

an example UI from the coloring game described below, taken from this paper

The second invited talk was by Michael Kearns, who is known for a lot of things including fundamental contributions to learning theory.  But in this talk, he discussed some of the user studies in network science research he and his group at UPenn have been doing.  In these experiments, subjects come to a room and sit at computers and perform various tasks like distributedly coloring a graph, coming to consensus, etc.  In their typical setup, players play on an underlying graph (whose topology they don't know), with players controlling one vertex each and only seeing what their neighbors are doing.  

In a coloring game, the players have to choose one of a few colors and will only get paid if their final color is different from the colors chosen by all their neighbors.  They can asynchronously switch their own color as much as they want, and the goal is to see how long it takes various graphs to get colored.  In a consensus game, the goal is for the entire graph to come to consensus, so the players need to choose colors that are the same as their neighbors' choices.  From the computational complexity standpoint, coloring is NP-hard (in the worst case) and consensus is trivial, but Kearns's group observed that the hardness of performing these tasks in the real world depends on the interplay of the graph structure and the task.  For instance, on many graphs, consensus appears much harder than coloring.

The result I found the most interesting pertained to an experiment they ran on coming to consensus in a game where different nodes have different preferences.  The way this setting works is that the players only get paid if at the end of a set amount of time, all have agreed on a single color.  However, different players will be paid different amounts depending on which color they agree on.  (This was made to be a simplified model the Obama vs. Hillary primary, where democrats had strong reasons to come to agreement, but had different preferences about whom to agree on.)  Kearns's group showed that in such a game, a small minority of well connected players with one set of preferences in all their trials override a much larger set of worse connected players with different preferences, even though the minority in question had no information about the structure of the graph (or even any knowledge that they were more connected than others).

This seems an interesting line of research and could hope to explain various real-world phenomena.


Finally, I also enjoyed quite a few of the papers at the conference.  One thing that surprised me was the large number of machine learning and also game theory papers.  The learning / game theory papers were on average more application driven than what one usually finds at ICML/NIPS or EC, but it seems that these fields are becoming increasingly incorporated into AAAI.

So, here's a very incomplete list of some papers I found interesting:

  • Active Learning with Model Selection by Alnur Ali, Rich Caruana, and Ashish Kapoor -- this paper considers active learning for  simultaneously choosing the best model and its parameters.  This is done by training the model on a training set and evaluating its performance on a validation set.  Both sets require labeled data.  So the active learning algorithm, in sampling a label for a new point, must also consider whether to add the point into the training set or into the unbiased validation set.  The resulting performance of their algorithm shows that it roughly first "explores" by placing into valuation and then "explores" by placing into training.  I found this idea and interpretation pretty neat.
  • Fixing a Balanced Knockout Tournament by Haris Aziz, Serge Gaspers, Simon Mackenzie, Nicholas Mattei, Paul Stursberg, and Toby Walsh -- this paper considers a natural, previously open, question.  Given you know the probabilities with which each team will beat each other team, can you design a knockout tournament (balanced binary tree) that maximizes the probability of a given winning?  This turns out to be NP-hard, but they also give some positive results in certain cases.
  • Preference Elicitation and Interview Minimization in Stable Matchings by Joanna Drummond and Craig Boutilier -- consider medical residents being matched to schools.  For residents and schools to choose their rankings, they need to interview with all the schools.  This is really costly.  This paper assumes that when one of two options is much better than the other, an interview probably doesn't need to be scheduled just to tell them apart.  Then, given their modeling assumptions, they try to find stable matchings without having to do too many expensive "interviews."  What I like about this paper is that it considers the sort of practical concerns that give a new twist to what seems like an already solved problem.
  • Simultaneous Cake Cutting by Eric Balkanski, Simina Brânzei, David Kurokawa, and Ariel D. Procaccia -- this paper looks at "cake cutting" in the simultaneous model, where user preferences are elicited simultaneously.  Cake cutting problems involve dividing a resource, say a yummy continuous cake, such that everyone is happy with the division (or at least thinks the division fair under various notions).  They show that in a simultaneous model, one can satisfy "proportionality" but not "envy-freeness", and also give some approximation results.  I don't know exactly why I like this model, but I have a soft spot for cake cutting.
  • Bagging by Design (on the Suboptimality of Bagging) by Periklis Papakonstantinou, Jia Xu, and Zhu Cao -- this paper gives a subsampling method that outperforms bagging, in the case, roughly, when bagging works, e.g. by being more stable to noise than a single predictor.  Their method comes from the observation that bagging is done over subsampling a fixed dataset, which makes bagging behave differently than if it had access to an unlimited supply of independent samples.
  • Recovering from Selection Bias in Causal and Statistical Inference by Elias Bareinboim, Jin Tian, and Judea Pearl -- this paper won the best paper award.  It gives complete conditions on when one can recover from selection biased data. They also consider settings where some unbiased data is available and other extensions.  I plan on reading it more carefully, as the results seem important.
In sum, I enjoyed AAAI and look forward to attending it again in future years.

Tuesday, July 01, 2014

Should You Get a Ph.D.?

I am occasionally asked by students whether they should go to grad school to get a Ph.D.  So much has already been written on this subject that I hadn't felt it necessary to put my opinion in writing, nor did I have any concrete advice.  But reading Duncan Watts's excellent recent article on the subject crystalized my thoughts.

So here, succinctly, is my advice:
Enroll in a Ph.D. program only if you think you'll enjoy graduate school.  Don't enroll just because you think a Ph.D. will help you in your future career.  In other words, don't go to graduate school as a means to an end.
Note: my advice is meant for US students seeking a Ph.D. in the sciences.  It probably applies more broadly.

Yale's computer science building, Arthur K. Watson Hall, where I spent a lot of time in grad school.  Image from here.

My advice may seem obvious, but it's not.  For instance, this would be bad advice for students considering an M.D. -- almost nobody enjoys medical school, but I think few graduates regret having gone.

But there are three main reasons why this is the right advice for grad school.
  1. In Ph.D. programs, unlike in many of the professional schools, you're supposed to be doing pretty much what your professors do -- research.  Sure you have to take some classes and get paid less, but chances are if you don't enjoy grad school, you won't enjoy a research career either.  (Going to grad school to pursue a finance career or something unrelated later is doing it the hard way.)
  2. Jobs in research, and especially in academia, are scarce, and going to grad school just because you might like a research job later is a very unsafe bet.  So, going to grad school only makes sense if you'd enjoy the research career, and also if you'd enjoy the experience anyway if a research career doesn't work out.  Four to six years of your life is a long time to be miserable.
  3. Grad school, unlike professional school, has no end-date.  You finish when you've done enough.  And it's hard to do enough if you're not enjoying the work.  Forcing yourself to sit through anatomy is one thing; forcing yourself to be creative is quite another.
Also, notice that my advice is an "only if."  So even if you might enjoy grad school, you should explore other possibilities.  Research is incredibly rewarding, but also has downsides which need to be considered.

That being said, I don't mean to be negative.  I'm very happy I went to grad school -- I had a great time there and am lucky to keep on getting to do research.  So, if you want to get a Ph.D., go for it.  Just remember to enjoy the journey.

Tuesday, December 24, 2013

My 2013 Post

The end of 2013 is approaching, and I realized I hadn't posted a single time this year. Starting a tenure-track job and having a baby apparently leave little time for blogging!  However, I do want to keep my blog somewhat active, so I figured I better make at least one post for this year.

But what to write about? It's hard to choose just one topic to make a "post of the year".  So, I'll just write about some work-related highlights of 2013, big and small, in no particular order:
  • The newly founded Simons Institute for the Theory of Computing seems to have had a great year. I haven't had a chance to visit, but I have watched some videos of their excellent talks online! And I've been keeping up with Moritz Hardt's interesting new blog.
  • I've for the first time gotten involved in running a conference, or rather part of it: ISAIM 2014 will feature a session on the "Theory of Machine Learning," and I'm quite happy with the program. (It's not too late to attend!)
  • My friend Nikhil Srivastava (who appeared on this blog before), together with Adam Marcus and Dan Spielman, proved the Kadison-Singer conjecture.  It's a result that touches on many areas of mathematics -- its resolution a big deal, and especially exciting it came from the computer science community.
  • With the opening of SCDA, Facebook Research Labs, MSR-NYC, and others, machine learning seems to be quite in demand, both in academia and industry. (And, not quite unrelatedly, deep learning is continuing to make a comeback.) It's a great time to be a machine learning researcher or practitioner.
  • Near the beginning of the year, my co-authors and I finished and published a paper extending statistical queries to optimization algorithms and proved a lower bound for planted clique. I'm particularly happy about this result, which had been in the works for a while. 
  • I published my first paper with students at UIC. One of them, Jeremy Kun, who is working with me, has a great blog, which you should all read!  This also happens to be my first work primarily in game theory, which seems to be an area many learning researchers delve into. 
  • While last year was 100 years since Alan Turing's birth, this year was Erdős's centenary.  One could say that Erdős was the Turing of combinatorics, and he had quite a bit of impact in computer science as well.
  • It looks like the NSA hasn't made any huge mathematical breakthroughs, but that hasn't kept them from breaking crypto-systems in the real world, both in ways we'd expect and in ways we wouldn't.
  • Leslie Valiant published Probably Approximately Correct. I'm sold, but can it bring learning theory to the masses?
I'm sure I missed some important things, but that's all I have for now.  I resolve to post more next year. It's a modest, but achievable, goal.  To a happy and productive 2014!

Friday, December 28, 2012

Turing's Revolution

In my post Alan Turing Year, I discussed how it's important for people to be aware of Turing's intellectual contributions to math and computer science, as well as his war efforts and tragic death.  It's time to live up to my own call and do my part.  In this post I'll talk about Turing's impact, from his best known contributions to his lesser known, but important ones.

I should note that the content of this post is largely stolen from answers to a question I asked on the StackExchange (SE) cstheory site, where I sought to gather a compendium of Turing's less known results.  The answers made a collection of impressive contributions, some of which I wasn't familiar with.  Some of his contributions turned out to be surprisingly modern.  They also introduced us to viewing the world through an algorithmic lens.

Without further ado, here they are:
1. Defining Turing Machines and Showing the Existence of UTMs (1936): A Turing machine is a universal computing device that forms the basis of how we think about computing.  A Universal Turing machine is a fixed machine that can simulate any other Turing machine -- most relevant to our current computers. (SE link)
a Lego Turing Machine, photo from wikipedia 
2. The Church-Turing Thesis (1939): The idea that the Turing Machine (and Lambda calculus) captures all of computation, and is therefore worth studying.  This hypothesis cannot be proven, but everyone believes it nonetheless. (wiki link
3. Solving the Halting problem (1936): A version of this was posed by Hilbert.  Turing showed that telling whether a program will Halt is undecidable.  (wiki link
4. The Turing Test and Introducing AI (1950): Turing gave an objective test that decides whether a computer exhibits intelligence.  The idea is to say that a computer is intelligent if it can fool humans into thinking it is also human.  Accepting this idea is a way to avoid many philosophical arguments about the nature of intelligence. (wiki link
5. Introducing Oracles and Relativization into Computability Theory (1939): Oracles are incredibly useful for all sorts of analysis -- for example the famous Baker-Gill-Solovay result that says P vs NP cannot be separated any technique that relativizes (holds for relative to any oracle), eg diagnolization.  It was part of the paper "Computing machinery and intelligence" that launched AI. (SE link
6. The Bombe (1950): Also called the Turing-Welchman Bombe, for breaking the German enigma machine in WW2 and introducing tools to cryptanalysis. (SE link)  
 the Bombe, from
7. Randomized Algorithms (1950): Turing observed the potential advantage an algorithm can gain from having access to randomness. (SE link)
8. The First Artificial Neural Network (1948): This idea is an important one in AI and machine learning.  Turing's design predated Rosenblatt's preceptron. (SE link
9. The Chemical Basis of Morphogenisis (1952): In his most cited paper, he began to tackle the question of how a symmetric embryo develops into an assymetric organism using symmetry perserving chemical reactions.  This started to introduce algorithmic thinking into biology. (SE link
10. Good-Turing Estimator (1953): For estimating an unseen fraction on a population when given past data. (SE link
11. One of the First Chess Algorithms (1952): Before we had computers, we had paper chess algorithms. (SE link
  Deep Blue, photo form wikipedia
12. Checking a Large Routine (1949): The idea of using invariants to prove properties of programs and "well-foundedness" to prove termination. (SE link
13. LU-Decomposition (1947): A method for factorizing a matrix as the product of a lower triangular matrix and an upper triangular matrix.  Now commonly studied in Linear Algebra. (SE link)
14. Turing's Method for Computational Analysis of a Function (1943): Invented for studying the of the Riemann zeta-function but is more widely applied. (SE link)

I'm sure there are more.  Any one of these is an impressive achievement for a scientist -- the first couple are revolutionary.  That Turing did all of these and more shows his remarkable contributions to our field.

Update (12/31/12): Alan Richmond points out another important work of Turing's, which I added as # 14.  Also, my wording for the Turing test proved a bit controversial, so I've changed it to better reflect reality.

Friday, December 07, 2012

UIC's MCS Program Seeking Graduate Students

As I'd posted before, UIC has two departments interested in computer science.  We have a computer science department in the engineering school.  And the math department in the school of liberal arts and sciences, where I have an appointment, has a smaller program in "mathematical computer science" (MCS).

The MCS program is more focused on the mathematical aspects of computing and less on systems, databases, etc.  It is similar to the ACO programs at CMU and Georgia Tech.

We recently updated our website, where you can learn more about the program and the affiliated faculty.  More importantly, if you're a student looking to get a Ph.D. in a mathematical area of computing, consider applying to our program -- we're looking for strong applicants.

a screenshot of the new MCS website

Tuesday, August 07, 2012

Diamond Forever

Recently, the academic community started to fight back against traditional academic publishers, who earn large profits without any longer providing many clear services.  The internet has been a great tool for taking care of such inefficiencies, and it's quickly disrupting the publishing market as well.

Having thought a bit about these issues myself, I wanted to summarize my views of some of the various publishing models (as described by Tim Gowers) under consideration, as well as the pros and cons of each.  It's possible, even likely, that I am missing something, so feel free to point it out in the comments.

The Systems


screenshot, after I tried to access a paper from

system: Researchers send manuscripts to publishers. Publishers have volunteer editorial boards consisting of researchers who decide whether to accept/reject manuscripts. Accepted manuscripts are published but cannot be accessed without a paid subscription to the given journal. Authors lose most (or all) rights to their own work.

who wins: publishers, by charging for access

who loses: university libraries and therefore scientists who pay for it with overhead; the public, who doesn't get access to the research their taxes likely funded; authors, who cannot easily have their research read and used.

opinion: This system made some sense before the internet era, where there were few other good ways to disseminate research. The publishers used to have to work hard for their profits by doing serious formatting/printing/publishing work. With the internet, all of this is now done by the authors, and the closed-access system no longer makes sense.


system: Same as closed-access, except that authors are allowed to put their accepted manuscripts on their websites and onto online repositories such as the arXiv.

who wins: publishers, by charging for access; researchers who put their papers online

who loses: basically same as closed-access, except for the authors who put their papers online

opinion: This is the system most journals are now on, or at least they usually don't go after scientists who keep their papers online. This is strictly better than closed-access, but doesn't go far enough. Too many scientists do not put their papers on their websites or arXiv. There is still no guarantee that science will be public, as it should be.


a screenshot of the new gold-access math journal

system: Researchers send manuscripts to publishers. Publishers have volunteer editorial boards consisting of researchers who decide whether to accept/reject manuscripts. Authors of accepted manuscripts pay to have their work published but published manuscripts are freely accessible to everyone online.

who wins: the public, who can access research for free. the publishers, who still get paid (but not as much)

who loses: the authors, who have to pay (usually large sums) to have their work published. universities, which might need to subsidize the fees. quality (see below).

opinion: I think adopting this system would be a mistake. First, scientists are the ones doing the work, the editing, and the main contributions -- it seems unfair to make them pay as well. Researchers without grants or university support would have to use their personal funds. Moreover, I think this would seriously impact paper quality.  In this system, journals would make money by accepting more papers as opposed to getting more readers, and some authors may have incentives to pad their CVs. I cannot see how this wouldn't go wrong.


the journal of machine learning research, image from their website

system: Researchers send manuscripts to publishers. Publishers have volunteer editorial boards consisting of researchers who decide whether to accept/reject manuscripts. Authors of accepted manuscripts have their work published for free and published manuscripts are freely accessible to everyone online.

who wins: scientists, who can publish for free and have their papers read. the public, who can access research freely. universities, who do not have to pay for journal access. research quality.

who loses: publishers, who cannot any longer charge for anything

opinion: This is the best of all worlds. Everyone seems to win, and research quality especially wins because nobody makes a profit from publishing bad results.  The only problem is that the publishers lose, and why would anyone run a journal without making a profit?

A Solution

Diamond-access is the best system for everyone, except for the publishers.  However, publishers need incentives to run a journal, which they wouldn't have in a diamond access system.  Now, becoming a publisher would involve operating at a loss.  So who should take the hit?

One obvious solution is for universities to publish (or pay for publishing) journals.  They would make a sacrifice in maintaining the journals for free, but what would they win? Quite a lot actually.  Were we do switch to this system, university libraries would no longer have to pay subscription fees to journals, and the university professors wouldn't have to pay to have their work published.  And the quality of publications wouldn't suffer.   Some of this cost could even be handled by overhead.

Yale's library (from wiki), where I not once went to do research during my Ph.D.

But what is the cost of running a diamond access publication?  Well, it turns out to be very small.  With pdfs online, there's really no need to publish paper proceedings.  I don't think I've ever gone to a library to find an article on paper -- pretty much nobody does these days.  So, all you need to do is register a domain and pay for some maintenance and hosting.  The rest of the work is done by volunteers.  Looking at the financial statement (of which I learned from Google+) of the Journal of Computational Geometry, it could take as little as $10 per year.  Heck, we don't really need universities to pay for it.

Yes, it really costs very little to run a journal these days if you do it right.  There's no need for readers, or authors, to pay huge sums to middle-men.

Monday, May 07, 2012

Atlanta, Northward

When you accept a postdoctoral position, you know you probably can't stay indefinitely.  Psychologically, it means you have to prepare yourself for having to switch jobs, coworkers, offices, etc., and I, as a postdoc, have been ready for this foreseeable consequence with Georgia Tech, where I've enjoyed being part of a strong and lively department the last two years.

Unfortunately, when you take a postdoc in a city like Atlanta, where there isn't as high a concentration of universities and research labs as some other places, you not only have to leave your job, but likely also the city.  And even though it was in the back of my mind that I'd probably have to move, I didn't expect to love Atlanta as much as I have come to.
Atlanta's botanical garden

Atlanta has sunny, warm weather, really friendly people, lots of trees, amazing restaurants, nice parks, museums, a symphony, theaters, and everything you could want in a city (except, perhaps, a waterfront).  It's also very livable -- the population density is fairly low, so you're not elbow-to-elbow every time you leave your house, and you can live in really nice suburbs right in the middle of the city.  Because of the large influx of new residents, it's mostly new and clean, yet the cost of living is quite low.  And you can reach great hiking with a 20 minute drive and reach a beautiful national park in 3 hours.  It's certainly a place I wouldn't mind living in again at some point in my life.
Smoky Mountains National Park

But while I'm sad to have to leave Atlanta, but I am also excited to begin a new phase in my career.  In a couple months, I'll be joining the University of Illinois at Chicago (UIC) as an Assistant Professor in the Math department, actually called "Mathematics, Statistics, and Computer Science."  The department is quite strong, but unlike most math departments, UIC's is expressly concerned with Computer Science (so, yes, I'm remaining a computer scientist). Additionally, UIC has another entire standard Computer Science department, where I'll also have a courtesy appointment.  Between the two departments, there are quite a few great faculty members working on various aspects of machine learning -- I'm looking forward to joining them. And with Northwestern, UChicago, and TTI-C nearby, Chicago should be quite an exciting place to work.
Chicago, reflected in "the bean"

But because this post started with Atlanta, I should say something about the city of Chicago.  While I don't yet know Chicago well, it also looks like a beautiful, livable place, on the shores of a (literally) great lake.  The people seem friendly, and there is a lot to do.  I just hope that, after being spoiled by Atlanta, I can get used to Chicago's cold winters.

Friday, April 27, 2012

Computing Conceptions

From Georgia Tech, many of us have been closely and concernedly watching our southern neighbors at the University of Florida, where budget cuts and a seemingly hostile dean conspired in an attempt to decimate their well-respected computer science department.  Dismantling the department, at a time when computer science research is so critical, would be a laughably bad decision on the part of the university.

I don't claim to fully understand the rationale behind this plan.  However, I have a feeling that part of the reason such an idea was even being seriously considered has to do with a couple misconceptions of what computer science research is, and I fear that such misconceptions extend beyond the state of Florida.  And I don't just mean that the average person doesn't understand computer science; that's to be expected. I mean that many academics, even scientists, don't understand the basics of what computer science is about and therefore tend to devalue it, especially as an academic discipline.

First, many people seem to assume that computer science is just programming or fixing computers.  As a graduate student at Yale, where computer science is a small department, I was often asked by other Ph.D. students why computer science even has a Ph.D. program.  They didn't view it as an academic pursuit, but more as a trade skill.  I fear that many scientists view computer science as limited to programming or getting computers to work, probably because that's the way most, say, physicists use computers.  They have little understanding of the beautiful, deep results and insights that computer science has brought the world.  Viewing it as an instrumental non-academic field, people think it would be okay to kill-off computer science research and leave the professors to teach programming (which, admittedly, is an important part of a computer science department's job).

(clip art from here)
something computer scientists do not normally do

The other, very related, misconception, one that was clearly in play at the University of Florida, is that the computer science department was somewhat redundant because the electrical and computer engineering department already has the word "computer" in it.  Their reasoning sounded more sophisticated than that, but only superficially.  But computer science and electrical engineering are very far in their central concerns.  Computer science, for the most part, is divorced from concerns about electricity, physical media, or anything of that sort.  Whether you work on operating systems, machine learning, or the theory of computation, you mostly don't really care about the underlying hardware, whereas electrical engineers do.  Greg Kuperberg, writing on Scott Aaronson's great blog post on this issue, puts it better than I could:
"Apparently from (Florida engineering dean) Abernathy’s Stanford interview, and from her actions, she simply takes computer science to be a special case of electrical engineering. Ultimately, it’s a rejection of the fundamental concept of Turing universality. In this world view, there is no such thing as an abstract computer, or at best who really cares if there is one; all that really exists is electronic devices.
[...] Yes, in practice modern computers are electronic. However, if someone does research in compilers, much less CS theory, then really nothing at all is said about electricity. To most people in computer science, it’s completely peripheral that computers are electronic. Nor is this just a matter of theoretical vs applied computer science. CS theory may be theoretical, but compiler research isn’t, much less other topics such as user interfaces or digital libraries. Abernathy herself works in materials engineering and has a PhD from Stanford. I’m left wondering at what point she failed to understand, or began to misunderstand or dismiss, the abstract concept of a computer."
(image from here)
something usually not of research interest to computer scientists 

It looks like a disaster in Florida has so far been avoided. And with each passing year, more scientists will have, at the very least, taken some basic computer science in college -- it is part of our job to teach the important concepts in our introductory courses.  I'm hoping this will improve the perceptions of our field.  But in the meanwhile, it has become apparent that we have much more PR to do to.

(image from here)
now we're talking!

Sunday, January 22, 2012

Alan Turing Year

I am excited that 2012, the centenary of Alan Turing’s birth, will see many celebrations of his life and legacy.  It is hard to think of a scientist who has had more impact both on science and on our lives, but who still remains as unknown to the public, so I am glad Turing will be getting some much-deserved attention this year.

Alan Turing, photo copyright held by National Portrait Gallery in London

Alan Turing laid out the foundations of the study of computer science (in the process of solving Hilbert’s Entscheidungsproblem – no small feat on its own), contributed fundamentally to our understanding computation in nature, envisioned the future of intelligent machines, and saved millions of lives by helping shorten the length of World War 2.

Yet upon discovering Turing was gay (and convicting him for "indecency"), the British government took away his job, kept many of his contributions secret, and chemically castrated him, driving him to suicide.

After some long-overdue outrage, Gordon Brown issued an apology on Britain's behalf, albeit decades too late.  Now there are new petitions, asking for the British Government to grant Turing a pardon.  I have nothing against these efforts, but how Turing was treated is to our collective shame, and we should remember that these efforts are to make us, not him, feel better.  Turing knew he wasn't doing anything wrong and never needed condescending “pardoning” from politicians, nor does he need them any more after his death.  If the pardon is to send an apology to the homosexual community for their mistreatment, I’m sure the British government can think of something a little more direct.

I think a better way of honoring Turing is to make sure people know about his work – like we do of KeplerGalileoNewtonDarwinEinstein, and many of the other great scientists who revolutionized our world-views. Unfortunately there’s a lot of work to be done: even well-meaning articles trying to popularize Turing’s work mainly describe him as an accomplished code-breaker and World War 2 hero.  Turing did break codes, most notably the German Enigma machine, but calling Turing a code-breaker without mentioning his scientific impact is akin to calling Isaac Newton a treasurer, given his years as England's Master of the Mint.

So, this year, we academics will celebrate Turing’s great accomplishments by holding conferencesawards, and lectures in his honor.  There will be documentaries released about his life.  I’m sure an op-ed or two will be written in the newspapers.  But I also hope that reporters, lecturers, and especially teachers, will help the the public at large learn about this pioneer of the information age.

Thursday, October 27, 2011

ALT 2011

Earlier this month, I traveled to Finland, where I attended ALT 2011 and presented a paper on learning sparse parity functions in the presence of noise. 

a picture I took in Helsinki

COLT (Conference on Learning Theory) and ALT (Algorithmic Learning Theory) are the two main learning theory conferences.  Though COLT is both the larger and better known of the two, it seems that ALT is growing stronger with every passing year, and this year was no exception.  Not only was the program good, but Sébastien Bubeck gave a timely tutorial on bandits, which deservedly got their own session this year. (I, unfortunately, arrived too late to see the tutorial, but I read the nice slides afterwards.)

I wouldn't be the first to note that the other thing that differentiates ALT from its big brother is that ALT seems to have a broader definition of learning theory, and the conference is therefore more diverse in its topics than a typical COLT.  This has clear benefits, but it also means that there are entire sessions where I am pretty lost.  Of course, as I've previously noted, not feeling like you have to following everything could also be a benefit when you're attending a conference.

I want to mention one paper I particularly enjoyed, especially because of the model, which illustrates the importance of unlabeled data in the agnostic setting:
"Learning a Classifier when the Labeling is Known" by Shai Ben-David and Shalev Ben-David.  This paper works in an agnostic learning model where the learner already knows the labeling function, but wants to find a good succinct representation (i.e. proper learning).  How good a particular representation is depends on the distribution the data is coming from, and this is what the algorithm needs to learn by seeing examples.  It turns out that this, more generous, model is subject to essentially the same sample complexity lower bound as the common agnostic PAC model.

Monday, August 22, 2011

Happenings on CSTheory Q&A

In addition to machine learning, one of my main research interests is theoretical computer science.  Following the success of MathOverflow and other Q&A sites, ("cstheory" for short) was born about a year ago, and I've been an active participant since its inception.  In the last year, we've worked hard to make cstheory a valuable resource and to promote it.  I'd say the efforts paid off -- the site is doing quite well, with over 5000 members.

Moreover, cstheory recently added a blog (Aaron Sterling and Joe Fitzsimons are editors), whose posts are on topics related to theoretical computer science and to the site itself.  In one of the first posts, Suresh Venkatasubramanian, a site moderator and one of the people most responsible for its success, wrote a nice summary of cstheory's first year.

I've also volunteered to occasionally blog about the learning theory side of things.  My first entry, on learning regular languages, was just posted.  It touches on many interesting issues learning theorists regularly encounter. Enjoy!