Stonebraker trapped in Stonebraker 'fate worse than death'

Oh well, I know I shouldn’t poke directly at people, but they deserve it sometimes (at least in my very personal opinion). Heck, I even gave it a 12-hour window so that this wouldn’t be a hot-headed opinion.

Those who have followed MySQL at Facebook development probably know how much we focus on actual performance on top of mixed-composition I/O devices (flashcache, etc.) – not just retreating to the comfortable zone of in-memory (or pure-flash) data.

I feel somewhat sad that I have to put this truism out here – disks are way more cost-efficient and, if used properly, can support far more long-lived products, not just real-time data. Think Wikipedia without history, think comments that disappear from old posts together with the old posts, think of all the 404s you hit on articles you remember from the past and want to read again.

Building the web that lasts is a completely different task from what people in academia imagine building the web is.

I already had this issue with another RDBMS pioneer (there’s something in common among top database luminaries) – he also suggested that disks are a thing of the past and that everything now has to be in memory, because memory is cheap. And data can be whatever unordered clutter, because CPUs can sort it, because CPUs are cheap.

They probably missed Al Gore’s message. Throwing more and more hardware at a problem without fine-tuning for actual operational efficiency requirements is wasteful and harms our planet. Yes, we do lots of in-memory efficiency work to reduce our I/O, but at the same time we balance the workload so that the I/O subsystem delivers the long tail as efficiently as possible.

What happens in the real world if one gets a 2x efficiency gain? Twice as much data can be stored, twice as many data-intensive products can be launched.
What happens in the academia of in-memory databases if one gets a 2x efficiency gain? A paper.
What happens when the real world doesn’t read your papers anymore? You troll everyone via GigaOM.

Sure, there’s some operational overhead in handling sharding and availability of MySQL deployments, but at large scale it becomes a roughly constant cost, whereas operational efficiency gains grow linearly with the size of the deployment.

Update: Quite a few people pointed out that I was dissing a person who has made an incredible number of contributions, or that I’m anti-academia. I’m not, and I highly value any work that people do wherever they are, though I do apply critical thinking to whatever they say.

In my text above (I don’t want to edit and hide what I said) I don’t mean that “a paper” is useless. My colleagues and I do read papers and try to understand the direction of computer science and how it applies to our work (there are indeed various problems yet to solve). I’d love to come up with something worth a paper (and quite a few of my colleagues have).

Still, if someone does not find that direction useful, that’s no reason to portray them the way the original GigaOM article did.


30 Responses to Stonebraker trapped in Stonebraker 'fate worse than death'

  1. Andrew Moore says:

    Well put, Domas. I wonder if the writer believes his point is valid or whether he wanted to get a rise out of the MySQL [At Facebook] users. Whatever the purpose, I think he’s going to want to do more than scratch the surface of a subject before writing such an article again.

  2. Dimitri says:

    Domas,

    you should just ignore it..

    Don’t forget that in the ’90s Stonebraker sold INFORMIX another “best database server in the world”, called Illustra. Illustra was a fully transactional database and was presented as highly web-oriented.. – however, to everyone’s surprise, it still used table-level locking for any write operation.. :-) So the most powerful “tuning” for improving performance of web applications with Illustra was – “never write to the database” and you’ll go very fast ;-))

    Well, nobody is perfect..

    Rgds,
    -Dimitri

  3. Mark Callaghan says:

    Oh, Illustra. My first job after grad school was working on Informix XPS in Portland, OR. That was a great job in a great city. Then Informix spent a lot of money buying Illustra and a lot of time trying to integrate it — perhaps too much time and too much money. Then there were some other problems at Informix that may or may not have been made worse by the time and money devoted to Illustra and Informix soon stopped providing awesome jobs in a great city and stopped functioning as an independent company.

  4. Mark Callaghan says:

    With respect to the topic at hand I feel bad for VoltDB. They have an amazing product for certain use cases. They also have a technical leader who has the opportunity to reach out to a broad audience. That opportunity was wasted this time.

    • Indeed. There definitely are environments that are way more demanding per megabyte of data – and a whole class of in-memory products can allow much faster development of services.

      I have written multiple in-memory services to handle data aggregation tasks, and I’m sure that today I’d look at some of the existing frameworks first before writing my own chassis. Five years ago all you had for a feasible high-performance solution was an idea+hash library. :-)

    • Mike says:

      Mark, sounds like you have MySQL Stockholm syndrome…Just Kidding ;-)

  5. Kevin says:

    Wait, I thought Stonebraker was writing satire. Was he serious? That went from amusing to sad…

    • Ryan says:

      The byline on that article is not Stonebraker’s. I don’t know what was said in the interview, but quoting Stonebraker as the writer is inaccurate.

  6. John says:

    It’s seriously not-done to make fun of someone who has made significant contributions to the DB field, contributions that power, amongst others, Facebook and your own career.

    Stick to the facts, instead of pulling in Al Gore and the environment to power your arguments. Perhaps you forgot, while composing your rant, that (solid state) memory cost has decreased quite a bit. Not as cost efficient as ‘old’ disc, but it’s getting there.

    • John, thanks for your insightful comment, I appreciate it.

      I also appreciate hard work, both Stonebraker’s and his contemporaries’. I do, though, appreciate the work of my contemporaries too.

      This is my personal space, whereas the article we are talking about ended up being the #1 headline in information technology (according to LinkedIn!), and it had no relevant facts, only opinion.

      I did not forget about costs, and I especially did not forget about costs at scale. And by the way, solid state is not RAM, and architectures like VoltDB wouldn’t work that well with solid state as backing storage (page faults would stop all processing).

      If you want facts, feel free to follow materials distributed via http://facebook.com/mysqlatfacebook – there are plenty of those over there.

      Cheers!

      P.S. Regarding “green” argument, your sarcasm detector is a bit off ;-)

  7. John says:

    Hello Domas,

    Thanks for the link to MySQL@Facebook, it’s appreciated since the page features … facts.

    I do agree DB academia is a bit of a Mount Olympus / ivory tower affair; they could do with somewhat better PR.

    And yes, my sarcasm detector is off since I find sarcasm scrambles communication and find it unnecessary most of the time.

    • John, you’re welcome. Occasionally you can find facts on this blog too, though there’s a big chance I will link to them from m@fb page then ;-)

  8. I think the person who’s most deserving of criticism is the article’s “author,” Derrick Harris. You can hardly expect Stonebraker to give an interview that isn’t self-serving. It’s his job to promote the business and the technology, which has its merits. But you would think Harris would want to be more than just a mouthpiece.

    • I agree, the author emphasized speculation (he picked the headline, for sure) for the sake of grabbing attention, and it was in Stonebraker’s interest.
      They succeeded; it is probably the most popular article on their site ever, yet the cost of that was exposing a lack of integrity or analysis.

  9. Josh Berkus says:

    Domas,

    FWIW, the article is probably an accurate reflection of Stonebraker’s attitudes.

    Keep in mind that Stonebraker is an academic and a database engine engineer. He only works on low-level stuff; if he’s ever put up a website on his own in his life, it would be news to me. When you’re a platform engineer, everything on the application side looks easy (and vice-versa, for that matter).

    VoltDB is struggling because it’s a really good database for a use-case which may not exist. So I think Stonebraker is trying to drum up some publicity via creating controversy. At the stage VoltDB is at, *any* press is good press. So I don’t agree that Harris did Stonebraker a disservice. Whether he did his readers one is another question.

    • MattK says:

      > a use-case which may not exist

      Curious, which case is that?

      • in-memory distributed OLTP for very very high throughputs on that data.

        it usually means you have more data than what fits on one box, and need very very high performance to all of it.

        the closest product to VoltDB is currently MySQL’s NDB (Cluster), but even that one supports on-disk data nowadays.

    • Dan Weinreb says:

      Actually, is VoltDB struggling? I had the impression that, as a company, they are doing somewhere between pretty well and very well.

      VoltDB is a very interesting point in the space of possible DBMS’s. When the company was first starting, we were one of the first companies to evaluate whether it was a good fit for what we are doing. The killer problem is that our code is written in such a way that inside transactions, we interleave database queries with code that has to run within our process, rather than in a Java stored procedure, because it affects our own data structures.

      Now, had we written the whole thing with VoltDB in mind, i.e. always do “begin transaction, run code on the DB server including interleaved Java and SQL, commit and return results”, maybe VoltDB would have been fine. I’m not sure; there would have been many other points to examine and evaluate. For example, we do sometimes do ad hoc queries, and at least at that time, ad hoc queries were not well-supported.

  10. Rob A says:

    The interesting comment here to me was: “Building the web that lasts…”

    A year after my daughter was born, my wife and I were able to bring up the year-old FB posts, with all 100+ comments, to print and add to her baby book. Thanks online physical media!

  11. Pingback: NoSQL, NewSQL, MySQL: not a zero sum game — Too much information

  12. Jon Watte says:

    I wonder if you can’t get 80% of the “in-core” benefit for 20% (or less) of the cost by simply running MySQL on a RAM disk on server nodes with lots of RAM. Or even just tune the kernel/server to lazy copy-back file buffer caching, and make sure the working set never exceeds physical RAM. Especially if you already do application-level sharding (which really isn’t that hard if you think about it up front).

  13. Simon H says:

    I’m pretty sure we just got played by Stonebraker.

    Let me put it this way – before his rant – I didn’t know who he was and I had never heard of VoltDB. Now I do and I’m sure many hundreds of IT managers now do as well.

    By being a calculating turbo-douche, Stonebraker may have just got himself a 20% jump in his customer base in the coming year. Well played, sir!

  14. Dan Weinreb says:

    Here’s what I really want to know. Stonebraker says that Facebook is in big trouble and needs to do something radical. When it comes to data infrastructure, is Facebook really in trouble? That is, is there a problem needing to be solved AT ALL?

  15. But Facebook’s data is all in-memory anyway! TechCrunch says so right here: http://techcrunch.com/2011/07/12/y-combinator-alum-memsql-raises-2-1-million-from-ashton-kutcher-sv-angel-and-more/ And InfoWorld says it will be easy to rewrite Facebook, right here: http://www.infoworld.com/d/application-development/facebook-afraid-code-rewrites-905

    After reading those articles from the authorities on Facebook, I feel entitled to have an opinion about Facebook, too.

  16. Lachlan Mulcahy says:

    I read that article and my first thought was “what a douche”… in hindsight, I’m not sure if I should be talking about the writer, Stonebraker, or both.

    I wish some of my projects would suffer a similar fate to Facebook’s. Being valued at numbers approaching those that make a BIGINT field quake with fear, but having to live with the pain of being _trapped_ in MySQL. The horror!

  17. Pingback: Is Stonebraker right? Why SQL isn’t the choice du jour for many apps — Cloud Computing News

  18. Pingback: NoSQL is a Premature Optimization « SmoothSpan Blog
