2018-01-07

Is DataOps actually a thing?

Tom Breur
7 January 2018

Lately, my attention has been caught by a new buzzword: “DataOps.” Since I rate several of the people advocating for it, this got me wondering: “DataOps – is that actually a thing…?!?” For many years I have been saying that the maturity and adoption of Agile principles in the BI/Analytics space has lagged behind mainstream application development, by at least 10 years. Yes, we talk about it, but maturity and awareness are generally low. Discussion topics on pertinent LinkedIn groups speak volumes.

Is DataOps “just” an attempt to piggyback on the DevOps bandwagon, or is there something more to it? From the DataOps Wikipedia page: “The volume of data is forecast to grow at a rate of 32% CAGR to 180 Zettabytes by the year 2025 (Source: IDC). DataOps seeks to provide the tools, processes, and organizational structures to cope with this significant increase in data.” Big Data is real, albeit overhyped. But indisputably, it will drive innovation. And maybe that alone justifies launching a DataOps Manifesto.

One of my favorite quotes amidst the DevOps hype is “If you have a DevOps Team, you’re not doing DevOps”, and I thought that one killed it. For me personally, DevOps is about seamlessly integrating solutions across the lifecycle of development and maintenance (I wrote about this a little while ago). This makes it clear that assigning responsibility to a particular team defeats the whole purpose you are pursuing… By the same token, I frequently get the impression that the motives for pursuing DataOps are, well, just “business as usual”: ensuring a reliable backup and recovery mechanism, staying “in control” of data quality, etc.

One of the topics that gets attention in DataOps is working under version control. Hard to argue with that: it has been common practice in application development for ages! So, yes, I fully endorse it, and have seen tremendous benefits from it. But new? Anything but. Deploying Analytics solutions through a develop/test/production sequence has been around for decades, and I laud those who feel the time is right to draw attention to this “best practice.” But let’s face it: this is common-sense professional engineering, and anything less would be “amateur hour.”

Another “big” topic for DataOps is remaining “in control” (pro-actively) of data quality issues (see e.g. this paper I wrote). Analytics is somewhat unique, compared to application development, in that the deliverables are about data. Data is fickle in nature, and the fact that you can accrue technical debt so quickly, and that this debt can cripple progress (this paper compares Machine Learning to the High-Interest Credit Card of Technical Debt), makes it an obvious focus of attention. High-quality data is often not the default!

For similar topics around DataOps like Continuous Delivery (or Deployment), Test Driven Development, managing the entire (data) value chain, etc., there is nothing new under the sun. All in all, I struggle to find anything “new” or “different”, although everything around DataOps seems worth pursuing to me. I would argue that the very fact that we (still) need to pursue DataOps reflects the immaturity of our profession. None of the Agile/Lean/DevOps innovations were unavailable. They were hidden in plain sight all along! So if “DataOps” is now a reason (or excuse) to embrace these proven angles, and attempt to catch up with the broader “Agile/Lean/DevOps community”, I would be the last to get in the way: hooray for DataOps!

4 comments:

  1. I couldn't agree more, Tom. Nothing new, but nothing wrong with it. Good vehicle for people that are new in data management. Well written, much appreciated.

  2. DevOps in Agile takes place at a level above the development teams. It is orchestrated that way, in theory, so that the entire train can stay on schedule. So the idea of DataOps is an attempt to orchestrate data governance issues in the same way, rather than taking a piecemeal approach to data driven by teams who are rightly more concerned about velocity than they are about correctness. I have never understood how building things faster so you have time to correct them later is an improvement over knowing up front what is needed. Also, you need to recognize that data governance tools and procedures were developed long before Agile came along. As a discipline there is some catching up to do.

    1. Couldn't agree more; we are (still) in the early stages of evolution...
