Tom Breur
7 January 2018
Lately, my attention has been
caught by a new buzzword: “DataOps.”
Since I think highly of several of the people advocating for it, this got me wondering:
“DataOps – is that actually a thing…?!?”
For many years I have been saying that the maturity and adoption of Agile
principles in the BI/Analytics space has lagged behind mainstream application
development, by at least 10 years. Yes, we talk about it, but maturity and
awareness are generally low. Discussion topics on pertinent LinkedIn groups
speak volumes.
Is DataOps “just” an attempt
to piggyback on the DevOps
bandwagon, or is there something more to it? From the DataOps
Wikipedia page: “The volume of data is forecast to grow at a rate of 32% CAGR
to 180 Zettabytes by the year 2025 (Source: IDC). DataOps seeks to provide the tools, processes,
and organizational structures to cope with this significant increase in data.” Big Data is real, albeit overhyped. But
indisputably, it will drive
innovation. And maybe that alone justifies launching a DataOps
Manifesto.
One of my favorite quotes
amidst the DevOps hype, is that “If you have a DevOps Team, you’re not doing
DevOps”, and I thought that one killed it. For me personally, DevOps is about
seamlessly integrating solutions across the lifecycle of development and
maintenance (I wrote about this a little while ago). This makes it clear that assigning responsibility
to a particular team defeats the whole purpose you are pursuing… By the same
token, I frequently get the impression that the motives for pursuing DataOps
are, well, just “business as usual”: ensuring a reliable backup and recovery
mechanism, staying “in control” of data quality, etc.
One of the topics that gets
attention in DataOps is working under version control. Hard to argue with that:
it has been common practice in application development for ages! So, yes, I
fully endorse it, and have seen tremendous benefits from it. But new? Anything but.
Deploying Analytics solutions through a develop/test/production sequence has
been around for decades, and I laud those who feel the time is right to draw
attention to this “best practice.” But let’s face it: it is anything but new. I deem
it common-sense professional engineering; anything less would be “amateur hour.”
Another “big” topic for
DataOps is remaining “in control” (pro-actively) of data quality issues (see
e.g. this paper I
wrote). Analytics is somewhat unique, compared to application development, in
that the deliverables are about data.
Data is fickle in nature, and technical debt can accrue so
quickly that it cripples progress (this paper compares Machine Learning to the High-Interest Credit Card of
Technical Debt), which makes data quality an obvious
focus of attention. High-quality data is often not the default!
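To make "pro-actively in control" a bit more concrete, here is a minimal sketch of what an automated data quality gate could look like, kept under version control and run before data moves on to the next stage. Everything in it is an assumption for illustration: the `Check` class, the `run_checks` function, the column names and the age range are hypothetical and not taken from any particular DataOps tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Check:
    """A named rule that every row is expected to satisfy (hypothetical helper)."""
    name: str
    predicate: Callable[[dict], bool]


def run_checks(rows: Iterable[dict], checks: List[Check]) -> Dict[str, List[int]]:
    """Return, per failing check, the indexes of the rows that violate it."""
    failures: Dict[str, List[int]] = {check.name: [] for check in checks}
    for index, row in enumerate(rows):
        for check in checks:
            if not check.predicate(row):
                failures[check.name].append(index)
    # Only report checks that actually found a problem.
    return {name: indexes for name, indexes in failures.items() if indexes}


# Illustrative sample data; the columns and values are made up.
SAMPLE_ROWS = [
    {"customer_id": 1, "age": 34},
    {"customer_id": None, "age": 29},   # missing key
    {"customer_id": 3, "age": -5},      # implausible value
]

# Illustrative rules; the 0..120 age range is an assumed business rule.
CHECKS = [
    Check("customer_id_present", lambda row: row.get("customer_id") is not None),
    Check("age_in_range",
          lambda row: isinstance(row.get("age"), int) and 0 <= row["age"] <= 120),
]

if __name__ == "__main__":
    problems = run_checks(SAMPLE_ROWS, CHECKS)
    for name, indexes in problems.items():
        print(f"{name}: failing rows {indexes}")
```

The point of such a gate is that the rules live next to the code, run on every load, and block promotion from test to production when they fail, rather than relying on someone noticing bad data downstream.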
For similar topics around DataOps
like Continuous Delivery (or Deployment), Test Driven Development, managing the entire (data) value chain, etc., there is nothing
new under the sun. All in all, I struggle to find anything “new” or
“different”, although everything
around DataOps seems worth pursuing to me. I would argue that the very fact that
we (still) need to pursue DataOps reflects the immaturity of our
profession. None of the Agile/Lean/DevOps innovations were unavailable. They
were hidden in plain sight all along! So if “DataOps” is now a reason (or
excuse) to embrace these proven angles, and attempt to catch up with the
broader “Agile/Lean/DevOps community”, I would be the last to get in the way –
hooray for DataOps!
I couldn't agree more, Tom. Nothing new, but nothing wrong with it. Good vehicle for people that are new in data management. Well written, much appreciated.
Thank you!
DevOps in Agile takes place at a level above the development teams. It is orchestrated that way, in theory, so that the entire train can stay on schedule. So the idea of DataOps is an attempt to orchestrate data governance issues in the same way, rather than leaving a piecemeal approach to data driven by teams who are rightly more concerned about velocity than they are about correctness. I have never understood how building things faster so you have time to correct them later is an improvement over knowing up front what is needed. Also, you need to recognize that data governance tools and procedures were developed long before Agile came along. As a discipline there is some catching up to do.
Couldn't agree more, we are (still) in the early stages of evolution...