February 10, 2013

What do Data Science, Borat and Monty Python have in common?

Research: the heart that pumps data

If you ask any kid in school how he acquires knowledge, he will say it is through studying. Studying relies on books, classes, internet resources, and all the classic means of knowledge gathering. The further school gets into science, the more important it becomes to gather knowledge through exploratory methods like experiments. Formulating a hypothesis and running the underlying tests needed to confirm it, refute it or leave it undecided is something very few schools or learning methods teach from the early stages. Yet it is what kids love the most: to experiment.

In today's businesses there is very little room to experiment, and even when they want to, there are simply not enough resources to do so. This is the realm of research. It is the need to answer questions like:

- How can I produce 20% more finished product while decreasing my running costs by up to 10%?

- What is the financial and social impact of opening my coffee shops 1 hour earlier in suburban areas?

This looks like rocket science to some, but it's actually Data Science. It is the same discipline used in genomics, where you sequence DNA to find correlations between certain diseases and genetic changes or RNA patterns. Today no business would say no to deeper research, but starting a Data Science practice inside a business may require an investment of time and money. Research is like an engine that sucks in data from everywhere and pumps it back into the system. How can companies get from simple statistics gathering and presentation to a Data Science practice?
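As a rough illustration of what turning the coffee shop question into a testable hypothesis could look like, here is a minimal sketch of a one-sided permutation test. Everything in it is an assumption made for the example: the `permutation_test` helper, the idea of comparing daily revenue between shops that open early and shops that don't, and the revenue figures themselves, which are made up. A real study would also need to account for things like location, seasonality and costs.

```python
import random
from statistics import mean


def permutation_test(treated, control, n_permutations=10_000, seed=0):
    """One-sided permutation test: is mean(treated) larger than mean(control)?

    Returns the observed difference in means and an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)

    at_least_as_large = 0
    for _ in range(n_permutations):
        # Randomly reassign the "opens early" label and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if diff >= observed:
            at_least_as_large += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical daily revenue per shop (made-up numbers, for illustration only).
early_open = [1180, 1240, 1215, 1302, 1275, 1198]
regular = [1100, 1165, 1130, 1210, 1090, 1150]

observed, p_value = permutation_test(early_open, regular)
print(f"observed uplift: {observed:.2f}, p-value: {p_value:.4f}")
```

A small p-value would suggest the uplift is unlikely to be a fluke of which shops happened to be picked. It says nothing yet about the social impact, which needs its own data and its own hypothesis.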

From Stats to Science

From the perspective of the CEO of a regular retail chain, this might sound like a big stretch. Let's imagine a chain called ACME Bagels 'r Us. First you need to gather the data from commercial transactions and probably come up with basic dashboard stats for your finance, marketing and supply chain management people. Once you do, everyone will start to operate with a better view, but still a disconnected one. OK, this is when the IT guys go and convince the CEO to set up a Data Warehouse. The project will have massive financial constraints, so some best practices won't be followed. The most likely result is a big system that looks like an elephant, with each team having a different view:

- The CFO sees it as a money-sucking machine

- IT sees it as a major resource-consuming area, and a major headache to maintain

- Marketing sees it as cumbersome and slow

- The CEO sees it as a big cannon to kill a fly

All this fuss for stats? The CIO thinks it's time to create a Business Intelligence Center of Excellence, hires a couple of analysts and buys a nice graphical tool set.

Questions can be answered more quickly now, information flows easily between departments and some inefficiencies are easy to spot. This is all very well and peachy until the business starts questioning its business model, innovation becomes a matter of survival or Bagels 'r Us starts thinking of going into new markets.

We have arrived at a point where all sorts of new questions can't be answered with stats and basic correlations. The company probably needs to put its thinking cap on to get to the point fast. Bring research principles to business and you will unlock the need for Data Science.

Data Science: Service or Internal Practice

I see great potential for Data Science as a Service (DSaS), although it raises a number of data security issues that keep most businesses from even starting to consider it. There are several pros and cons to the DSaS option:

- Data Science (DS) needs to correlate data from several departments, and each line of business (LoB) owner doesn't want to share data with the neighboring department. Let's call it the syndrome of the People's Front of Judea (PFJ), which, as you know, hated the Romans but nurtured an even bigger hatred for the rival faction. Remember the PFJ's motto: "Death to the Judean People's Front!". If you don't know what I'm talking about, see this video. This is a major con for both the service and the internal practice.


- Externalizing it offloads the problem-solving process to organizations that can do it much better thanks to their accumulated knowledge of DS practice. This can be seen as an innovation accelerator for companies that typically take some time to innovate. That is the positive outlook for the DSaS model, but the lack of offerings is what will keep companies from adopting it at enterprise scale. The PFJ will probably do it as a service just to show those splitters from the JPF how easy it is to streamline their processes.

- Some industries are used to doing their own data mining, and taking it to the next level with an outside provider would probably expose some of their hidden practices to the competition. Oil & Gas is one of those: DS is of the utmost importance there, but it would never be exposed outside the remit of the corporation.

- SMBs targeting the mass market at a global scale will need the principles and advantages of DSaS to address those challenges, because they are simply too small right now to create an internal practice.

Data Science as a Service: a new business model, advice from Uncle Borat

So if you are wondering who will provide this service, there is so much stitching between IT, Science and Business that this could easily constitute a new business model altogether. New companies providing Data Science as a Service would combine a hacking mentality, a business-process-savvy mindset and enough math knowledge to find patterns in haystacks:

[Image: Danger Zone]

There is plenty of common ground here, but clearly the one you reach fastest is the danger zone, as Big Data "uncle" Borat states in reply to Andrew Lock's concern at one of the Strata Conferences:

"How you not fall in danger zone??"

to which Big Data "uncle" Borat replies:

[Image: Big Data uncle Borat]

"I no scare of danger zone I take highway to danger zone" 

In fact there are more danger zones, but let's just state the obvious for now: which companies will offer DSaS? For retail? For O&G? For Comms? For FS?

Time for a different breed of entrepreneurs. Where's the line?
