    The Mad Data Rush - Do Companies Know What Data Is Important to Collect and Use, and What Is Not?

    At one of my previous organizations, I was pursuing a very well-known retail company for some projects when a member of its leadership team told me that he understood how valuable data was, and that he would share data with us only if we paid his organization for it. The comment left me stunned, and it took me a long time to understand where it was coming from!

    Here I was, explaining how he could use data on his customers to better understand their buying behavior, help them buy more of what they wanted, and thereby increase his same-store sales, for which his company would pay us a fee for our services. And there he was, asking me for money instead! In hindsight, I realized that those were the days leading up to the frenzy around the Facebook IPO, and the media was awash with articles on how its data on consumers was what made Facebook so valuable. So here was this gentleman, convinced that he was sitting on a goldmine (that was precisely the word he used) and planning to sell the mining rights to the highest bidder.

    I highlight this story because the business world is now talking about data-driven decision making, and every CXO worth their salt realizes the need to collect data in order to build a data-driven organization. But most people across their organizations do not know what data needs to be collected, because they do not know how it will be used to build strategies.

    The new data explosion, and with it the “need to collect” all possible data, comes down to two fundamental drivers: the huge amount of online and offline data now being generated, and the falling cost of storing it. In 1990, storing 1 GB of data cost about $9,000, while in 2010 it cost around $0.08. That is a fall of more than 100,000 times. While Moore’s law is popularly paraphrased as computing power doubling every 18 months, the so-called “storage law” holds that storage capacity doubles every 9 months. Thus both the speed at which data can be processed and the capacity to store it are increasing at a phenomenal pace.
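    To make the arithmetic explicit, here is a small Python check that uses only the two cost figures quoted above. The halving time it prints is a derived illustration, not a figure from the post, and it assumes the cost per gigabyte fell at a constant rate over the 20 years. Note that it measures cost per gigabyte, which is a different quantity from the capacity doubling the “storage law” describes.

```python
import math

# Figures quoted in the post: cost of storing 1 GB in 1990 and in 2010 (US dollars).
COST_1990 = 9000.0
COST_2010 = 0.08
MONTHS = (2010 - 1990) * 12  # 240 months between the two data points

# How many times cheaper storage became over the period.
fall_factor = COST_1990 / COST_2010

# Number of times the cost halved, and the implied average months per halving,
# assuming (for illustration only) a constant rate of decline.
halvings = math.log2(fall_factor)
months_per_halving = MONTHS / halvings

print(f"Cost fell by a factor of about {fall_factor:,.0f}")
print(f"That is about {halvings:.1f} halvings, one every {months_per_halving:.1f} months")
```

    The factor works out to 112,500, which is why “more than 100,000 times” is the safer way to state the drop.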

    The prevailing wisdom in the industry is that when you do not know what data you need, you should collect everything that comes your way. That may be a good strategy in the short term, while you are setting up your data collection systems and processes, but it is not a good one in the long term. In the longer run, organizations that collect enormous volumes of data struggle to get a single view of it, because the data from their different systems tells different stories. They end up spending millions of dollars more on complex data warehouses that business users cannot understand and therefore cannot use. The biggest tragedy of this mad data rush is that business users who could benefit hugely from mining this data often have to wait months before their internal IT teams or external IT service providers extract the data that is relevant and usable for their purpose.

    Here is a very simple example I often use when explaining the idea of “data usability” to people outside data analytics. Consider a simple field in your huge data repository called the “time stamp”, which records when an event for which data is being collected took place. In a credit card transaction database, the “time stamp” field records the time at which a card was swiped, but it is of little use to someone in the credit card provider’s “collections team”, who follows up with defaulting card users on late payments. The “time stamp” information that matters to the collections team comes from a different system, one that records payments made by users to the credit card provider. Even then, the raw “time stamp” is not very useful from an analysis perspective. It becomes useful to a collections team member only when we use it to derive a new piece of information, “time since last payment”. Based on “time since last payment”, the collections team categorizes card users as “good”, “bad” or “write off” and calls them to recover the dues, setting the tone of each call as “good”, “bad” or “ugly” accordingly.
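    To make the “time stamp” example concrete, here is a minimal Python sketch of the derivation step: it turns raw payment time stamps into “time since last payment” and maps that onto the “good”, “bad” and “write off” buckets. The card IDs, dates, function names and day thresholds are all hypothetical, chosen only for illustration; the post does not say where a real collections team draws these lines.

```python
from datetime import datetime

# Hypothetical payment records from the payments system: (card_id, payment time stamp).
payments = [
    ("card-001", datetime(2024, 5, 28, 10, 15)),
    ("card-001", datetime(2024, 4, 27, 9, 0)),
    ("card-002", datetime(2024, 3, 2, 18, 40)),
    ("card-003", datetime(2023, 11, 20, 12, 5)),
]

# Hypothetical cut-offs in days; real thresholds would come from the collections policy.
BAD_AFTER_DAYS = 45
WRITE_OFF_AFTER_DAYS = 180


def days_since_last_payment(records, as_of):
    """Derive "time since last payment" (in whole days) per card from raw time stamps."""
    last_paid = {}
    for card_id, paid_at in records:
        # Keep only the most recent payment time stamp for each card.
        if card_id not in last_paid or paid_at > last_paid[card_id]:
            last_paid[card_id] = paid_at
    return {card_id: (as_of - paid_at).days for card_id, paid_at in last_paid.items()}


def categorize(days):
    """Map days since last payment to the collections team's buckets."""
    if days >= WRITE_OFF_AFTER_DAYS:
        return "write off"
    if days >= BAD_AFTER_DAYS:
        return "bad"
    return "good"


as_of = datetime(2024, 6, 15)
for card_id, days in sorted(days_since_last_payment(payments, as_of).items()):
    print(card_id, days, categorize(days))
```

    The card swipe time stamps never enter this calculation: only the payments system’s time stamps feed the derived field, and the derived field, not the raw time stamp, is what the collections team actually works with.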

    Thus, as companies continue their “mad data rush” and invest heavily in ever more sophisticated IT systems, they also need to invest in a second layer of information filtering that stores and keeps what is useful from a business perspective. Data is like a raw, uncut diamond: one needs to cut and polish it painstakingly to turn it into a gem.