I have always been fascinated by data and how it could be used to run a business, create investment opportunities, and understand and affect behavior. As I was getting my engineering degree in the ’60s, I was exposed to the value of historical data and the use of algorithms to reach conclusions under uncertainty. My first job out of the Colorado School of Mines was, strangely, with Procter & Gamble. P&G was a big user of data in its product development, marketing and manufacturing operations.

After business school, I was fortunate enough to go to work for a research boutique, Mitchell Hutchins, that was prepared to take full advantage of the early big-data manipulation capabilities coming from dumb terminals connected through time-sharing to large computers. Mitchell Hutchins was involved in the creation of Data Resources, an economic forecasting company co-founded by the late Otto Eckstein. Data Resources created very sophisticated forecasting models that clients could either accept or manipulate over these early networks. We also used the network to create individual company models and valuation models, which became a regular part of reports to clients. The reports themselves were printed and distributed through the U.S. mail or some early private delivery services.

I must say the models that were developed through all this data manipulation were seductive and created an air of certainty in the conclusions we reached. The higher the correlation coefficients and the R-squared, the more we believed. I am not sure our forecasts, or those of Data Resources, were that much better than others created through less sophisticated means. To Otto’s credit, while publishing his models, he remained a skeptic of the end results. “Add factors” were always a part of his forecasts in discussions with clients.
I think we all ultimately learned to respect the power of data, but, at the same time, recognized that the models were only as good as the inputs we were using, and what we didn’t know or measure was as important or more important than what we knew. I think that applies even more so today as the available data expand. We will get some answers that we didn’t have before, but let’s all remain skeptics and avoid the seduction of the certainty associated with the size of input, the sophistication of the models, and the speed with which we get the answers.
So let’s explore Big Data.
According to eBay, the volume of business data is doubling every 1.2 years. The amount of data Big Science is accumulating dwarfs that of the business community. Several “laws” have come into play, producing these enormous amounts of data and getting everyone excited about what can be done with Big Data if it is analyzed and manipulated properly.
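It is worth pausing on what a 1.2-year doubling period actually implies. A minimal back-of-the-envelope sketch (my own illustration; the function name is an assumption, not from any cited source):

```python
# What "doubling every 1.2 years" compounds to over time (illustrative arithmetic).

def growth_factor(years: float, doubling_period: float = 1.2) -> float:
    """Multiplicative growth after `years`, given a fixed doubling period."""
    return 2 ** (years / doubling_period)

print(round(growth_factor(1.2)))  # one doubling period: 2x
print(round(growth_factor(10)))   # a decade: a bit over 300x
```

At that rate a decade of compounding multiplies the data volume by a factor in the low hundreds, which is why storage and processing keep reappearing in the discussion below.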
The Harvard Business Review devoted much of its October 2012 issue to Big Data. McKinsey and others have published numerous reports on the topic. This is all with the belief that applying the proper analytics to all these data can lead to better business decisions, replacing or reinforcing intuition with hard facts coming from more complete and precise information. It all starts with the latest version of Moore’s Law: processing speeds double every 18 months. This is without question the most important law related to Big Data. In my view, particularly these days, the utility of data is inversely proportional to the amount of time it takes to process the data. Wirth’s Law comes into play here: software is getting slower more rapidly than hardware is getting faster. Other, more assertive variations have been put forth, such as May’s Law (sometimes facetiously called Gates’ Law): software efficiency halves every 18 months, offsetting Moore’s Law. The impact and perceived importance of processing the Big Data coming at us will very likely put even more of a premium on efficient software. For most applications, developers have had it easy. Processing speeds have allowed for the development of lazy code. One would hope that the exigencies of data growth change that. Otherwise value creation will lag and negate some of the other laws at work here. Metcalfe’s Law holds that the value of a network is proportional to the square of the number of users connected to the network (~n squared)–or, probably more appropriate in today’s social media world, Reed’s Law: the utility of a large network can scale exponentially with the size of the network (~2 to the nth power). The real value or utility becomes, in most instances–certainly in the social media world–the near-instantaneous analysis producing an economic action.
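To make the contrast between Metcalfe’s and Reed’s laws concrete, here is a toy sketch (my own illustration; the function names and the particular counting conventions are assumptions, not part of either law’s formal statement):

```python
# Toy comparison of the two network scaling laws (illustrative only).

def metcalfe_value(n: int) -> int:
    """Metcalfe's Law: value ~ n^2; counted here as distinct user pairs."""
    return n * (n - 1) // 2

def reed_value(n: int) -> int:
    """Reed's Law: utility ~ 2^n; counted here as subgroups of 2+ users."""
    return 2 ** n - n - 1

for n in (10, 20, 30):
    print(f"n={n}: pairs={metcalfe_value(n)}, subgroups={reed_value(n)}")
```

Even at n = 30 the number of possible subgroups dwarfs the number of pairwise connections, which is the intuition behind Reed’s claim that group-forming networks, such as social media, scale faster in utility than point-to-point ones.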
It has become a given that the proper use of data, i.e., metrics, can actually allow one to make better judgments and business decisions. Proper and selective use of the data becomes the key. Within a business, what one measures can also affect how those generating the data behave. It should be apparent that what one measures, and how whatever data are collected are used, matters a great deal. There is a variation of another law at work here: Parkinson’s Law. In its original form: work expands to fill the time available; or, in its computer corollary: data expand to fill the space available for storage. With networks expanding, processing speeds increasing, and the cloud and more powerful servers ultimately providing infinite storage, consultants and business school professors have discovered Big Data. In my view, this is creating another variant of Parkinson’s Law: the number of conclusions one can reach expands proportionately with the quantity of data available and inversely with the time it takes to analyze the data. All of those conclusions may be actionable. That doesn’t mean they will have a positive effect. It also doesn’t mean we shouldn’t seek these answers. It is not even a question of “should.” These answers will be sought.
There will be an advantage to those who are the early users of Big Data. This certainly proved to be the case in the investment and trading community. Every day an enormous amount of data is generated on stock price movements, trading volume, business results, economic results and, of course, the opinions of the pundits in the media and in the research departments of a wide variety of financial services entities. Models have been built, and continue to be built and modified, that attempt to show correlations among securities and deviations from those correlations. The low cost of trading, combined with the speed at which a transaction can occur, has allowed traders to take advantage of minute variations in highly correlated securities. Those who have created the better models and/or can react more quickly to a variation have done quite well. The importance of speed of reaction has been such that some traders moved their processing physically closer to the source of the information and the trading, shortening the time it takes for electrons to activate and produce a transaction. It is a business where minute fractions of a second can make the difference. The models, though, have to keep morphing in terms of inputs and speed to stay ahead of the competition. Otherwise they all converge, eliminating the disparities that produce profits. The Fallacy of Composition comes into play: when everyone stands up to see, no one can see. I think this is already happening in the trading community.
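The core of the deviation-from-correlation idea can be sketched in a few lines. This is a minimal illustration of the general technique, not any particular firm’s model; the names, the 2-standard-deviation entry threshold, and the toy prices are all my own assumptions:

```python
import statistics

def zscore_spread(prices_a, prices_b):
    """Z-score of the latest spread between two historically correlated series."""
    spread = [a - b for a, b in zip(prices_a, prices_b)]
    mu = statistics.mean(spread)
    sigma = statistics.stdev(spread)
    if sigma == 0:
        return 0.0  # spread has never varied, so there is no signal
    return (spread[-1] - mu) / sigma

def signal(z, entry=2.0):
    """Trade only when the spread deviates sharply from its historical norm."""
    if z > entry:
        return "short A / long B"   # A looks rich relative to B
    if z < -entry:
        return "long A / short B"   # A looks cheap relative to B
    return "no trade"

# A has tracked B closely, then jumps away from it on the last tick.
a = [10.0] * 9 + [13.0]
b = [10.0] * 10
print(signal(zscore_spread(a, b)))  # prints "short A / long B"
```

The bet is simply that the spread reverts to its historical mean. As the text notes, once everyone runs some version of this, the deviations themselves get arbitraged away.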
In the long run this will likely happen in other communities as well. We see aspects of it in consumer product marketing. That community has always been good at analyzing the data available to it to discern what customers want or can be made to want. This has led to a wide array of similar products from various companies with little distinction among them. The first movers always had an advantage for a brief period, but ultimately others developed competitive products. Youngme Moon describes these phenomena well in her wonderful book “Different: Escaping the Competitive Herd.” What she describes has broad application well beyond the marketing examples she uses.
There is a Big Deal about Big Data. The advances that can be made in science, business and, in particular, the social media world are very exciting–a little scary, but most exciting things are. The early users will have an advantage–maybe a sustainable one as they learn what they still don’t know and adjust accordingly. It is important to understand that the outcomes will only be as good as the inputs and the analytics applied to them. To the extent one comes to rely on these outcomes without understanding what remains unknown, one increases the risk of larger and larger unintended consequences through error or just faulty or incomplete models. The models will always be incomplete. The more we accept that premise, the more value our use of Big Data will have. It is hard to imagine the outcomes every time processing speeds and data accumulation double. We are on our way to a more superlative adjective replacing Big. Hang on!