About the Blog

The FICO Labs blog focuses on innovation, technology and the role of analytic sciences in today’s business world. We’ll share our perspectives, vision, successes and challenges in the areas of predictive analytics, Big Data, decision optimization, cloud computing and Big Marketing.

06/19/2013

Infographic: The Analytics Big Bang

The need to decode data and build customer intimacy is driving a huge expansion in predictive analytics:

  • In twelve years the analytic software market went from $11 billion to $35 billion
  • In nine years the R programming language for analytic software grew to 1 million users
  • In eight years IBM spent $16 billion buying analytics companies
  • In one year the number of data scientist job posts jumped 15,000%
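For a sense of scale, here is back-of-the-envelope arithmetic on two of the figures as stated above (not numbers taken from the infographic itself). Growth from $11 billion to $35 billion over twelve years is a compound annual growth rate of roughly 10 percent. A 15,000 percent increase means postings reached about 151 times their earlier level:

```latex
\text{CAGR} = \left(\frac{35}{11}\right)^{1/12} - 1 \approx 0.101 \approx 10\%\ \text{per year}
\qquad
1 + \frac{15{,}000}{100} = 151
```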

In the last 70 years, analytics have gone from the province of governments to tools that small businesses can use. This has been driven by numerous innovations in analytic technology. FICO today released an infographic that looks at the evolution of analytics and the bursts of innovation that are propelling us to the future of ubiquitous analytics.

[Infographic: The Analytics Big Bang]


06/17/2013

It’s the End of Data Warehousing as We Know It … And I Feel Fine


By Josh Hemann

Every couple of years, someone comes along and declares the end of something. In 1992 Francis Fukuyama declared the end of history, in 1999 Salesforce.com declared the end of software, and just last week, commercial Hadoop provider Cloudera declared that Hadoop would end the data warehousing era. (Although to be fair to CEO Mike Olson, the headline on TechWeb declared it; he wasn’t actually quoted as saying it.)

Does anything truly end? No, it just transforms. We are now in a transformative era for data, data warehousing, analytics and enterprise software in general. Big Data is becoming part of enterprises’ strategic information architecture that deals with data volume, variety, velocity and complexity. And it is forcing changes to many traditional approaches. According to Gartner Group in its 2013 predictions, “this realization is leading organizations to abandon the concept of a single enterprise data warehouse containing all information needed for decisions. Instead they are moving towards multiple systems, including content management, data warehouses, data marts and specialized file systems tied together with data services and metadata, which will become the 'logical' enterprise data warehouse.”

Of course, if you have been following this part of technology since before Big Data became a catch-all phrase, you’d know this is not much of a prediction. For example, database pioneer Mike Stonebraker wrote about the end of relational databases in 2009. What he actually wrote was that the “one size fits all” approach from relational database providers was no longer going to cut it, and new solutions would be needed to better address areas like unstructured text data or fast querying. And Stonebraker didn’t just rant on the topic: he founded three “next gen” database companies (Vertica, VoltDB and SciDB), each aimed at tackling one of the nine major application areas he thought would require specialized data persistence.

OK, so back to Hadoop and the end of something… Low-value data that arrives in large volumes very quickly seems like a great candidate for Hadoop-based systems because the cost per stored terabyte is so low. Examples would be clickstream data and web server logs. But as data becomes more valuable, storing it in Enterprise Data Warehouses (EDWs), which have evolved over the past 30+ years, makes a lot of sense, because the entire ecosystem is much better understood and more mature. Furthermore, sometimes the cheap commodity hardware story that makes Hadoop so cost-effective is not sufficient. When Oracle acquired Sun in 2010, it was precisely to deliver better integration of hardware and database software to optimize performance. And Intel recently jumped into the Hadoop fray with its own distribution optimized for its Xeon processors, and just last week Cray announced a new supercomputing cluster built to run this Intel Hadoop distribution. Hardware actually does matter.

So what is “high-value” data? From my point of view, working with businesses on becoming more customer-centric, high-value data come from “transactions” that map to important human actions (e.g., buying something at a store, selling a stock) and that you have to persist correctly and immediately. Such data invariably feed many, many other systems and have numerous scheduled as well as (more importantly?) ad hoc queries run against them.

A recent Infochimps presentation provides a nice way to think about this value distinction with the following slide:


[Slide from an Infochimps presentation]

Two things jumped out at me when I saw this slide:

  1. The Enterprise Data Sources category confusingly mixes sources of data with data persistence. One can buy data from Acxiom, so Acxiom is the source, but a Teradata EDW is not the true source of the data. It is just the persistence layer in a much larger IT infrastructure that gets feeds from many sources (Point of Sale systems, inventory management systems, real estate teams, etc.).
  2. Mike Olson wants you to believe that you can increasingly push out the Oracles of the world and replace that persistence capability with Hadoop. But you have to understand the larger Enterprise Data Sources infrastructure, and where all that data is coming from, before you can hot-swap in a Hadoop cluster and suddenly save money or get insights.

Given where the technology stands today, my view is that the Non-Traditional Data Sources domain is perfect for Hadoop. These are generally high-volume, fast-changing, low-value data that can be highly useful in aggregate, but I really don’t want to pack tons of Facebook “likes” from dead people and barely understandable comments from YouTube videos into my multi-million dollar EDW. All of this lower-value, quickly changing data can go on cheap commodity hardware that uses Hadoop-based software to manage and query it. Conversely, I don’t want to put my high-value customer transaction data, fed from a multi-million dollar Point of Sale system, onto cheap commodity hardware running Hadoop software that is rapidly changing, lacks functionality, and is harder to hire talent for than EDW solutions.
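To make the split concrete, here is a minimal Python sketch of the routing rule described above. The `DataFeed` fields and the rule are hypothetical simplifications of the criteria in this post (data tied to important human actions that must be persisted correctly and immediately goes to the EDW; everything else, such as clickstreams and social “likes,” goes to Hadoop), not a FICO or vendor API.

```python
from dataclasses import dataclass


@dataclass
class DataFeed:
    """Hypothetical description of an incoming data feed."""
    name: str
    maps_to_human_action: bool       # e.g., a purchase at a store or a stock sale
    must_persist_immediately: bool   # must be recorded correctly, right away


def choose_store(feed: DataFeed) -> str:
    """Return 'edw' for high-value transactional data, else 'hadoop'."""
    high_value = feed.maps_to_human_action and feed.must_persist_immediately
    return "edw" if high_value else "hadoop"


feeds = [
    DataFeed("point_of_sale", True, True),
    DataFeed("web_clickstream", False, False),
    DataFeed("social_likes", False, False),
]
for f in feeds:
    print(f.name, "->", choose_store(f))
```

A real decision would also weigh query patterns, downstream consumers and cost; the sketch only captures the value distinction.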

It is not the end of the data warehousing era, just the blurry end as we know it.

06/10/2013

Models Behaving Badly: The Case of the Million Dollar Amazon Book

Between the summer of 2011 and the summer of 2012 there was a 15,000 percent increase in analytic scientist job postings. Today, the median salary for an analytic scientist is in the six figures. By definition we're in the midst of a talent crunch. And we’re starting to see what happens at the edges when there aren’t enough quality analytic scientists or companies don't have the budget to hire one: models behave badly.

Someone recently reminded me of an oldie but goodie from back in 2011, the early days of the talent crunch. Michael Eisen, a biologist at UC Berkeley, touched upon this subject on his blog, it is NOT junk, when one of his graduate students found two new copies of Peter Lawrence’s The Making of a Fly listed on Amazon for $1,730,045.91 (plus $3.99 shipping). Over the next several days, they watched the price climb to $23,698,655.93 (plus $3.99 shipping) before anyone noticed and the book returned to a reasonable $106.23.
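A runaway price like this is easy to reproduce with two naive repricing rules that each react only to the other seller's price. In the Python sketch below, the multipliers are illustrative assumptions: one hypothetical seller prices about 27 percent above the other, while the other undercuts it by about 0.17 percent. Because the product of the two multipliers is greater than 1, both prices grow geometrically with every round of repricing, and nothing in either rule stops them.

```python
def simulate_price_war(start_price, undercut=0.9983, markup=1.270589, rounds=10):
    """Simulate two hypothetical repricing bots reacting to each other once per round.

    Seller B prices at `markup` times seller A's price; seller A then prices at
    `undercut` times seller B's price. Returns a list of (price_a, price_b) per round.
    """
    price_a = start_price
    history = []
    for _ in range(rounds):
        price_b = round(price_a * markup, 2)    # B's rule: stay well above A
        price_a = round(price_b * undercut, 2)  # A's rule: just undercut B
        history.append((price_a, price_b))
    return history


for a, b in simulate_price_war(100.00):
    print(f"A: ${a:,.2f}   B: ${b:,.2f}")
```

Per round, the prices grow by a factor of about 0.9983 × 1.270589 ≈ 1.268, so ten rounds multiply the starting price roughly tenfold.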

Does this still happen today? Yes, in two extremes.

  1. Merchants undercut each other so much that they lose money on the transaction.
  2. On rare or limited items, prices get out of whack, resulting in $7,000 price tags for something that is worth a few hundred dollars at best.

So, what is happening? If you comb the Amazon discussion boards, the predominant theory is that merchants are greedy. But greed alone doesn’t explain it. Several factors are at work:

  • The Arbitrage Business Model enables a merchant to buy a product in one market and sell it in another; on Amazon, the product is often bought from another listed vendor.
  • Ready access to APIs for adding and removing inventory from a site like Amazon, plus the transparent pricing, means that anyone with minimal skill can trade on any price discrepancies between marketplaces.
  • Simplistic/naive models. Poor pricing behavior happens when relatively simple assumptions are made without much supporting analysis. Often the assumptions don't take a domain-specific objective into consideration. Injecting domain-specific knowledge into modeling is key to avoiding misleading predictions. Handling edge and corner cases is one of the most difficult parts of developing analytical solutions, and many such cases can be induced when prices are adjusted using only simple rules rather than probabilistic models.
  • Immature markets. Without the capacity to “short” something, not to mention the inability to issue forward, futures, or options contracts, speculators have little incentive to provide any immediate pricing discipline on Amazon. There’s no reason prices shouldn’t drift all over the place, irrespective of cause. It’s no big deal until somebody manages to make, or lose, some money off of it.
  • Automatic execution of models requires frequent tracking and robust sanity checks. The model in the Amazon example wreaked havoc for about 51 days before one of the sellers noticed the issue and fixed it. Good analytic solutions come with better tracking than this!
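A sanity check of the kind described in the last bullet can be very simple. The Python sketch below clamps any proposed price to a band around a reference price and flags the listing when the clamp kicks in, so a person can review it the same day rather than weeks later. The reference price and the band limits (half to three times the reference) are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PriceGuard:
    reference_price: float  # hypothetical anchor, e.g., list price or recent median sale price
    min_ratio: float = 0.5  # assumed floor: avoid selling far below value (losing money)
    max_ratio: float = 3.0  # assumed ceiling: avoid absurd listings

    def check(self, proposed: float) -> Tuple[float, bool]:
        """Return (safe_price, flagged); flagged is True when the proposal was out of bounds."""
        low = self.reference_price * self.min_ratio
        high = self.reference_price * self.max_ratio
        safe = min(max(proposed, low), high)
        return safe, safe != proposed


guard = PriceGuard(reference_price=106.23)
print(guard.check(1_730_045.91))  # clamped to the ceiling and flagged for review
print(guard.check(110.00))        # within bounds, passed through unchanged
```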

This is only the tip of the iceberg. A talent crunch with healthy salaries is producing a rush of people calling themselves analytic scientists without the proper experience or training. Have you seen any models behaving badly recently? We'd like to hear about them.

06/05/2013

Two Roads Intertwined: Big Data and Customer Centricity

By Andrew Jennings

Two trends challenging business thinking today – Big Data and customer centricity – seem at first to be antithetical. Driving decisions from more and more data raises the specter of dehumanizing business interactions. But the real value of Big Data for business is the opportunity to learn about our customers at such depth and speed that we can truly put them at center stage.

Still, the answers to the most important questions aren’t just there to be scooped up from Big Data. Consider questions such as:

  • How is this customer likely to respond to this action?
  • What new needs can we anticipate?
  • What does this changing behavior mean?

Next Generation Analytic Learning

Companies that become very good at next-generation analytic learning (by this I mean data-driven learning enabled by computing infrastructures and analytic techniques that make it practical to examine very high volumes of data) will be able to orient their entire operation around their customers. They’ll engage customers and build win-win relationships with such insight, innovation and efficacy that they’ll be very difficult to dislodge as providers of choice.

Big Data can help or hinder us on the way to customer centricity. Today we have the means to capture and analyze much bigger quantities of data than ever before, and to make meaningful connections between different types of it. We can analyze data in-stream for real-time decisions. We can distribute analytic tasks in a massively parallel manner across many processor nodes, then algorithmically assemble their outputs into a single result. But is any of that helpful for achieving customer centricity?

It’s most helpful when we can systematically extract the most valuable analytic insights – causal relationships – from Big Data. These insights enable us to understand individual customer behavior and sensitivities, anticipate needs, and predict likely responses to offers and treatments. In some situations we must find and act on such insights as data is streaming in. In others we can use out-of-stream methods to dive deeply for them.

Big Data computing infrastructures are making it practical to employ automated machine learning algorithms for this purpose – but human expert oversight is essential to ensure results make business sense and are useful in operations. And ultimately, whether any of these insights have an impact at all on customer centricity depends on how quickly we can pump them into operations so that they drive and inform every decision we make and every interaction we have with our customers.

These are essential capabilities for turning Big Data into an enabler for customer centricity. They’re the fundamentals of what I call “next-generation analytic learning.” Next-generation analytic learning starts with what you want to know about your customers – in other words, with business questions like “Which of my customers are most sensitive to discount coupons?”

Starting with the business questions helps you target the right data. I call this approach “next generation” because it elevates test-and-learn methods to a new level of efficacy. These improvements to the traditional champion-challenger method were not triggered by Big Data; they’ve evolved in response to increasingly granular customer treatments and rapidly changing customer behavior. Still, next generation learning is certainly suited to the challenges and opportunities Big Data presents. It’s a systematic, highly efficient way of continuously advancing what we know about our customers and improving how we use those insights to interact with them.
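For readers less familiar with the traditional champion-challenger method mentioned above, here is a minimal Python sketch of its mechanics: customers are deterministically assigned to the current strategy (champion) or a test strategy (challenger), and response rates are compared afterward. The 10 percent challenger share, the hashing scheme and the data format are assumptions for illustration, not a description of FICO's methods.

```python
import hashlib
from collections import defaultdict


def assign_arm(customer_id: str, challenger_share: float = 0.10) -> str:
    """Deterministically assign a customer to 'champion' or 'challenger'.

    Hashing the ID (instead of drawing a new random number each time) keeps each
    customer in the same arm across repeated decisions.
    """
    bucket = int(hashlib.sha256(customer_id.encode("utf-8")).hexdigest(), 16) % 10_000
    return "challenger" if bucket < challenger_share * 10_000 else "champion"


def response_rates(outcomes):
    """outcomes: iterable of (arm, responded) pairs -> {arm: response rate}."""
    shown = defaultdict(int)
    responded = defaultdict(int)
    for arm, did_respond in outcomes:
        shown[arm] += 1
        responded[arm] += int(did_respond)
    return {arm: responded[arm] / shown[arm] for arm in shown}


# Hypothetical results from a discount-coupon test
outcomes = [("champion", True), ("champion", False), ("champion", False), ("champion", False),
            ("challenger", True), ("challenger", True), ("challenger", False), ("challenger", False)]
print(response_rates(outcomes))  # champion 0.25, challenger 0.5
```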

We will continue this conversation next week on the Banking Analytics Blog. I also explore next-generation learning in my recent Insights white paper: "When Is Big Data the Way to Customer Centricity?" (registration required).

05/30/2013

Myth Busters: The Analytic Talent Crunch Will Constrain Big Data Innovation

By Benjamin Baer

Earlier this month, we posted our first in a series of myth busters, inspired by the Discovery Channel’s television show MythBusters. Over the next several months we’ll tackle hot topics related to Big Data, analytics, customer engagement and mobile technology and we’ll determine whether the topic can be confirmed, is plausible or is busted (not true). 

Thomas Davenport and D.J. Patil brought the data scientist into the national spotlight in last October’s Harvard Business Review article titled Data Scientist: The Sexiest Job of the 21st Century, and Indeed.com reported that job postings for analytic scientists jumped 15,000 percent between the summer of 2011 and the summer of 2012. Many believe the “shortage” will only get worse. McKinsey & Company predicted a 50 to 60 percent shortfall in analytic scientists in the US by 2018. And Gartner echoed this sentiment, predicting that only one-third of 4.4 million global Big Data jobs will be filled by 2015.

By definition we have a talent crunch on our hands. But will it constrain Big Data innovation? We say no…busted. Here is why:

  1. Analytic technology will get simpler to use. Over the next few years we will see reduced complexity in collecting, processing, analyzing and acting on Big Data. Big Data Analytics is not immune to Moore’s Law. Over the last several decades processors became faster, computing power became cheaper, chips became smaller and applications became easier to use. Simple always wins, even with Big Data Analytics. There will be an explosion in tools geared for the business user; a few are already available today.
  2. Machine learning and analytic solutions will do much of the heavy lifting. More and more applications like customer engagement, debt management, and customer management will include fully integrated analytic capabilities. This, along with improvements in machine learning, will enable businesses to handle the analytic “drudgery at scale” and free their analytic scientists to focus on high-value projects.
  3. Cloud technology will deliver Big Data Analytics into more hands. Just as Cloud revolutionized ERP and CRM, it will change the dynamics of Big Data Analytics. It will give more people access to the powerful (and simpler) tools and solutions, and massive amounts of data – it will make Big Data palatable.
  4. People will become more comfortable with analytic tools, therefore freeing the analytic scientist to focus on the most complex projects. As tools become simpler, and more accessible they will become second nature to business professionals and IT.
  5. Data will no longer need to be pristine. Much of the time and expense in the analytics process today involves confirming that the data sample is accurate and useful. As FICO World keynote speaker Kenneth Cukier, data editor of The Economist, put it, Big Data by its nature is messy, and does not require the same devotion to precision and accuracy as smaller data sets. When using samples of billions and billions of records (structured and unstructured), messy is often good enough.

While the analytic talent crunch is very real, it will not have as dramatic an effect on Big Data innovation as some pundits would have you believe. In the not too distant future, the analytic scientist will be able to focus on the most complex projects, and the business user will be able to competently handle the rest. Did we convince you?

05/24/2013

Video: Stuart Wells Talks Customer Intimacy and the Third Platform

Check out the video below of FICO CTO Stuart Wells’ keynote from FICO World. In the keynote he discusses an initiative to help businesses improve customer intimacy and solve complex problems using cloud-based analytics. He presents the vision for an analytic cloud, which will make FICO’s applications available as cloud-based services, enable businesses and application developers to build their own solutions, and provide a marketplace for analytic applications.

05/22/2013

A Moment of Science: LLVM May Mean the Best of Both Worlds is Possible for Analytic Computing


By Josh Hemann

I got my start in programming with C++ (not counting hacking DOS video games like Duke Nukem in a hex editor). One summer in college I had an undergraduate grant with a professor who had me write a neural network algorithm using back propagation. The next summer the same professor had me recode it all in Java, as it was the hot new language. So from the start, my programming needs have been in the context of analytics, often exploratory in nature, which is why I quickly gravitated from statically compiled languages like C++ and Java towards dynamic languages like MATLAB, R and Python.

The trade-off in using dynamic languages over compiled ones is about speed. Dynamic languages better enable the exploratory programming needed when tackling new analytical problems, which leads to faster development of solutions. The catch is that this flexibility means compilers do not have enough information about the code and data to optimize for run-time execution speed. In many settings, trading extra computer time for saved human time is a good deal, but of course sometimes execution speed is critical, even in exploratory work (it is harder to iteratively refine a modeling approach when code takes many hours to run). Wouldn't it be great if there were a way to have the flexibility and expressiveness afforded by dynamic languages with more of the execution performance of compiled languages?

This “wouldn't it be great if” wish has been around for a while, but three developments centering on the LLVM compiler infrastructure over the past year bring it a lot closer to reality:

  1. NVIDIA moving from an in-house compiler for their GPUs to LLVM
  2. The emergence of and excitement around the Julia language for technical computing
  3. The emergence of and excitement around the numba project for compiling Python code via LLVM

While this seems like an unrelated mix of events, the common thread is that important tools in the modern technical computing stack are moving towards LLVM. So, I figured it was about time I became more familiar with the topic, and the companion to this post documents my recent foray (with links to all of the code and the impressive speed results at the end).

Why LLVM matters

Writing this post certainly pushed me past my comfort zone. I have spent most of my focus over the years on applying analytics to business problems, not on lower-level computing issues like compilers and memory management. But, as in many areas of computing these days, more data, more complex questions, and expectations of real-time results mean that issues of the past have come back to the fore. For example, mobile developers have to deal with limited screen space and be hyper-focused on keeping memory consumption down, just as anyone doing any kind of computing had to in the 1980s. It basically means we have to heed Peter Norvig's advice to remember that there is a "computer" in "computer science". Doing even exploratory analytical work nowadays means developing and maintaining some level of maturity around technical computing. And then of course, there is the rest of being a data scientist: keeping up with the evolution of analytic methods, software implementations of said methods, domain knowledge for usefully applying said methods, techniques for visualizing results, and best practices for conveying these methods and their results to technical and non-technical audiences. Sigh...

As overwhelming as this collection of issues feels sometimes, it motivates my excitement about projects like LLVM and numba: they provide a single, consistent abstraction layer on which I can maintain reasonably high-performance code in languages that are very flexible and efficient for me to develop in. I can target execution against a single core on a CPU, multiple cores, or even completely different hardware like NVIDIA's GPUs that have thousands of cores, all through LLVM and numba. If I were to continue developing this matrix factorization algorithm for a recommendation system, I would now have a way of iteratively testing approaches on much larger, more realistic data sets rather than waiting nearly 3 minutes every time I tested against a toy data set. This means I would be more likely to actually explore and evaluate solutions, and that is what is most exciting.
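To give a flavor of the kind of code involved, here is a minimal Python sketch of stochastic gradient descent (SGD) for matrix factorization, a common technique behind recommendation systems. It is not the code from the companion post; the learning rate, regularization, number of factors and toy ratings are assumptions. The inner loops are written in the plain, loop-heavy style that numba's `njit` decorator can compile through LLVM; if numba is not installed, the sketch falls back to ordinary Python so it still runs.

```python
import numpy as np

try:
    from numba import njit  # optional: compiles the decorated function to machine code via LLVM
except ImportError:
    def njit(func):  # fallback: plain Python, so the sketch still runs without numba
        return func


@njit
def sgd_epoch(users, items, ratings, P, Q, lr, reg):
    """One SGD pass over observed (user, item, rating) triples; updates P and Q in place.

    Returns the mean squared error measured during the pass (before each update).
    """
    total = 0.0
    n_factors = P.shape[1]
    for n in range(ratings.shape[0]):
        u = users[n]
        i = items[n]
        pred = 0.0
        for k in range(n_factors):
            pred += P[u, k] * Q[i, k]
        err = ratings[n] - pred
        total += err * err
        for k in range(n_factors):
            pu = P[u, k]
            qi = Q[i, k]
            P[u, k] += lr * (err * qi - reg * pu)  # gradient step on user factors
            Q[i, k] += lr * (err * pu - reg * qi)  # gradient step on item factors
    return total / ratings.shape[0]


def factorize(users, items, ratings, n_users, n_items, n_factors=2,
              epochs=50, lr=0.05, reg=0.01, seed=0):
    """Learn user factors P and item factors Q so that P[u] @ Q[i] approximates rating (u, i)."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, n_factors))
    Q = 0.1 * rng.standard_normal((n_items, n_factors))
    losses = [sgd_epoch(users, items, ratings, P, Q, lr, reg) for _ in range(epochs)]
    return P, Q, losses


# Hypothetical toy data: 4 users, 3 items, ratings on a 1-5 scale
users = np.array([0, 0, 1, 1, 2, 2, 3, 3])
items = np.array([0, 1, 0, 2, 1, 2, 0, 2])
ratings = np.array([5.0, 3.0, 4.0, 1.0, 2.0, 5.0, 5.0, 1.0])
P, Q, losses = factorize(users, items, ratings, n_users=4, n_items=3)
print(f"MSE first epoch: {losses[0]:.3f}, last epoch: {losses[-1]:.3f}")
```

With numba installed, the first call to `sgd_epoch` includes compilation time and later calls run as compiled code; without it, the same loops run as ordinary, typically much slower, Python.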


05/14/2013

Myth Busters: For Digital Marketers Is Email Dead or Should It Be?

By Feather Hickox

Inspired by the Discovery Channel’s television show MythBusters, we decided to do a little Big Data Analytics myth busting of our own. Over the next several months we’ll tackle hot topics related to Big Data, analytics, customer engagement and mobile technology and we’ll determine whether the topic can be confirmed, is plausible or is busted (not true).

For our first myth buster exercise, let’s tackle the “email is dead” topic. In late March, the Boston Globe published a piece titled “E-mail a thing of past for business, young.” The gist of the article was that people are turning away from email. It cited a Radicati Group study that found a 9.5 percent decrease in email traffic between 2010 and 2012, and identified people who rarely use email (students, venture capitalists and researchers); some went as far as no longer including an email address on their cards or websites. The reasons cited for the decline in email were:

  • Too much spam,
  • Email is too limiting,
  • It is the natural progression of communications channels (think the death of fax),
  • The emergence of better two-way communications channels like Twitter, Facebook and corporate chat technology.

But for the digital marketer is email truly dead? We say no… busted! Here are the facts:

  1. Measured by sales per dollar spent, email outperforms social-media advertising 3-to-1. According to the Direct Marketing Association, email generates $39.40 in sales for every $1 spent.
  2. 60 percent of shoppers visited a physical store because of an email, according to Wanderful Media, based on its survey of 1,000 US consumers.
  3. 91 percent of consumers check email daily, up from 85 percent in 2008 according to data from ExactTarget. Yet this usage is evolving, as fewer consumers rely on email for personal communications.
  4. Most of the $690 million raised by the Obama campaign was generated by fundraising email. The campaign would test multiple drafts and subject lines (often as many as 18 variations) before picking a winner to blast out to tens of millions of subscribers.
  5. There are more opens on mobile devices than on either web-based or desktop clients. According to email specialist Return Path, mobile open share has increased 300 percent since 2010 and shows no sign of slowing, with four out of 10 emails now read on a mobile device.

While email can be an effective tool, marketers need to use the channel well, which means well-tested campaigns and thoughtful messages and offers. A Pardot survey reports that while 70 percent of marketers do not consider email marketing to be a primary lead generation tactic, they are using it for lead nurturing, targeted messaging and message testing. 58 percent of respondents test to see what type of content results in the best click-through rates; 57 percent test for a correlation between subject lines and open rates; and 46 percent test to see how open rates are affected by time of day. Highly targeted digital marketing is so much easier to do via email than via any social channel.
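Subject-line testing like this usually comes down to comparing two open rates. Here is a minimal Python sketch of a standard two-proportion z-test using only the standard library; the counts in the example are made up for illustration. When many variants are compared at once (like the Obama campaign's 18 drafts), a correction for multiple comparisons, such as Bonferroni, is advisable before declaring a winner.

```python
from math import erf, sqrt


def two_proportion_z_test(opens_a: int, sent_a: int, opens_b: int, sent_b: int):
    """Compare open rates of variants A and B. Returns (z, two-sided p-value)."""
    p_a = opens_a / sent_a
    p_b = opens_b / sent_b
    pooled = (opens_a + opens_b) / (sent_a + sent_b)  # open rate assuming no real difference
    se = sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Hypothetical test: subject line B vs. subject line A, 1,000 recipients each
z, p = two_proportion_z_test(opens_a=200, sent_a=1000, opens_b=260, sent_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```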

Did we convince you? Did we bust this myth? Is email dead, or to steal a line from Mark Twain, are rumors of its death greatly exaggerated?

05/09/2013

Big Data Silos


By Josh Hemann

Whew… A lot of folks at FICO are decompressing after an exhilarating week at FICO World, where, among other things, we announced our new cloud computing environment. This environment will enable a platform-as-a-service for our breakthrough offering in decision management.

A few days before these announcements, I had the opportunity to speak on two panels at the SOCAP Symposium 2013 conference. (I wrote about SOCAP previously in the post Big Data in the Service of Customers.) At the Symposium, I got to be immersed for a few days with customer care experts from some of the most well-known retail and CPG companies. A lot of the context was from the perspective of these businesses’ call centers, which are on the front line for monitoring the voice of the customer and the customer experience. Of course, the term call center may be a bit anachronistic given that these centers now have to deal with email, Facebook and Twitter, but it is still the name used in the industry.

I came away from this conference having a richer appreciation for the role call centers play in the health of customer-focused businesses. This is somewhat embarrassing to admit, having worked in retail myself, but that embarrassment puts focus on how compartmentalized the view of the customer can be. Allow me to explain by sharing some of my own retail experience…

I worked on a marketing team, and the lens through which we viewed customers focused on transaction data: customers were defined by what they bought and, to a lesser extent, by where they bought. The marketing team was in a constant cat-and-mouse battle with the merchandise team, since we were incentivized by different metrics. We would work on a new campaign strategy to drive traffic and offer redemption rates. The merchandise team, fearing too many redemptions of offers that were too rich, would adjust prices and inventory to protect their key metric, margin rates, which would of course affect conversion of customer traffic into sales. So, suffice it to say that there was tension between these two teams at times.

At SOCAP, I learned about another group that was driven nuts by marketing: the call center. The call center often has to deal with marketing-campaigns-gone-wrong and misleading advertising, while at the same time interacting with the business’s most engaged and loyal customers. And these days, call centers can utilize more advanced analytics than marketing departments do, even considering the latter’s use of segmentation analyses and propensity models. For example, Natural Language Processing techniques are used for algorithmically characterizing voice and text data for sentiment, entities and topics; optimization methods are used for call and service routing; predictive models are used to preempt problems, such that the business anticipates customers’ needs rather than just reacting. Impressive stuff.

In my retail experience though, the call center was located in a different building from marketing. We hardly interacted, and we certainly did not share data or the mathematical characterizations of it. Speaking with others at SOCAP, my experience was commonplace: the call center, despite its position of having its finger on the pulse of the customers, was not integrated with the rest of the customer-focused parts of the business in ways you’d expect. In my role working with large retail and CPG companies today, I see this same anti-pattern over and over. Considerable time and money are spent overlaying customer transaction data with demographic information, but I have yet to see a marketing team overlay customer transaction data with voice-of-the-customer information from its own call center. And multiple internal teams fight turf wars over who “owns” the customer.
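Mechanically, overlaying transaction data with call-center data is not the hard part. Here is a minimal Python sketch, assuming both teams can key records on a shared customer ID (often the real obstacle in practice). All field names and sentiment scores are hypothetical, with sentiment assumed to come from the call center's existing text analytics.

```python
from collections import defaultdict
from statistics import mean

transactions = [  # hypothetical marketing view: what customers bought
    {"customer_id": "C1", "amount": 120.00},
    {"customer_id": "C1", "amount": 35.50},
    {"customer_id": "C2", "amount": 80.00},
]
contacts = [  # hypothetical call-center view: sentiment in [-1, 1] per contact
    {"customer_id": "C1", "sentiment": -0.6, "topic": "coupon not honored"},
    {"customer_id": "C3", "sentiment": 0.4, "topic": "product question"},
]


def overlay(transactions, contacts):
    """Combine spend and voice-of-the-customer signals per customer ID."""
    spend = defaultdict(float)
    for t in transactions:
        spend[t["customer_id"]] += t["amount"]
    voices = defaultdict(list)
    for c in contacts:
        voices[c["customer_id"]].append(c)
    profile = {}
    for cid in set(spend) | set(voices):
        calls = voices.get(cid, [])
        profile[cid] = {
            "total_spend": spend.get(cid, 0.0),
            "contacts": len(calls),
            "avg_sentiment": mean(c["sentiment"] for c in calls) if calls else None,
            "topics": [c["topic"] for c in calls],
        }
    return profile


print(overlay(transactions, contacts)["C1"])
```

A high spender with negative sentiment about a campaign (customer C1 above) is exactly the kind of signal that otherwise stays in the call center's building.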

So, dear reader, help me out here. Do you have experience with call centers and other parts of a retail business working together closely and sharing data, just like the FBI and CIA do? I’d love to hear about your experience.

05/06/2013

Your Content Guide to FICO World 2013


FICO World 2013 came to a close in Miami last week. Here is a detailed breakdown of where you can find the best videos, announcements, pictures and session presentations from the event:

Announcements

Here is a listing of what we announced last week:

FICO World Video

Visit the FICO World 2013 tab on the FICO YouTube channel to see all the videos. The highlights include:

Pictures

Whether you participated in the 5K or danced the night away at the South Beach party, feel the energy of FICO World in our photo gallery. Maybe you can even spot yourself in one of the pictures!

Session presentations

Over the next several weeks we’ll be uploading the presentations from our more than 80 sessions to SlideShare. If there was a session that you loved or a session that you missed and want to catch up on, you can find the links to the presentations at http://www.ficoworld.com. The presentations are only available to attendees, so if you attended FICO World, you will receive a password to access them.

FICOLabsBlog.fico.com
