Franken-measures...or How to Construct a Useful Composite Measure

Franken-measures

Sometimes a simple metric isn’t enough. It can’t fully describe a behavior or performance of a system. That’s when you need a Franken-measure: a made-up metric monster that creates a comprehensive composite to capture complex concepts.

Franken-measures go by many names—indexes, scales, ratings, composite or compound measures—and show up in all sorts of places:

Web analytics has an ongoing discussion about a measure of visitor engagement; the famous Google PageRank measures the “importance” of sites using a complex and mysterious algorithm.

Sports have embraced Franken-measures to evaluate player and team performance, e.g. passer ratings, Rating Percentage Index for college basketball, and judging of Olympic events like gymnastics, ski jumping, and ice dancing.

Economists love indexes, e.g. Consumer Price Index, Consumer Confidence Index, Gross Happiness Index.

Marketers use “scores” to simplify their lives, e.g. Q scores measure the familiarity and appeal of popular culture entities and credit scores judge your value as a human being.


Why would I want a Franken-measure?

You are probably already up to here with measures, so why would you want another one—much less one that is going to need extra effort and explanation? Here are a few things Franken-measures can offer:

A short-hand way to communicate about a complex concept. For example, a concept like customer loyalty may encompass everything from share-of-wallet to frequency of interactions to average sales amount.

A mechanism to operationalize a complex concept. Systems can take action on a single number more easily than an array of variables.

A definitive weighting of factors. Rather than constantly bickering about the relative importance of various measures, a Franken-measure can lock down the weighting, avoiding individual biases (in exchange for a systematic bias).

A balance of components. By combining multiple measures, variation in one measure doesn’t unduly bias the results.
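To make the "definitive weighting" and "balance of components" ideas concrete, here is a minimal sketch of a weighted composite. The component names, weights, and ranges are hypothetical, chosen only to echo the customer loyalty example above; they are not a recommended loyalty model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    weight: float  # relative importance, agreed once and then locked down
    low: float     # raw value that maps to a score of 0
    high: float    # raw value that maps to a score of 1


def normalize(value, low, high):
    """Scale a raw value to 0-1, clamping anything outside [low, high]."""
    scaled = (value - low) / (high - low)
    return max(0.0, min(1.0, scaled))


def composite_score(components, observed):
    """Weighted average of the normalized components, on a 0-100 scale."""
    total_weight = sum(c.weight for c in components)
    weighted = sum(
        c.weight * normalize(observed[c.name], c.low, c.high)
        for c in components
    )
    return 100.0 * weighted / total_weight


# Hypothetical loyalty index: every number here is an assumption.
LOYALTY = [
    Component("share_of_wallet", weight=0.5, low=0.0, high=1.0),
    Component("visits_per_month", weight=0.3, low=0.0, high=10.0),
    Component("avg_sale_dollars", weight=0.2, low=0.0, high=200.0),
]

customer = {"share_of_wallet": 0.4, "visits_per_month": 5, "avg_sale_dollars": 100}
print(round(composite_score(LOYALTY, customer), 1))
```

Normalizing each component before weighting is what keeps one wide-ranging measure (dollars, say) from swamping the others.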


What does it take to design a useful Franken-measure?

Not all Franken-measures are effective at achieving these benefits. There are at least four elements that contribute to a good design: completeness, concision, measurability, and independence. These factors can be combined into the Franken-measure Effectiveness Index (FEI) using Juice’s proprietary weighting model.

Completeness. Modeling all relevant performance factors to provide a holistic measurement of the concept.

Concision. A calculation that is as simple and straightforward as possible, making it understandable and logical to users.

Measurability. Using direct performance data rather than relying too heavily on proxies or subjective measures. And from a practical perspective, if you can’t reliably gather valid data, the exercise is futile.

Independence. The components of the measure need to be independent so that variation in one component doesn’t directly drive another.
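One rough way to test the independence criterion is to check how strongly the candidate components move together. The sketch below uses Pearson correlation, which only catches linear relationships. The 0.8 threshold and the sample data are assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def entangled_pairs(components, threshold=0.8):
    """List component pairs whose absolute correlation meets the threshold."""
    names = sorted(components)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(components[first], components[second])
            if abs(r) >= threshold:
                flagged.append((first, second, round(r, 2)))
    return flagged


# Hypothetical weekly observations for three candidate components.
candidates = {
    "visits": [2, 4, 6, 8, 10],
    "page_views": [5, 9, 13, 17, 21],  # rises in lockstep with visits
    "satisfaction": [3, 1, 4, 1, 5],
}
print(entangled_pairs(candidates))
```

A flagged pair is a hint that one component is effectively being counted twice, not proof of it.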


What can go wrong?

Finally, here are a few of the pitfalls to avoid when setting out to create your perfect Franken-measure:

Complexity. A complex calculation can confuse and infuriate your audience because it is hard to understand what is driving performance and why the measure is moving. Leigh Steinberg, famous NFL agent, said of the NFL passer rating: “Other than one attorney in our office, I am unaware of a single human being who has the capacity to figure a quarterback rating.” The formula isn’t quite as impenetrable as that, but it isn’t for the faint of heart:

passer rating
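For readers who want to see the moving parts, here is a sketch of the NFL passer rating as it is commonly published: four per-attempt components, each capped between 0 and 2.375, then averaged and scaled. The function and argument names are mine.

```python
def _cap(value):
    """Each component is limited to the range 0 to 2.375."""
    return max(0.0, min(2.375, value))


def passer_rating(attempts, completions, yards, touchdowns, interceptions):
    """NFL-style passer rating from raw game or season totals."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    completion_part = _cap((completions / attempts - 0.3) * 5)
    yardage_part = _cap((yards / attempts - 3) * 0.25)
    touchdown_part = _cap((touchdowns / attempts) * 20)
    interception_part = _cap(2.375 - (interceptions / attempts) * 25)
    total = completion_part + yardage_part + touchdown_part + interception_part
    return total / 6 * 100


# 20 of 30 for 250 yards, 2 touchdowns, 1 interception.
print(round(passer_rating(30, 20, 250, 2, 1), 1))
```

The caps are why the scale tops out at the odd-looking 158.3, and they are a good part of what makes the number hard to reason about.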

Changing the baseline. There will be inevitable pressure to change the Franken-measure formula, which automatically invalidates comparisons with historical performance.

In search of comprehensiveness. A desire to be comprehensive can hamstring the effort. Take Eric T. Peterson’s Engagement Model. He is clearly striving for completeness but at the risk of feasibility, in my opinion.

Eric T. Peterson's engagement metric

Black box and credibility. For the people impacted by a Franken-measure, it is important to understand what is going on under the covers. And if it is impossible to share the algorithm or approach, credibility of the creator is all that remains. PageRank succeeds to the extent that people trust that Google has an objective, well-intentioned algorithm. A whiff of agenda or bias would undermine it in the eyes of the audience. Take the National Review’s “Liberal Rankings” which have managed to label the last two Democratic Presidential nominees as the “Most Liberal Senators.” Coincidences like that can undermine credibility.



4 comments


April 13, 2008
Eric T. Peterson said:

Zach,

Got your email, thanks! I guess I understand what you're saying about Tufte's mastery of Adobe Illustrator but I suppose we'll have to agree to disagree on this point. Having done web analytics for a little while I have learned that there is simply no substitute for having the right tool for the job.

If you need to have excellent, beautiful graphs, you need to get AI and learn what Tufte already knows. If you need to make a nominally complex calculation based on multi-session visitor behavior, you need something more powerful than Google Analytics, HBX, or ClickTracks.

It's that simple.

Now, I certainly don't disagree with the vision of the engagement calculation available everywhere --- don't get me wrong! I'd love it if The Engagement Project were so successful that vendors large and small immediately deployed the metric as "standard" in their applications so that everyone could benefit from this new way of thinking ... but we're certainly not there yet so for the time being, visitor engagement (just like bounce rate, real visitor segmentation, and complex attribution models) will be available to some but not all.

Anyway, I hope to see you and Chris at Emetrics so we can continue the conversation. FYI, Carrabis, Gary Angel and I will be giving a presentation on this exact subject so hopefully you guys will be there to root us on.

All the best,

Eric T. Peterson
Web Analytics Demystified, Inc.
http://www.webanalyticsdemystified.com


April 13, 2008
Eric T. Peterson said:

Zach,

Interesting post. Thanks for including the engagement framework with other incredibly valuable and well known measures like PageRank, Consumer Confidence Index, and the all important Quarterback Rating!

Up until recently I would say "my framework is not worthy" but you may have noticed that Joseph Carrabis of NextStage Evolution has offered to help refine the mathematics to make it as complete, concise, measurable, and independent as possible. To this end we've established something I call "The Engagement Project" which we would love you guys to participate in if you're interested.

Our goal is to define a practical, extensible, and "extendable" measure of visitor engagement online, something as comprehensive as what I've described today yet mathematically precise. I love Joseph for this work as his credentials are impeccable.

One thing I suppose I do disagree with in your post above is the "feasibility" of the calculation I have described, but perhaps I don't understand what you're saying. Some folks have commented that they don't like my framework simply because you cannot make the calculation using Google Analytics, ClickTracks, etc.

I see this as kind of a weak argument --- there are obviously different levels of technology at our disposal today, some far more powerful than others. To say that this calculation/framework is impractical simply because a company doesn't have the right tool for the job is like complaining that you're unable to make a visually rich graph using a TI-81 calculator ...

The argument is similar to saying that "bounce rate" is impractical because a handful of popular applications still don't report on this very un-Franken-metric. Few would argue the utility of bounce rate, yet the feasibility of the metric depends 100% on which application you've deployed.

While the mathematics are getting a well-deserved refinement by Mr. Carrabis and others, the reality is that powerful tools like Visual Site, Coremetrics, WebTrends, and IndexTools are all capable of making the "Franken-measure" pretty much exactly as I have described it. Feasible, possible, and happening as we speak in some very large companies.

Anyway, I do consider it quite an honor to be cited in your blog and would be very excited if you guys would like to join Joseph and I in our work.

See you in San Francisco!

Sincerely,

Eric T. Peterson
Web Analytics Demystified, Inc.
http://www.webanalyticsdemystified.com


April 13, 2008
Jeff Hammerbacher said:

Hey Zach,

Great post as always. Having looked at composite measures in finance and now the web, I'd like to put extra emphasis on the "Black box and credibility" component.

It's important with a composite measure to get a good feel for its behavior under various states of the world. Having absolute transparency back to the data source for each measure is critical to develop this intuition.

I'd also add that producing easily understood examples of different factor levels and showing how they are scored by the composite measure will help develop intuition and confirm the utility of the single measure.

Regards,
Jeff


April 13, 2008
Zach said:

Eric, Thank you for the invitation to participate in "The Engagement Project." We'd love to be involved. I think it is great that there is some momentum behind this idea. I tackled it a few years back at AOL for some internal reporting and wondered at the time why the industry hadn't standardized. A little naive, I'm sure.
As for my comment about feasibility risk, I should probably wait to see how things evolve. However, I disagree with the assertion that the limitations of tools are unimportant. By analogy, the beautiful, multi-dimensional charts that Tufte likes to create in Adobe Illustrator are generally out of reach for ordinary analysts (both due to software and skills). His principles are valuable; his approach is impractical for everyday application. A central challenge for the engagement measure, in my view, is to find the right balance between reaching and persuading a broad audience while not sacrificing the core goals of the measure. I look forward to the continued discussion.
Jeff, I love your idea of demonstrating how different factor levels can impact the result.






Analytics Roundup: Expensive cup of Joe-l

On the Fahrenheit scale, do 0 and 100 have any special meaning?
The story of a mixed up metric.

At Last, a $20,000 Cup of Coffee - New York Times
Monstrous $20k coffee brewing system for fanatics, err, I mean, purists.

Five whys - Joel on Software
Incredible blog on system uptime, SLAs, ridiculousness of "Six 9's", black swans, and how superbly Fog Creek Software handles customer service issues.

Browser History Timeline
Chronicle of the lives of six popular Web browsers.







TV Ratings and Online Audiences... Or, Where to Find Skeet Ulrich's Bio

The TV ratings system is broken. Everyone knows it, but nobody wants to admit it. Nielsen ratings struggle to accurately measure audience quantity (limited tracking of DVR usage and online viewers) and quality (are viewers engaged? are they skipping the ads?). However, admitting so would undermine the delicate balance TV networks share with their advertisers.

I caught an interesting segment on KCRW's "The Business" podcast about TV series that find themselves on the "bubble," i.e. at risk of getting canceled. The producer of CBS's Jericho, "a post-apocalyptic drama starring Skeet Ulrich" (shouldn't that description alone put it on the chopping block?), explained how they received a temporary stay of execution when their small but loyal audience protested network plans to cancel the show. The interview raised questions about the validity of Nielsen ratings and how a fervent online audience can bring additional perspective to the performance of a show.

All this talk of measurement gave me an itch to look at some real data. I tracked down the Nielsen audience size (Subscription required) for TV series over the 2006-2007 TV season. Then I pulled from comScore (a Juice client and leading source for data about Internet traffic and usage behaviors) the unique visitors and time spent on websites of TV shows over the same September to May time period.

I had a few questions I was curious about:

  1. Which shows have disproportionately larger internet audiences, an indicator of a loyal and rabid fan base? Are there other shows like Jericho that struggle to build a large TV audience, but have a strong online following?
  2. Which TV show sites have the most engaged audiences?
  3. What TV networks have been most successful at building online traffic to their sites? Which types of shows spawn online audiences?

The table below shows the top 20 TV series by ratio of monthly unique website visitors to average TV viewership. This metric suggests an ability to get viewers to look for more content, whether it is additional video, information about the actors, or discussion boards. If Jericho's 9.5 million TV viewers (tied for 48th overall) represents the proverbial bubble, there are eight other shows with bubble-level ratings that can also claim strong online support (highlighted in this list).
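The ratio itself is simple enough to sketch. The show names and audience figures below are hypothetical placeholders, not the Nielsen or comScore numbers behind the table. The result is kept as a plain ratio (web visitors per TV viewer) so it can't be mistaken for a multiple.

```python
def web_to_tv_ratio(monthly_unique_visitors, avg_tv_viewers):
    """Monthly unique website visitors per average TV viewer."""
    if avg_tv_viewers <= 0:
        raise ValueError("average TV viewership must be positive")
    return monthly_unique_visitors / avg_tv_viewers


def rank_by_online_pull(shows):
    """Sort shows from the highest web-to-TV ratio to the lowest."""
    ratios = {
        name: web_to_tv_ratio(web, tv) for name, (web, tv) in shows.items()
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Hypothetical figures: (monthly unique web visitors, average TV viewers).
HYPOTHETICAL_SHOWS = {
    "Show A": (500_000, 10_000_000),
    "Show B": (900_000, 9_000_000),
    "Show C": (300_000, 12_000_000),
}

for name, ratio in rank_by_online_pull(HYPOTHETICAL_SHOWS):
    print(f"{name}: {ratio:.3f} web visitors per TV viewer")
```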

Ratings Table 1

I also wanted to get a sense as to the engagement of the online audience. Were people simply stopping by the website to check the TV schedule, or were they digging deep for more content? One measure that gets at this question is minutes per unique visitor. The top 20 websites are listed below. Interestingly, 12 of these sites are also found in the previous table. Jericho is one of four of the bad-Nielsen-ratings/strong-online-audience group that overlap with the table above. (NBC, if you are grousing about ratings for The Office, hopefully these numbers will make you feel a little better.)

Ratings Table 2

The final table addresses my third question about the TV networks and types of shows that are best at building an online audience. ABC has done more than twice as well as CBS in getting viewers online, which may be a reflection of the traditionally older CBS audience. Note: I pulled the top-end outliers (American Idol, So You Think You Can Dance, and Deal or No Deal) from the Network comparison.

The second half of the table brings those TV series back into the mix in the reality/contest category, and you can see the impact. I was surprised at the dearth of sitcoms on this list. It may be that a website for a sitcom doesn't typically make sense.

Ratings Table 3

With all the money spent on TV advertising, I can only hope the networks go beyond the top-line Nielsen ratings to try to get a complete picture of their audiences.

15 comments (only the last 5 are shown)


July 28, 2007
Hadley Wickham said:

In the first table, the second column is labelled "Website audience / TV audience", but the values in the columns are percents. This doesn't make sense to me. Does 5.5% mean there were 5.5 times as many web viewers as TV viewers, or that website viewers numbered only 5.5% of TV viewers? It's a big difference!

A scatterplot of web audience vs tv audience would also be useful, especially if supplemented with some reference lines (eg. 2x 5x 10x)


August 2, 2007
Paul Robinson said:

Just out of curiosity, why did you ignore Deal or No Deal in your conclusions? It has by *far* the biggest gap between Nielsen and website audience and it has the longest avg visit time online - yet you don't refer to it once.

I also agree with Hadley - you've spent time putting this stuff together, which is great, but you've not explained what the figures actually mean. Tufte would be ashamed of you! :-)


August 2, 2007
Zach said:

Hadley, You are correct in pointing out that I incorrectly used percentages when it isn't truly a percentage. The metric is intended to show the size of the online audience relative to the TV audience -- but it isn't as if one is truly a percentage of the other. 5.5% represents the ratio of one audience to the other (as shown in the column header). I find it a stretch to interpret 5.5% as 5.5x.
Paul, Good observation. I had suspected that "contest shows" like Deal or No Deal or American Idol drive traffic to their site by getting people to vote online or play an online version of the game (or look at photo galleries of the Deal models in skimpy dresses). In that sense, I was more interested in talking about shows that seemed to be creating loyal audiences through the characters and content of the show.


August 26, 2007
Jennifer Reed said:

I was a Nielsen TV home. The amount of equipment that had to be placed in and on all my tvs, vcrs, video games etc. sucked. But overall it was kind of cool. Shows like House, Dateline NBC, and the entire cartoon network were watched. I have a large family and we made sure we watched television of substance not like the crap with Paris Hilton. It is kind of cool to feel you have a say in what's good tv. I did this for a few years until I moved. There was no money paid to participate except $30.00 every six months to cover the electricity all the annoying equipment used. Furthermore, they wanted us to be very secretive and completely accurate in what we watched, advising us not to use the tv for company noise, etc. Nielsen, to me, is very competent in how they research who watches what. They once even called me because the tv was on for several hours on the same channel and wanted to know why. It was because the kids were sick and watched cartoon network all that day. Please, I would not doubt Nielsen, they are going to be the most accurate you could get unless you monitored every home in the entire world.


August 26, 2007
Zach said:

Jennifer, Thanks for sharing the details of the Nielsen family experience. I've always wondered what exactly was involved. My concern isn't whether they do what they set out to do well...it is that they don't attempt to capture the full picture. With DVRs/TiVos and online viewing, the outside-the-living-room picture is becoming increasingly relevant.






Choosing the Right Metric

Misaligned goals, distorted behaviors, and a misguided sense of success... no, I'm not referring to college graduates. I'm talking about the problems caused by using the wrong metrics in your organization. You've probably seen examples like tracking average customer profitability and losing perspective on the variance in profitability or evaluating customer service reps on calls handled without regard for the quality of the experience. I'd like to offer up a quick-bake recipe for choosing the right metric.

Step 1: Set the context

Metrics generally serve one of two purposes. Start by understanding what you are trying to achieve.

1. Identifying problems. Defining the right metrics in this case requires you to do a little detective work: What is the data residue of a problem? What evidence can be found and how exactly does it show up?

2. Measuring performance. The right success metrics need to focus on measures that can be controlled and where improvement in the number is unambiguously a good thing.

Step 2: Balance the four dimensions of a good metric

Metrics Framework

Lots of metrics fail in at least one of these dimensions. A few examples:

  • Common interpretation: We had a client who made a distinction between "leads" and "prospects" in their marketing organization. Prospects had theoretically expressed more interest in the service through their actions. Unfortunately the line between leads and prospects was always hard to decipher and the definitions were hard to communicate. On a related note, we got a kick out of Tom Davenport's (author of "Competing on Analytics") assertion that a company competing on analytics needs to "invent proprietary metrics for use in key business processes." There is nothing inherently wrong with "invented proprietary metrics" but it sounds like something that is designed to confuse anyone outside of the inner sanctum.
  • Actionable: Metrics are frequently too broad for the impact that a particular group can have. Customer satisfaction is a popular dashboard staple, but it is hard for most managers to see how they can have a significant impact on the number.
  • Accessible, credible data: Sometimes the most valuable and obvious metrics are frustratingly hard to track. In the web analytics world, unique visitors is important to know, but user deletion of cookies has thrown a wrench into the works.
  • Transparent, simple calculation: Top NFL agent Leigh Steinberg says of the famous quarterback ratings metric: "Other than one attorney in our office, I am unaware of a single human being who has the capacity to figure a quarterback rating." I don't know what kind of art majors he hires, but all they need to do is use the simplified formula: (83.33 * Comp %) + (4.16667 * Yds per att) + (333.333 * TD pct) - (416.667 * INT pct) + 25/12.
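The simplified quarterback rating formula in the last bullet translates directly into code. In this sketch the coefficients are copied from the formula as written, while the function name and the convention of passing percentages as fractions (0.65 rather than 65) are my assumptions. As I understand the official calculation, each component is capped, so this shortcut will overshoot for extreme stat lines.

```python
def simplified_qb_rating(comp_pct, yards_per_attempt, td_pct, int_pct):
    """Linear shortcut for quarterback rating.

    comp_pct, td_pct and int_pct are fractions of pass attempts
    (e.g. 0.65 for 65%), not whole-number percentages.
    """
    return (
        83.33 * comp_pct
        + 4.16667 * yards_per_attempt
        + 333.333 * td_pct
        - 416.667 * int_pct
        + 25 / 12
    )


# 20 of 30 for 250 yards, 2 touchdowns, 1 interception.
attempts = 30
rating = simplified_qb_rating(20 / attempts, 250 / attempts, 2 / attempts, 1 / attempts)
print(round(rating, 1))
```

Five multiplications and a constant: tedious, maybe, but hardly a job for the office attorney.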

(Want a little validation of this framework? Avinash, respected web analytics guru, just published a post with "Four Attributes of Great Metrics" and he landed on a strikingly similar set of four: 1) instantly useful (i.e. actionable); 2) relevant (i.e. common interpretation); 3) timely (i.e. accessible); 4) uncomplex (i.e. transparent and simple).)

Step 3: Avoid the metrics bugaboos

Finally, here are a few traps that I've seen in deciding on appropriate metrics:

  • Trending and distributions: Don't always try to compress a metric into a single number. Often it is more revealing to show the metric across time or as a distribution to uncover variance.
  • Edge cases: There will always be edge cases where a metric may not mean what you think it means. These situations are worth understanding, but you shouldn't allow the perfect to be the enemy of the good.
  • Setting goals: Could you hold someone accountable for this metric without them throwing out a half-dozen reasons why it doesn't make sense? It's a decent test of the value of the metric.
  • Self-serving: Be careful that you don't select metrics simply because you know they'll make you look good.
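The "trending and distributions" trap is easy to demonstrate. In this small sketch, two hypothetical customer segments share the same average profitability but look nothing alike once you examine the spread. The dollar figures are invented for illustration.

```python
import statistics


def describe(values):
    """Summarize a metric as a distribution rather than a single number."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "spread": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical profit per customer, in dollars, for two segments.
steady_segment = [90, 95, 100, 105, 110]
volatile_segment = [-200, 0, 100, 200, 400]

for label, profits in (("steady", steady_segment), ("volatile", volatile_segment)):
    summary = describe(profits)
    print(label, {key: round(value, 1) for key, value in summary.items()})
```

Both segments report an average of 100, yet only one of them contains customers who lose you money.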


10 comments (only the last 5 are shown)


July 7, 2007
Henk said:

Finding the right metrics (or KPIs) to measure performance or to identify problem areas for an organisation is THE challenge, indeed. On the highest level, they are usually too abstract to be meaningful (actionable), and drilling down may easily let you get lost in a sea of details, failing to see the forest for the trees. This article nicely summarizes the problem and points in the right direction for analysis. Well done, Zach. We need you!


July 9, 2007
Darius Wiles said:

If you are interested in this article, you may want to take a look at Andrew Jaquith's book, "Security Metrics: Replacing Fear, Uncertainty, and Doubt". It was recommended to me but I've only just started reading it so haven't drawn my own conclusion yet.


July 13, 2007
Ben Yates said:

Your blog is great, but your navigation links don't work (Firefox, Windows XP). Diminishes your credibility, which rests on being uber-cool tufte-style usability geniuses.


July 14, 2007
Jeff said:

I've got FF & WXP here, along with the rest of my office. Links work fine.


July 16, 2007
Eduardo said:

He might be referring to the "Previous" and "Next" article links at the bottom of the writing. Those both link back to this page instead of the previous and next articles like they should. Not credibility diminishing in my eyes, but a smidge of an inconvenience.






Esurance--Competing on Analytics

Recently I caught up with my college friend John Swigart who now runs the marketing organization at Esurance. When the conversation inevitably drifted to business, I asked about how Esurance was using data to make decisions. I was expecting to hear the same old story: big failed data warehouse projects, piles of underutilized reports, frustration about not being able to understand how the business was performing. I was way off.

It seems that John works for the rare company that has managed to live the analytics dream. Esurance competes on analytics—not in the idealistic model highlighted by Tom Davenport, whose "full-bore" analytics competitors are defined by:

"Top management had announced that analytics was key to their strategies; they had multiple initiatives under way involving complex data and statistical analysis, and they managed analytical activity at the enterprise (not departmental) level...

...Employees hired for their expertise with numbers or trained to recognize their importance are armed with the best evidence and the best quantitative tools. As a result, they make the best decisions: big and small, every day, over and over and over."

That's window-dressing. John didn't make any grandiose pronouncements of Esurance's analytical achievement or talk of the best tools and most complicated models. He simply stated that data-based decision-making has been a part of the culture from the very beginning and he considers it essential to running a smart business. A few points that he emphasized:

  • Clear linkages between metrics. There needs to be a well-understood hierarchy that has important financial measures at the top (i.e. revenue) and connects to the underlying drivers.
  • Frequent reviews of reporting. Senior managers get together on a regular basis to look through the core reporting. These meetings are detailed, but somehow useful enough that people stay committed to the process.
  • Learning takes time. John recognized that Esurance could not be as evolved in their understanding of the business without a commitment to this approach from the very beginning.

After getting off the phone with John, I asked him to respond to a few questions so our readers could get a taste of their approach:

How has Esurance managed to develop a culture that embraces decisions using data?

We don't make decisions based on "I think we should do this." We look at data to find out what we know, then decide what to do based on the facts. We identify expected outcomes up front and determine how we are going to measure the change before we implement something. Also, a data-driven culture starts at the top of our organization.

What processes do you have in place to get the right data in front of the right people?

We have a centralized data warehouse and reporting structure. Everyone gets their data from the same place and the metrics are universal. This took 3-4 years to get right, and we built it from scratch. It takes a substantial commitment to pull off.

What is the role of the analyst in your organization? What tools do they use?

We have technical analysts and DBAs in our business intelligence group that deal with the more technical issues. In Marketing, then, we have analysts who are on the individual marketing teams that work closely with the business people. They use some basic tools, nothing terribly fancy.

From an analysis perspective, what do you do when you are testing new marketing opportunities?

All tests are done with as much of a controlled environment as possible. With so many moving parts, this can be difficult, but is important.

How has analytics contributed to the success of Esurance?

Truly one of our competitive advantages. We would not be where we are today without great data and a dedication to using it through all levels of the organization.

2 comments


March 13, 2007
James Taylor said:

I'm curious - how does this kind of analytics mesh with Esurance's risk modeling and other forms of predictive analytics? Same group, same process or something different?


March 13, 2007
Zach said:

I didn't get into that area specifically, but John did say that Esurance builds almost everything in-house -- including their semi-controversial ad campaign (Slate pans it here http://www.slate.com/id/2153173/ , but I personally find it memorable).





