Brad Feld and I Discuss Data

What do you do when you have to make decisions in an uncertain environment with only mediocre data?  Startup founders and investors face this question all the time.

I had an interesting email exchange on this topic with Brad Feld of Foundry Group. First, let me say that I like Brad and his firm. If I were the founder of a startup for which VC funding made sense, Foundry would be on my short list.

Now, Brad has a Master's in Management Science from MIT and was in the PhD program. I have a Master's in Engineering-Economic Systems from Stanford, specializing in Decision Theory. So we both have substantial formal training in analyzing data, and we are both focused on investing in startups.

But we evidently take opposing sides on the question of how data should inform decision-making. Here’s a highly condensed version of our recent conversation on my latest “Seed Bubble” post (don’t worry, I got Brad’s permission to excerpt):

Brad: Do you have a detailed spreadsheet of the angel seed data or are you using aggregated data for this?… I’d be worried if you are basing your analysis… without cleaning the underlying data.

Kevin:  It’s aggregated angel data….   I’m generally skeptical of the quality of data collection in both… data sets…. But the only thing worse than using mediocre data is using no data.

Brad: I hope you don’t believe that. Seriously – if the data has selection bias or survivor bias, which this data likely does, any conclusions you draw from it will be invalid.

Kevin: …of course I believe it….  Obviously, you have to assess and take into account the data’s limitations… But there’s always some chance of learning something from a non-empty data set.  There’s precisely zero chance of learning something from nothing.

Brad: … As a result, I always apply a qualitative lens to any data (e.g. “does this fit my experience”), which I know breaks the heart of anyone who is purely quantitative (e.g. “humans make mistakes, they let emotions cloud their analysis and judgement”).

I don’t want to focus on these particular data sets.  Suffice it to say that I’ve thought reasonably carefully about their usefulness in the context of diagnosing a seed investment bubble.  If anyone is really curious, let me know in the comments.

Rather, I want to focus on Brad’s and my positions in general. I absolutely understand Brad’s concerns.  Heck, I’m a huge fan of the “sanity check”.  And I, like most people with formal data analysis training, suffer a bit from How The Sausage Is Made Syndrome.  We’ve seen the compromises made in practice and know there’s some truth to Mark Twain’s old saw about “lies, damned lies, and statistics.” When data is collected by an industry group rather than an academic group (as is the case with the NVCA data), or when an academic group doesn’t disclose the details of its methodology (as is the case with the CVR angel data), it just feeds our suspicions.

I think Brad zeroes in on our key difference in the last sentence quoted above:

…which I know breaks the heart of anyone who is purely quantitative (e.g. “humans make mistakes, they let emotions cloud their analysis and judgement”).

I’m guessing that Brad thinks the quality of human judgement is mostly a matter of opinion, or that it can be dramatically improved with talent and practice.  Actually, the general inability of humans to form accurate judgements in uncertain situations has been thoroughly established, and repeatedly refined, by a large number of rigorous scientific studies dating back to the 1950s.  It’s not quite as “proven” as gravity or evolution, but it’s getting there.

At Stanford, I mostly had to read the original papers on this topic.  Many of them are, shall we say, “difficult to digest.” But now there are several very accessible treatments.  For a general audience, I recommend Daniel Kahneman’s Thinking, Fast and Slow, where he recounts his journey exploring this area, from young researcher to Nobel Prize winner.  For a more academic approach, I recommend Hastie and Dawes’ Rational Choice in an Uncertain World. If you need to make decisions in uncertain environments and aren’t already familiar with the literature, I cannot recommend strongly enough reading at least one of these books.

But in the meantime, I will sum up.  Humans are awful at forming accurate judgements in situations where there’s a lot of uncertainty and diversity (known as low-validity environments).  It doesn’t matter if you’re incredibly smart.  It doesn’t matter if you’re highly experienced.  It doesn’t even matter if you know a lot about cognitive biases.  The fast, intuitive mechanisms your brain uses to reach conclusions just don’t work well in these situations. If the way quantitative data analysis works in practice gives you pause, the way your brain intuitively processes data should have you screaming in horror.

Even the most primitive and ad hoc quantitative methods (such as checklists) generally outperform expert judgements, precisely because they disengage the intuitive judgment mechanisms. So if you actually have a systematically collected data set, even if you think it almost certainly has some issues, I say the smart money still heavily favors the data over the expert.
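To make the checklist idea concrete, here’s a minimal sketch of what such a “primitive” method might look like in practice: a unit-weighted checklist score for screening a deal. The questions, the equal weights, and the cutoff are my own illustrative assumptions, not something from the post or from any particular study; the point is only that the scoring rule is mechanical, so your intuitive machinery never gets a vote.

# A unit-weighted checklist: every criterion counts equally (yes = 1, no = 0).
# The questions and the cutoff below are illustrative assumptions only.

CRITERIA = [
    "Have you talked to at least 20 potential customers?",
    "Has someone on the team built this kind of product before?",
    "Is the addressable market growing?",
    "Can you reach the next milestone on the capital being raised?",
    "Do you have a credible answer to the strongest competitor?",
]

def checklist_score(answers):
    """Count the 'yes' answers. No weighting, no gut feel."""
    if len(answers) != len(CRITERIA):
        raise ValueError("Answer every question before scoring.")
    return sum(1 for answer in answers if answer)

def decide(answers, cutoff=4):
    """Apply the mechanical rule: only deals at or above the cutoff get more diligence."""
    return "dig deeper" if checklist_score(answers) >= cutoff else "pass"

print(decide([True, True, False, True, False]))  # 3 of 5 -> "pass"

Unit weights aren’t laziness; they’re the point. Dawes’ work on “improper linear models” (the same Dawes cited above) found that even crude, equal-weighted scores tend to match or beat holistic expert judgement in low-validity environments, largely because they get applied consistently.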

By the way, lots of studies also show that people tend to be overconfident. So thinking that you have a special ability or enough expertise that this evidence doesn’t apply to you… is probably a cognitive illusion too. I say this as a naturally confident guy who constantly struggles to listen to the evidence rather than my gut.

My recommendation: if you’re in the startup world, by all means, have the confidence to believe you will eventually overcome all obstacles. But when you have to make an important estimate or a decision, please, please, please, sit down and calculate using whatever data is available.  Even if it’s just making a checklist of your own beliefs.

3 Comments

  1. I agree with Brad here. Some potentially biased data is worse than no data at all. Similar to how some knowledge can be far more dangerous than no knowledge. This biased data may lead us to become overconfident with the wrong conclusions. Then we continue to build our mental models off these mistakes.

    Zero data = zero value = zero bias. Biased data = (uncertain and consequently) negative value. (Unless you ignore it, and then it also has zero value)

    A checklist may be simple, but assuming good methodology, it will be unbiased, good data that has positive value.

    (I find the word mediocre slightly confusing in this context, so I’ve chosen not to use it.)

    1. The choice is not some knowledge versus no knowledge.

      According to the research, you only have two choices: (1) force your slow, logical system to draw conclusions by carefully working through the problem with data or a checklist or (2) let your fast, intuitive system jump to conclusions based on impressions. You cannot stop (2) from happening.

      So you think (2) is better than (1)? Despite the fact that there’s a mountain of evidence saying that in this environment, it’s not?
