Paul Willard

Paul helps companies find balance between what the user needs, the business wants and what resources allow. A keen eye for young companies with huge potential, Paul focuses on mentoring across product, engineering and marketing efforts.

The Subtraction Capital Investing Thesis

At Subtraction Capital, we invest in companies we would work for.  Literally, if the fund did not exist tomorrow, Jason or I (usually both though) need to be able to say that we would be totally excited about being an employee at a company for us to be excited about investing there.

There are a few reasons we take this approach.  First, the companies that we went to pretty early did quite well.  If you thought it was luck, then it was luck three times in a row each, with no fourth time that was not lucky.  The odds are so astronomical that even a statistician would probably say that the probability that it is luck is exceedingly low.  In addition to the places we did go to work, we each had a very short list of companies that we went deep in the interview process with and seriously considered but passed on for some reason, often having nothing to do with whether we thought they would be successful or not.  And nearly all of those have also done exceptionally well.

Which leads to the second reason we invest in companies we would like to work for.  The act of selecting a place to work involves a whole bunch of rational, front-brained, cognitive criteria, but it also involves application of our more powerful, subconscious processors.  This is what we are employing when we say things like “listen to your gut.”  So I make my Ben Franklin list, but then I Subtract the noise away so that I can hear the voice of the big cpu in the subconscious.

Finally, most of the time we discover our jobs through referrals from our networks and all of the Subtraction Capital partners have multiple deep, layered networks including the most famous network of all, the “Paypal network.”

Let me run through some of the more important criteria for Jason and me as job seekers:

Location - we would only take a job at a company that we could show up at.  So for us, that means Silicon Valley or Salt Lake City (since Jason lives in Park City, Utah and splits time there and in San Francisco).

Subject/Sector/Space – we would only work at a company in a sector we personally cared about deeply.  So much of our limited time and energy go into the place we work; we need to absolutely love what we are doing in order to give everything to the startup, which it deserves.  Sometimes we see companies in spaces that are hot and popular, but if neither Jason nor I are personally excited about the space or subject, we just can not invest there.

Broadly, we are both interested in the Enterprise/B2B software space vs consumer today, though that could change at some point and we both worked in the consumer space as well as the enterprise space in our operating careers.  The Enterprise/B2B space simply feels more deterministic to us, so we believe it has greater potential for us to generate the best returns for our LP’s.  I like to say that the consumer space feels like making movies, you can make one that is critically acclaimed and award winning that does not make money, but to do so would be a disservice to our LP’s.

Some sectors we are definitely interested in are ones we have already worked in, like Health Tech, Fintech, Data/Analytics and Marketing/Sales modernization.  There is a Warren Buffett axiom that there is a sucker at every poker/business table and if you don’t know who it is, it is probably you.  So to responsibly put our LP’s capital to work, we feel like we need to do so in spaces where we have a relatively deep understanding.

One interesting thing is that many spaces about to be disrupted might not have anyone yet with the deep knowledge of both the vertical and software disruption.  So the initial team that wins will be a cooperative effort of both.  Because of this, we expect to very slowly be adding new sectors within enterprise software moving forward; likely ones that take advantage of our skillset, experience, and interest.

Scale – it takes a very large endeavor to shift the axis of the earth a little bit, and we definitely want to be on a team that shifts the axis of the earth.  A starting point here is the ability to have a very large top line run rate.  $100m is the smallest of “very large” for software companies in my opinion.  And lately, I have seen a lot of funds setting criteria a factor higher than this in order to account for overestimation.

Team – it would be impossible to join a company where I did not want to work with the team.  I need to feel that the team is extremely capable, fun to work with, hungry, and just stubborn enough.  Winning at a startup means you have to believe you can succeed where many others have failed before and that is a little bit irrational and requires some confidence, ambition, and enough passion to derive a lot of hope.  That hope is what drives you to keep pushing harder through the difficult times, which is most of the time.  Also the market will teach a winning team constantly so long as the team has the ability and desire to learn.  This balance of ambitious stubbornness and pragmatism to respond to the market is exceptionally important in order for startups to succeed.

Mutual desire – I would never want to work for a company that was not really excited about having me there.  And no great company would want to hire me if I was not truly excited about the Mission of the company: Mission Motivated.  I am also happiest when I am truly useful.  Useful = Happy.

Entrepreneurs have unprecedented access and choice in early stage investors.  And if an investment is like a marriage, only more permanent, then the early stage investors will be your spouse for the longest time.  So hire the investor you would most like to have on your team.  An investor is one of the most important hires you will make.  An investor should be someone you want to share the long, difficult, insanely fun, and rewarding journey with, through the ups and downs.  Jason once said at a conference “find an investor that believes the world needs your company as much as you do” and I fully support this notion.  Whatever you do, hire your investors for sure.

The Zen of Hiring and Retaining Top Talent in Silicon Valley - Part 2

In Part 1, I reviewed a general philosophy that starts a framework for managing very talented folks in Silicon Valley.  Remember, this is the Big Leagues of tech, and if you want your company to grow fast you need to attract and retain the All-Stars of the Big League!  In this post, I will go over some specific tactics that I have used to successfully support that attitude.

Strength training - A manager should let people develop their strengths and apply them to the company.  Many companies tell people what they are doing well and what they are doing wrong, with directive on increasing focus on things not being done well (areas to improve).  An alternative perspective would be that talented people spend their time, attention and love on things they enjoy doing and they do well at those things.  And that people neglect things they do not enjoy as much.  Rather than asking people to spend time on things where it will be difficult for them to do well, ask people to spend even more time, attention and focus on the things they love doing, get even better at those things, and maximize their value-adding potential to the company via their strengths.  Let them help identify the things they are not good at so you can help them offload those things.  They will be much happier to call out their struggles if they know it means they get rid of that work, btw.

Diversity - given every company has needs beyond the strengths of any single person, hire diverse individuals on a team so that collectively they are incredibly strong in every area that a company requires in order to win.  Another angle of this is hiring someone who is better than you at something which your company needs, and collectively your team will far surpass your ability to do almost everything.  Appreciate that diversity goes far beyond college major including age, gender, ethnicity, and anything that can provide a different perspective on problem solving.  It is very broad.

Collaboration - Collaboration is the best way for us all to learn and grow most quickly.  Both managers and individuals in their groups will learn more from collaborating together than by giving out and taking orders and everyone will have more fun doing it as well.  In a scrum style team meeting, team members can tell things they worked on and completed and what they learned from them, then what they are going to take on next (so that others can find them after the team meeting and collaborate if it is of interest to them), and then of course, places they will need help or see roadblocks, again so that teammates that can clear them efficiently or give help can find people after the team meeting and collaborate.  Everyone learns more when they are sharing their learning, and there is no better way to mastery than teaching.  After going around quickly on this update, have a different team member each meeting do a deep dive and teach everyone about something they have been working on.

Train your replacement - for one thing, there is no better way to mastery than by teaching someone else.  For another, a manager will never be able to focus on higher level, more impactful work if they are tapped all the time with their current work.  A manager should continually be offloading as much of their work as possible to their team.  This also has the extremely beneficial side effect of pushing decisions into the most efficient place possible for them to get done, by the people doing the work.

Give it all away - take absolutely everything you are responsible for today and assign it to someone in your team.  If they need help doing it, help them.  But let people own important meaty things.  Let them get the visibility and credit for their wins, but always take the blame if it isn’t a win (your prioritization was wrong, you didn’t set them up for success, etc).  Nothing makes you look better as a manager than having a team full of great people doing really important work.  There is a saying that the best way to get something done is ask a very busy person to do it, and similarly, the important jobs at a startup get done by the teams that are constantly knocking them off.  While your team is doing everything your area is responsible for, you can coordinate, run air traffic control, define success, and get new meaty jobs to do . . . which you can give away as soon as your team can handle the capacity.

Cross train - one of the best things about working at a startup is the broad responsibility and opportunity to gain experience and learn.  So in the spirit of strength training, why not amplify this benefit?  Encourage team members to cross train each other and even rotate responsibilities or trade temporarily.  In addition to happy, learning team members, a team will have more backup, bench strength and vacation coverage.

Culture Test - If someone you are considering is the first person you see when you walk in Monday morning, are you happy to be at work?  The answer for me always needed to be an emphatic yes.  Some people do the beer test, where you have to feel like you will enjoy going out to get a beer together after work.  But I have loved working with a ton of really great people that either didn’t drink, or would prefer to be home with their families versus having a beer with me outside the office.  I think the beer test is too limited.

Founders can get started and off the ground, but things won’t get large without a really successful team.  So be the place that really successful people want to grow their careers.  Happy growth, and happy Subtracting.

Horses Dressing Up as Unicorns for Halloween


Trick or Treat! Really the horses have been dressing up as Unicorns for a while, but Halloween just seems like a fun day to call out the emperor’s clothes for the costumes they are in many cases. Since @AileenLee @CowboyVC coined the term, everyone wants to be a Unicorn. And why not? Company press hits go up, customer acquisition gets a boost, hiring is a little bit easier, every investor knows your name, a bunch of outside money follows inside money leading a Unicorn round, possibly on even better terms … the media hoopla brings a lot of attractive benefits.

The media sprinkles so much Unicorn Dust on the anointed, that in order to help companies get the benefit, investors have been willing to get creative about finding a way to make the $1B headline work. Anti-dilution clauses will effectively re-price the current round if the next round is below some mark, or liquidation preferences will ensure that if the company sells the investors get a minimum return before any common stock gets paid out. But a $1B valuation with heavy downside protection via structured round is not equivalent to a $1B market capitalization of a public company that it is being compared to. It is more like an option on upside with a floor, a complex financial instrument which is not reported on because the terms are not disclosed publicly and it is too complicated to make a good headline. The really scary part is seeing other investors match the price without the preference, blindly following the made for media valuation and buying up common vs preferred investor shares at substantially the same price. The media is fortunate they do not have to cover those calls nor do they have to apologize after people lose their shirts for believing the hype.

I worry that so much sport has been made of Unicorn spotting, Unicorn tracking, the Unicorn Club, that it has helped fuel an overpriced, speculative, late-stage secondary, and there will be a hangover after that party. I know, I have been to that party before, having worked in tech through the dot com 1.0 pop. @JPortnoy wrote a great blog post foreshadowing the run for the door that could end that party if we can’t find a softer landing through some sanity. More importantly to me, I worry that this is all noise we should all be Subtracting as we work together to build great companies. I believe startups are at their best when they are focused on the meritocracy of good dev, good design, and good distribution. And as much as I like the whimsy of Aileen’s label, I don’t believe we should build a culture with a party around that label or any other. Media is not here to build great tech companies, so I understand their motivation to bring the noise. Perhaps it is our responsibility as tech company creators to Subtract that noise away.

Photo credit: Smart Pak

The Zen of Hiring and Retaining Top Talent in Silicon Valley - Part 1

Hiring talented people and letting them execute (and not leave) is the only path to success for a startup.  As I said in the “Subtraction = Growth” blogpost, bandwidth is the most critical resource of any startup.  A manager can not afford to waste even a tiny bit of it.  In my career, I hired more talented people onto my teams than I can mention, and while they have gone on to do amazing things after, I rarely lost them while I was still at a company.  Here is the management philosophy that I employed.

I started with an assumption that I learned at Boeing management training:

People want to be successful and help their employer succeed, therefore a manager’s job is to eliminate everything that might prevent that from happening.

This sounds very simple and obvious, but if you really believe it as a primary driver, it is powerful and will change the way you approach your work.  Managers should not need to motivate people; the motivation is already in them.  Managers are hiring people that have been very successful at nearly everything they do, for their whole lives, and they want to continue that trend.  People want to help their company succeed too.  In addition to their own growth, they want to be associated with and meaningfully contribute to a winning company.  This is even more the case at startups, where a win makes you more attractive to your next startup and helps you build your career more quickly.  If a manager hires a Stanford Engineering Masters Grad, who had to get insane test scores on SAT’s as well as GRE’s and insane bachelors degree grades and insane grades in high school and probably had a million extra curricular activities that they worked hard for, and they failed on your team, does a manager really think it is just an unfortunate coincidence that the first fail of this sharp successful person’s entire life was working for them?  It sounds crazy when I say it like that, doesn’t it?  And that is how it is.  Starting with this assumption is actually huge: it creates the risk-free environment that allows your team to experiment and fail, which will lead to winning.  Sean Ellis said at the Agile Marketing Meetup that there is no better indicator of growth than test velocity, and that is consistent with my own experience as well.

A corollary property is that if someone is not thriving, a manager created an environment that is preventing them from doing so.  Common ways managers block people, in order of likelihood, include the following:

  • Lack of success definition clarity: A manager does not clearly communicate what success looks like.  They might have talked about it, but did not reach true concurrence.  Note that it does not matter if the manager thinks they were clear about it, what matters is if the team member and manager are actually in concurrence.  Perception is reality.
  • Lack of prioritization: this is actually a version of not defining success well enough, but it is such an important one that it is worth calling out separately.  Subtracting is one of the more valuable things experience brings a manager.  So many things to do and so little time.  Prioritization is one of the things a manager should review with their team.  It is an efficient application of your experience so long as it is done from the perspective of sharing experience and understanding where the manager has failed in communicating success as opposed to micromanaging.
  • Senior Management Misalignment: A manager’s version of success is not aligned with other management’s version.  A manager’s team may not be sure which vision of success to achieve.  Do you make your manager happy at the risk of pissing off the CEO or will your manager be happiest if you make the CEO happy?  A manager should cringe at this question entirely.  It is the last thing you want your team worrying about.  Make sure you are aligned so that your team can Subtract this.
  • Role fit: what the company needs from a role is not what the individual provides successfully.  Either it is not enjoyable work to that individual (so they are not good at it perhaps) or perhaps it does not fit in with their career aspirations.  Either way, this is one of the reasons to sometimes hire contractors for skills which you are not 100% sure will be part of your long term strategy until you find out.  This one is most often the manager’s fault too since they hired the skill set and presumably they should know what was needed better than the person interviewing.

Some useful conversations between a manager and their team members will help identify areas where a manager is doing well and also where they are not doing as good a job as they might hope.  A manager is probably blocking team members from success no matter how hard they try, so why not treat discovery and elimination of these areas as a continuous improvement project for the manager?  You can do it as almost a scrum meeting format.

  • A manager can ask what a team member is succeeding at, and what is supporting and allowing this success?  Can we amplify and replicate these things?
  • A manager can ask what is progressing more slowly than hoped?  What is blocking or preventing it from looking like the successful things?  How can we work together to kill these, unblock these, or get them assigned somewhere else where they will be successful?

Applying this as a general framework, I was very successful hiring and retaining some really talented people.  And they strengthened their own careers immensely as well.  Not only were they forever associated with a company that grew very quickly, but they also got to take on as much as they could handle and sometimes more so, as a fast-growing startup’s needs are insatiable.  Proof that it was good for the team is the fact that so many of my team members went on to run huge teams at very fast growing companies subsequent to our working together.  So much so that it was nearly impossible for me to rehire people at a next stop, unless they started in one of my teams very early in their careers.  In my second installment of this blog post, I will review some specific principles and ways to make the most of this kind of environment.

Birds and Turtles

I am a very visual thinker, so I like to organize the world into visual paradigms.  Conceptually, I bucket two main kinds of venture capital funds, birds and turtles.  Both have been around for a long long time, both can point at many examples of gigantic impactful companies they have invested in that shifted the axis of the world.  Subtraction Capital is a bird fund.  I wrote this post to explore some ideas around the impact of both species of fund.  As a founder, which type of fund would you rather have investing in your company?

Birds lay a small number of eggs and dote over all of them.  They keep them warm, help them hatch, and individually feed each and every chick to try to help it grow into the strongest and most successful bird possible.  A high proportion of their young survive, and the species persists.

Turtles, in contrast, dig a nest on the beach, creating an environment where the eggs can be relatively safe and kept warm by the sun.  They then lay dozens of eggs, cover the nest back up, and go back to sea.  When the eggs hatch, the hatchlings dig their way out of the nest and race down the beach to try to make their way into the ocean.  On their own, they must evade predators and currents and try to find food so they can grow strong while avoiding being eaten.  Survival rates are low, but because of the numbers up front, the species lives on.

I do not know the statistics of outcomes at Bird funds vs Turtle funds; that would be an interesting question, maybe for Anand.  Many people argue that it doesn’t matter how much coaching is provided to a company a fund invests in, their outcome will not change significantly enough to affect the return at the fund where the investment came from.  There is a counter argument that so many really successful companies have “Near Death Experiences” (I would love to assemble a book about these, but later) or find a “Second Wind” that it is difficult to say in early days which company will provide the outsized return.  PayPal almost died in the early days due to fraud losses, but was able to turn around and become a very successful company. grew at a very slow pace for nearly a decade before they hit the big ramp up.  And finally, I have heard Marcus Ogawa say “It is always darkest before dawn” when referring to companies hitting their tight spot.

The brand effect of birds and turtles is also interesting.  The turtle argument might be that so long as the outsized return company’s logo is on the fund’s website, the fund is associated with the big winner, which will attract the entrepreneurs that are going to produce the next wave of outsized returns.  Further, just the act of a top fund investing in a company may lift the company’s probability of success more than anything else the fund might do.  There is likely a virtuous cycle in there.  And I know I have heard the opinion that the really outsized returns will come from first time entrepreneurs, who might be influenced more by broad PR turtle brand effects than by word of mouth from insiders.  Even if that was historically true, will it always be true?  Or as our tech industry evolves and matures will the repeat entrepreneur advantage become greater and create proverse selection there?  I know that plenty of other investors already say that repeat entrepreneurs have an advantage.  I personally have no bias either direction, because my personal sample size is too small to be statistically significant and I don’t care for anecdotes outside of my own context as I fear apples and oranges are being compared.

The bird counter argument on brand might be that by helping to coach every entrepreneur to the best outcome they can have, that entrepreneurs will tell other entrepreneurs which fund was most helpful for their personal return and attract the next wave.  It would be interesting to look statistically at whether bird funds attract and invest in repeat entrepreneurs more than turtles, with the thesis that perhaps repeat entrepreneurs know which funds are birds and prefer them.  But someone else might make the argument that repeat entrepreneurs need help the least and therefore do not care.  That has not been my experience.  I feel that often founders that need help the least value it the most.  Like Tiger Woods selecting a golf coach.  He is destined to be a champion no matter what and by virtue of that he can pick any coach he desires.  You better believe he wants the best.  This is another virtuous cycle.

Similarly, when I think of old-school VC, I think of fund partners spending as much as a day every week or two physically at the company that they invested in.  In order to invest like this, the ratio of companies invested in to partners at the fund needs to be low enough that there is actually some time to spend with the companies.

So as you go out looking for investors, consider whether you want a turtle or a bird, or one of each perhaps?  Hedge your bets if you can?  Subtraction Capital is a Bird, and we very much like to be available to coach the companies we invest in if they desire it.  Useful = happy for us : )

Subtraction = Growth

I saw the angel in the marble and carved until I set him free   - Michelangelo

We are often asked why we named the firm Subtraction Capital vs a more typical name like Thermo-Nuclear Capital, “Blowin’ Stuff Up”.  “I thought you were supposed to ‘add’ capital to startups” say the uninitiated.  Jason and I each worked for several high growth startups from a pretty early stage, and our experience has taught us that the key to growing fast is Subtracting everything that is not helping you grow . . . as quickly and ruthlessly as possible actually.  It was hard for us to learn what to stop doing.  It is scary.  What if you cut the random thing that will make you really go?  Well, despite the stories, random things don’t make you grow.  Steady Subtraction and relentless execution of the right things makes you grow.  So Subtraction = Growth.  The scarcest resource at any startup is bandwidth.  Every ounce of bandwidth available needs to be pointed at making the company grow very fast.  Subtract More!

The Noise

Startup founders are barraged with noise.  The inbox is always full of people that want to help them, usually for a fee, and those people are very good at telling founders why the help they are going to provide is really important and can keep founders from losing their company.  There is also a constant buzz of scenes with famous founders, investors, and press that is happy to welcome and distract new founders.  And it is easy for founders to find 100 things that might make them an overnight success.  And each one in isolation seems so small.  It is death by 1,000 cuts.  But in the real world there are a very few things that must be done well to create an important company.  Achieving product market fit, crafting a valuable and defensible strategy, finding a scalable means of distribution, securing the capital to accomplish all of this, and creating a culture that can sustain insane growth jump to the forefront.  And even this short list need not be accomplished simultaneously.  Warren Buffett once said “The difference between successful people and really successful people is that really successful people say no to almost everything.”  Subtracting for success supports this notion.

Sometimes people who have never experienced a rapidly growing tech startup believe that you build by creating more and more things.  Like making a painting better by putting more and more things on the canvas.  But our combined experience at PayPal, NextCard,, Palantir, and Practice Fusion is that you succeed by nailing a small number of right things.  Just as there is a Power Law to investment returns, there is a Power Law to return on effort within a startup.  So to succeed wildly, we must Subtract constantly everything which is not helping us win.  French author Antoine de Saint-Exupery wrote that “A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.”  And similarly we manage startups and grow companies fastest when we have nothing left to Subtract from success-producing effort.

The Subtraction Difference

So when we started Subtraction Capital, we wanted to build something different than a lot of early stage investing funds.  We are not the only early stage fund to select an irreverent name.  Harrison Metal, lowercase Capital, Cowboy Ventures, and Homebrew jump immediately to mind.  I can’t speak for the other firms, but perhaps this reflects a common dissatisfaction with too much of what we have seen.  So for ourselves, we wanted to wear what we believe is the most powerful and not consciously obvious key to our own careers, Subtraction.

Ironically, the founders that want us to join them on their journey the most are usually the ones that need help the least.  They recognize that the difference between the winner and second place (1st loser as Jason Calacanis taught me) can very well be Subtracting a handful of things which in isolation seem inconsequential, but collectively give their company the focus and bandwidth to win.  These are the founders that have the best chance of success, with or without Subtraction Capital involved.  We are fortunate when they let us join them for the journey.  It is our favorite thing in the whole world to do.

Subtraction Pre-Dates the Internet Era

I learned to Subtract as the best success strategy when I was at Boeing, though we did not call it that.  We focused everything on quality, and knew that if we designed and produced the highest quality aircraft, market share and profits would also be optimized indirectly.  A focus on quality, like good design, also leads to simpler solutions that have less rework, lower maintenance cost, and are just cleaner.  Elevating the focus on quality gave us the freedom to focus on a very small number of things and do them exceptionally well.  It is perhaps counter-intuitive to many people who associate quality with more bells and whistles.  Similarly, practices like Agile development, which came out of manufacturing process, use techniques like sprints to Subtract all of the backlog distractions that might keep you from releasing something critical as soon as possible.  Kanban Subtracts the weight of the distracting backlog away and lets people enjoy the focus of the next thing on the list.  Subtraction is institutional already at startups.

Subtraction Is Philosophical Too

There are deeper meanings to Subtraction as well.  One of them relates to all the noise that is keeping you from accessing your subconscious processor.  Subtract that noise and access the big cpu up there!  It is not just a little bit more powerful than your conscious mind, it is a lot more powerful.  Do you think an athlete could ever win their title without Subtracting all other concerns: their paycheck, their tv ratings, their sponsors, everything really except the opponent in front of them and their own actions?  We readily accept that professional athletes train until they can compete without “thinking” so that their subconscious is actually doing the thinking.  But we rarely try to shoot this very big gun as tech startup athletes.  We should.

If you have ever really fought in startup world, you know that Subtraction is the key to growing really really fast and winning.  This resonates with successful entrepreneurs, because they know how they have won in the past.  If you want Subtracting practitioners to join you on your journey, we would be honored to consider the prospect together.  The Subtracting process, and the hyper-growth that results, is the reason we wake up in the morning.  To do less, better.

Hypothesis Testing Equation aka “Is This Really Better?”

I am often shown results from some kind of trial or test performed in a market and asked if I think the results should be acted upon or not.  And I always fall back to the same method I used to evaluate test results throughout my career whether looking at sometimes noisy aircraft design test data at Boeing, evaluating performance of a banner ad, or testing different places or times to ask customers to sign up for a newsletter.  I have used the hypothesis testing equation to quantify some level of certainty that the winner is in fact a winner vs the result of noisy data and a skewed sample.  I have found this to be such a reliable tool that I decided to make a google spreadsheet showing and applying the form of the equations that I used so anyone can see how I applied this practice. Feel free to make a copy of it for your own use if you want to, but please understand what it is and use it at your own risk.

TLDR: Go to this spreadsheet, copy it, change the values in the blue boxes and look at the resulting confidence intervals to see if a hypothetical test is done or not : )

I will tell you more about how I used it in this post, and I also plan to demonstrate its use at the Agile Marketing Meetup that I cohost with Austin Walne and Nick Muldoon on Wednesday, January 14, 2014, so a video of my demonstration should be available as well. Jascha Kaykas-Wolff is giving a talk about War Stories on Real World Agile Marketing right after the demo that should be excellent.

A/B Testing

Conceptually, A/B testing is simple: you show some group of your site visitors a red button (the champion) and another group the blue button (the challenger), and you see which one gets clicked more, has the higher conversion, and/or takes the upsell more at checkout.  You can change images, shopping cart flows, headline text, body text, virtually anything about the design.  You do this instead of running the red for three days followed by the blue for three days because your baseline red button conversion rate is noisy and bounces around all the time for who knows what reasons: marketing programs, press releases, day of week, end of month, Dow up, Dow down, etc.  It is not uncommon for the noise in the data to be so great that you could get a 5-20% better solution and not even be able to see it among the noise.  So you randomly present each version of the design to some portion of your traffic so that at least you can take the day to day variation in traffic types out of the equation and compare performance.

After running this test for a week, with the challenger blue button against 20% of your traffic, your conversion for the blue button is 7.16% vs 7.53% for the champion red button.  5% worse, so it is time to give up the test, right?  Or is the sample size of the challenger conversions so small that if a few more conversions went one way or another, it could catch up to even?  The hypothesis testing equation gives you an objective framework to evaluate this.  It also provides a methodology that you can use consistently across a team as well as over time to help avoid fooling yourself because you are not consistently evaluating your product.  In my experience, at chaotic fast growing tech companies this consistency is even more important than the accuracy of the methodology.

The first times I saw the Hypothesis Testing Equations Applied

In about 1994, I was in the Advanced Technology and Design Aerodynamics group at Boeing, and two PhD Aerodynamicists, Byram Bays-Muchmore and Frank Payne, were showing me some noisy aircraft design data that had 90% confidence interval bands drawn around the data based on measurement.  And although one curve was clearly higher than the other curve, the bands for the two designs overlapped more than not, so I couldn’t be very certain that one design was actually better than another.  And the hypothesis testing equation suggested the likelihood that if all noise and granularity was removed from the data the winner would actually still be the winner.  This was probably the first time that I had seen practical application of uncertainty around real data. We didn’t do this in college!

Fast forward 5 years, and I had moved over to the new and exciting world of the internet as a product guy.  Imagine my surprise when I saw the Marketing team at NextCard using an Excel spreadsheet that Jared Young had built to apply the hypothesis testing equation and calculate the likelihood that one banner was actually performing better than another or not.

They were using the binomial form of the equation to look at click through rate and account application rates for different banners.  Dave Schwartz, the brilliant quantitative marketer who hired me and has since founded his own marketing analytics company, Cold Creek Technology, even made a mantra out of creating three segments for a test.  He called them A, A’ (A prime), and B.  A and A’ had the same designs and only differed by their group label.  So essentially, they were a randomization check. The hypothesis testing equation should always tell you that A and A’ are statistically identical, even if not numerically identical, and if it doesn’t, there may be a bias in the way you assigned your population. This is a step up in the level of sophistication though.
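A minimal sketch of the A, A’, B check in Python follows. It assumes the common normal-approximation form of the two-proportion test, which may differ in detail from the NextCard spreadsheet; the function name `confidence_leader_wins` and all of the counts are hypothetical and only there for illustration.

```python
import math

def confidence_leader_wins(n1, conv1, n2, conv2):
    """One-sided confidence that the group with the higher observed
    conversion rate is the true winner (normal approximation).
    Returns 0.5 when the two rates are identical."""
    p1, p2 = conv1 / n1, conv2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    if se == 0:
        # Both rates are exactly 0 or 1, so there is no sampling noise to model.
        return 0.5 if p1 == p2 else 1.0
    z = abs(p1 - p2) / se
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical (visitors, conversions) for each group.
groups = {"A": (10000, 750), "A_prime": (10000, 742), "B": (10000, 820)}

aa = confidence_leader_wins(*groups["A"], *groups["A_prime"])
ab = confidence_leader_wins(*groups["A"], *groups["B"])
print(f"A vs A': {aa:.0%}")
print(f"A vs B:  {ab:.0%}")

# A and A' saw the same design, so a high confidence here is a warning sign.
if aa >= 0.95:
    print("A and A' differ; check how visitors were assigned to groups.")
```

With these made-up counts, A and A’ land close to a coin toss while A vs B clears 95%, which is the pattern a healthy test should show.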

So when we built the A/B testing engine in the product, it was very natural at NextCard for us to use the hypothesis testing equation to evaluate results.  Since we were trying to test participation and results of customers for the life of the customer, we could even apply the continuous versions of the equation set to look at profitability of two different groups over time, etc.

I continued to use the hypothesis testing equation at Advanta, Washington Mutual, and as we built our A/B testing engines and evaluated results.  Later Google Website Optimizer used it as well, and so did Visual Website Optimizer and Optimizely.  I am sure will use it to evaluate messaging data as well.  It is a structured and convenient form and less subjective than just looking at means from different sample sizes or variances.  Thankfully others won’t have to build A/B testing engines any more as they are available off the shelf and they apply best practices like this typically, but interestingly, I don’t think the ad servers have built a really convenient hypothesis testing evaluation into their tools as of the time of this writing (hint!).  And I don’t think people have done a good job applying these tools for scenario analysis for test planning or segmentation analysis yet either.  I will elaborate more later.

My Derivation



My derivation was motivated by wanting a form that was easy to code in SQL, as that was the tool that I was most often using for analytics.  So this form was derived once at NextCard back when, and I shared the result of that derivation in my Agile Marketing Meetup talk on page 13.  It was re-derived to check accuracy many years later, after I found out that in addition to being an incredible analytical marketer, Jiyoon Chung was also an amazing statistical whiz. She started it off from first principles again and she also checked my derivation at the end to make sure I didn’t mess it up.  But that and a couple of conversations with other stats whizzes that I have worked with is the extent of my diligence.  There are other forms of the equations as well as other ways to do this, but I have always used these forms because they are easy in both SQL and Excel, and now in Google Spreadsheets as well.

The Equations and the Spreadsheet

Look at my google spreadsheet as I am going to use it as a concrete example for the remainder.  The screenshot of it above can also serve as a reference.  There are two forms of the equation for two types of data, and two tabs in my worksheet accordingly.  One for what is called a binomial data distribution, which means there are only two possible outcomes.  So a visitor clicked or didn’t, converted or didn’t, signed up or didn’t, yes/no answers, thus binomial.  Mathematically, if you know the mean % conversions of a binomial distribution, you know what the variance looks like as you know exactly how many converted and how many didn’t.  So it is a little bit simpler to look at.
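To make the binomial case concrete, here is a rough Python sketch. It assumes the standard normal-approximation form, in which the variance of a yes/no outcome with conversion rate p is p(1 - p); the spreadsheet's exact form may differ. The function name `binomial_test` and the visitor counts are hypothetical.

```python
import math

def binomial_test(n_a, conv_a, n_b, conv_b):
    """Compare two conversion rates.
    Assumes both rates are strictly between 0 and 1.
    Returns (rate_a, rate_b, z, confidence), where confidence is the
    one-sided likelihood that the observed leader is the true winner."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # For yes/no data the variance follows directly from the mean.
    var_a = p_a * (1 - p_a)
    var_b = p_b * (1 - p_b)
    # Standard error of the difference between the two sample rates.
    se = math.sqrt(var_a / n_a + var_b / n_b)
    z = (p_a - p_b) / se
    confidence = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return p_a, p_b, z, confidence

# Hypothetical test: champion on most of the traffic, challenger on the rest.
p_champ, p_chall, z, conf = binomial_test(10000, 700, 2000, 150)
print(f"champion {p_champ:.2%}, challenger {p_chall:.2%}, confidence {conf:.0%}")
```

Only sample sizes and conversion counts go in, which is why this form is the simpler of the two.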

The other tab, for a continuous distribution, is typically used for things like shopping cart size, 12 month cumulative revenue, or number of friends referred.  You need a variance number for this one, but that is appropriate for these types of measures.
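For the continuous case, a comparable sketch might look like the following. It assumes a normal approximation on the difference of two sample means with unequal variances, which is a common textbook form and not necessarily the one in the spreadsheet. The names `summarize` and `continuous_confidence` and the cart-size figures are hypothetical.

```python
import math
import statistics

def summarize(values):
    """Return (mean, sample variance, count) for a list of raw measurements."""
    return statistics.mean(values), statistics.variance(values), len(values)

def continuous_confidence(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """One-sided confidence that the group with the higher observed mean
    is the true winner. Unlike the binomial case, the variance has to be
    measured and supplied; it cannot be derived from the mean."""
    se = math.sqrt(var_a / n_a + var_b / n_b)
    z = abs(mean_a - mean_b) / se
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Summary statistics can be computed from raw data when it is available.
mean, var, n = summarize([40.0, 55.0, 61.0])

# Hypothetical shopping cart sizes in dollars: (mean, variance, sample size).
cart_a = (52.0, 900.0, 4000)
cart_b = (54.0, 1024.0, 4000)
conf = continuous_confidence(*cart_a, *cart_b)
print(f"confidence that B's larger cart size is real: {conf:.1%}")
```

The approximation is only reasonable for fairly large samples; the three-value list above just shows where the mean and variance inputs come from.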

What does it mean, a non-statistician’s interpretation of what smart people have explained to me

With apologies to real statistics people that may bury me in the comments, here is a lay person interpretation of how to use this.  If you run a test against a discrete population of limited sample size, you are getting an estimate of what conversion would be like if you ran it against the whole population.  This is what you want to do, so you can find the best converting approach to run against the entire market before you spend the big bucks to get in front of the entire market.  But depending on how large a sample you run, if you ran the same test against another small sample you might get a little different number for a variety of reasons.  So in the spreadsheet, put in a sample size and conversion number for sample A and B, and the confidence interval will tell you the % likelihood that the current winner is the real winner against the total population.  It doesn’t say how much it would win by, just that it would be the same winner.  So if it is 50%, you have two numbers that are statistically the same.  A quarter lands on heads 50% of the time after all.  At 90% confidence, there is still a 10% chance the other is the “real winner”.  This is a simplification, but a working one.  The actual description as I understand has to do with the distribution of relative sample results.  Talk to a statistics person rather than an operator if you want to go there : )

In my example spreadsheet I ran the test in front of more than 25k visitors and got more than 1,800 conversions.   And my red page converted 5% higher than the blue page.  Sounds pretty solid right?  But with only 70% confidence red is really better, it is not too much better than a coin toss as to which page would really win against the whole population.  Sometimes, I might keep the test running.  Sometimes, I might call this a draw and move on to try to find something that wins by more than 5% performance.  But regardless, I know something about the test results that does not necessarily look intuitive at first glance.  And of course, just knowing red vs blue did not move the dial enough to get to a 10% performance delta, which would have yielded a 95% confidence interval, is useful data regarding how a sample of my market interacts with my product experience.  95% is what I often looked for to declare a winner, though I might move forward with as low as 85%, but knowingly so.

Practical Application

My goal of A/B testing is not only to find the winner and apply it, but more importantly to learn about my product and market and apply that learning to win.  If I was simply trying to get the highest number, a machine could optimize things for me potentially, but no market is large enough to test all the variations possible and find a winner, so I am trying to learn and inform the variations I would like to prioritize.  As far as numerical confidence, I did like to get 95% if possible, and I started to favor a result around 85% and might even consider declaring a winner and a discriminant at that level.


One way I like to learn about markets is to segment results of A/B testing in analysis.  In many cases, I might have a tie for the red and the blue page at the aggregate level, but find that one segment performed significantly better and one significantly worse, resulting in a draw at the aggregate level.  But that means that I had to apply these methods to each segmentation dimension to ensure I had significant results within the segments, otherwise it may just be noise at that level too.  Anyway, if I found that one segment performed better with version A and the other with B and the differences in performance are statistically meaningful, you might first think that I would run two versions of my product experience depending on which segment I was running against.  But this is not a very scalable approach and I almost never did it.  If I had, I would have created more and more permutations of product experience and it would get really difficult to manage and maintain.  Better to understand why each segment is performing better or worse, and then take that learning to run a new test with version C and try to get the best performance out of both segments.
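The segment-level check described above can be sketched as follows, again assuming the normal-approximation binomial form. The segment names, counts, and the helper `leader_and_confidence` are all made up for illustration.

```python
import math

def leader_and_confidence(n_a, conv_a, n_b, conv_b):
    """Return (leader, confidence) for two versions of a page.
    leader is 'A', 'B', or 'tie'; confidence is one-sided.
    Assumes both conversion rates are strictly between 0 and 1."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = (p_a - p_b) / se
    conf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    leader = "tie" if z == 0 else ("A" if z > 0 else "B")
    return leader, conf

# Hypothetical results per segment: (visitors A, conversions A, visitors B, conversions B).
segments = {
    "new": (5000, 300, 5000, 360),
    "returning": (5000, 450, 5000, 390),
}

# Aggregate level: sum each column across the segments.
totals = [sum(col) for col in zip(*segments.values())]
print("aggregate:", leader_and_confidence(*totals))

# Segment level: the same test, applied inside each segment.
results = {name: leader_and_confidence(*counts) for name, counts in segments.items()}
for name, (leader, conf) in results.items():
    print(f"{name}: {leader} leads at {conf:.0%}")
```

In this made-up data the aggregate is a dead tie, yet each segment has a different leader with confidence above 95%, which is the situation that would prompt a version C.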

Test Weights

The prior example brings up an important use case of the worksheet.  I used it to figure out how much audience I want to throw at a test.  There are many considerations when sizing a test.  If it is a change I am definitely making for brand or product reasons, I may just want to measure the performance impact relative to the old baseline.  So I might run it 50/50, 50% of your audience getting version A and 50% getting version B, so that you converge on a number as fast as possible.  Often places I worked converted very differently on weekdays vs weekends, so I wanted to run tests for a week if possible and have statistical significance if I segmented by weekday/weekend or even day of week.  Sometimes I was testing something I thought would hurt performance but where I would learn something valuable enough to make up for it in the near term.  Regardless of which scenario I wanted to try out, I could put estimates of the sample sizes I thought I would see into the spreadsheet and figure out what the smallest change that I could get 95% confidence on would be.  Or I could look at the smallest measurable difference I would get out of throwing a full blown 50/50 test in.  Or if it was realistic for me to measure a segmentation result.  I could have re-derived equations to calculate these things precisely or used goal seek to iterate the answer, but in practice I just threw numbers in to get a feel for it.
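The "goal seek" style of test sizing can be approximated with a simple bisection. This sketch assumes the normal-approximation binomial form and a one-sided 95% target; the baseline rate, traffic splits, and function names are hypothetical.

```python
import math

def confidence(p_champ, n_champ, p_chall, n_chall):
    """One-sided confidence that the observed leader is the true winner."""
    se = math.sqrt(p_champ * (1 - p_champ) / n_champ
                   + p_chall * (1 - p_chall) / n_chall)
    z = abs(p_champ - p_chall) / se
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def smallest_detectable_lift(base_rate, n_champ, n_chall, target=0.95):
    """Bisect on the challenger rate to find the smallest relative lift
    over base_rate that would reach the target confidence.
    Assumes 0 < base_rate < 1. If even a 100% challenger rate cannot
    reach the target, the result simply converges to that upper bound."""
    lo, hi = base_rate, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if confidence(base_rate, n_champ, mid, n_chall) >= target:
            hi = mid
        else:
            lo = mid
    return hi / base_rate - 1

# Hypothetical week of 25,000 visitors at a 7.5% baseline conversion rate.
lift_5050 = smallest_detectable_lift(0.075, 12500, 12500)
lift_8020 = smallest_detectable_lift(0.075, 20000, 5000)
print(f"50/50 split: smallest lift at 95% is about {lift_5050:.1%}")
print(f"80/20 split: smallest lift at 95% is about {lift_8020:.1%}")
```

Changing the sample sizes to what a weekday-only or single-segment cut would see gives a quick feel for whether a segmentation result is realistically measurable.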

There are a bunch of other handy and fun uses.  Part of the beauty of a tool like this is that it is pretty flexible.  And even the Optimizely’s of the world don’t have the sort of scenario exploration or segmentation analysis built into their product yet, so it can be handy to keep around.  The other thing I did was keep a copy of this spreadsheet with the results of every test run in a binder along with cohort graphs when applicable and hard copies of the design for A and B.  I found this historical reference was really useful to share with teammates and reference.  I know, a paper archive seems very dinosaur now, but maybe that is what I learned back at Boeing with the likes of test documentation and flight test manuals.  Regardless, I found a spreadsheet like this to be a very handy tool, and I hope that it might benefit a bunch of people to see under the hood.  Please hit the comments up with your stories of methodologies and use cases, and please forgive my statistical glossing over as well as incomplete descriptions of everything.  This could easily be a one hour talk rather than a two page blog post.  And finally please credit Subtraction Capital if this helps you out.  We are always grateful for good credit among entrepreneurs.

1st Person-a

The use of personas for user experience design is widely regarded as a best practice and widely adopted in tech world.  Designing for an archetype customer yields more usable and effective interfaces than other methods.  In my memory, Alan Cooper labeled it and defined it as most of us use it in his excellent and far ahead of its time 1999 book, The Inmates are Running the Asylum.  I would propose that a very special persona, the 1st Persona, should be a part of the personas used in design.  This is the persona of your company, your interface, your brand, your perfect sales representative and/or account manager that you want interacting with your customers.  Your customers are going to interact with every interface, app, email, and system message as if it were a person anyway.  And whether they try to or not, your designers and developers are going to build personality into your interface either deliberately or as a fall out property.  If you accept and understand that and give customers someone that is consistent with your good brand, it will take you further.  Your 1st Persona.  You can start to think of customer interaction with your systems as interaction between two personas, and you will more consistently design the outcome you are trying to.

We have known for a long time that customers treat interfaces like people.  In the early days of the web when bandwidth made interactions painfully slow, customers would faithfully wait for a response for about 10 seconds before they started moving to other tabs in the browser and multi-tasking, giving you their full attention if you were demonstrating the same responsiveness to them.  Customers get frustrated by interfaces, laugh at their jokes, and even attribute virtue or lack thereof to honest open source projects and evil loan applications.  We use the same Influence methods in Marketing and design of interfaces that we train sales people to use, because people will attribute emotional intent to your interface.  So we tell customers the interface likes them, it is glad to see them again, it politely thanks them for submitting information or buying . . . as if.  We tell them the interface or computer is doing work for them and they feel obligated to repay the gift just as they would with a car salesman who tells you they are going to fight their manager and get you a better price on your car.  Universally, we steal human interactions and build them into our interfaces to facilitate conversion, engagement, and usage already.

So why wouldn’t we define the personality of that person so that everyone working at our company can make the most consistent effective interaction possible?  Can you imagine walking into a department store and getting a completely different style and approach every time you speak with a different department?  You would feel like you were working with commissioned people with no values other than taking your money.  Are you more likely to buy there or at a store with values consistent with what you want that you can trust you will find universally?  Think Nordstrom maybe.

It is ok that it is not possible to build all aspects of the complete persona we can describe.  The important part is being consistent where we do build.  And just as people are understanding of the limitations of different people that they are interacting with, people will be understanding of your interfaces and their limitations.  And likewise they will appreciate that your interface has some superpowers like incredibly fast, capable, and accurate calculations, that a real person could not do.  Take advantage of this when you define your 1st Persona.  If they would benefit from being amazing at doing the math of compound interest right in their heads instantly, then make them a math savant.  If your interface would benefit from having perfect memory and recall forever, then consider emphasizing that aspect of its personality.

So what should your 1st Persona be?  Your ultimate brand ambassador.  Think of a combination perhaps of your perfect sales person and account manager if you can make that one consistent person.  Whether you have people selling or not doesn’t matter.  You would want them to be effective and persuasive, but you don’t want them to come off as a snake oil sales person standing on a soap box in the street, and you probably don’t want a classic used car sales person archetype either.  Look to your brand, look to your perfect account manager once you have made a sale.  Again, whether you have people as account managers or not, IF you had a perfect account manager, what would this person be like?  How would they interact with your customers to support your mission, long term strategy, and success?

It’s time to accept that we have all been designing robots with personalities for a long time, most of them with very bad personalities and frequently inconsistent with each other as well as your brand and the people that work with customers at your company.  It is time to get deliberate about designing better ones with an intentional 1st Persona.

SaaS at Last for Messaging

For 15 years I ran technology product, design, and marketing teams, and every time I changed companies I had to build two things: a messaging platform and an A-B testing engine. Regardless of the vertical my company was attacking, we always needed messaging to get people to our site/app and either sign up, purchase, or re-engage, and we always needed a way to test and optimize the experience once we got people to the site. Optimizely took care of the A-B engine, and now with the launch of Subtraction Capital portfolio company Outbound, a simple tool that sends email, mobile push or SMS based messages, the messaging platform is taken care of as well.

First, Optimizely gave us A-B testing out of the box…

The first time I saw Optimizely was memorable for me. They were at Y Combinator and I was at, where I had just built an A-B testing engine for the fourth time. OK, the development team actually built it, not me (Manicka Babu and Gayathri Nayack if I remember that correctly), but that meant that some of our strongest developers had to wait to build some other Coupony thing until after they got the A-B engine done. Anyway, when I saw Optimizely pitch at a Y Combinator marketing tech event, I was so excited at the prospect of not having to ever do it again. I was hopeful that Dan Siroker and Pete Koomen would navigate the startup waters to success. First off, I would never have to build it again, and I could instead focus my company’s development resources on our core differentiating technologies. Much more strategic, right? Second, and equally important, there were always features that I wanted in my A-B testing engine that just did not make sense for me to build as a single company deriving benefit from the development work. The ROI just was not there for one company. But as an A-B testing engine for thousands of companies, Optimizely could cost effectively go very deep down the backlog. Their costs to develop on a platform instead of hacked onto a single code base would not be that much higher, but the return is effectively a small portion of the return from every company that benefits (assuming they can bill some small percentage of the value they create). Another way of looking at it is that each of the thousands of customers only has to pay a tiny fraction of the development cost, but the return does not change much, so the ROI is much better. Third, small companies that could not afford to build their own A-B testing engines yet could now access this game changing enabling technology because it was very simple to get started and give it a try.

Now, Outbound gives us event-based messages out of the box

OK, so from then forward, every company I worked at used Optimizely, and we could A-B test our interface and experience. Piece of cake! Which I suppose left us more time to build . . . a messaging platform. I remember when Facebook was early and not using email for re-engagement. I think the tech nerdiness of their core product team was resistant to using something as uncool and antiquated as email in a day when sms and smartphones were taking over. But eventually, the email summaries came around, and they were smart, customized, and conditional, only sending what was needed when it was needed. I saw this same progression at many of the great companies of Silicon Valley and realized that email was the dirty little re-engagement secret we were all too embarrassed to talk about in the open. Maybe it’s pathetic that an old, asynchronous, sparse interface channel created to copy snail mail was still ridiculously relevant. Like the famous Peter Thiel quote, “We wanted flying cars, instead we got 140 characters.” Maybe we just didn’t want to align ourselves anywhere near the spammers that annoy us with their abuse of this channel. But regardless, nearly every tech company I saw under the covers of found email their most successful and high ROI channel for customer acquisition as well as re-engagement, but no one was saying it out loud. Fast forward 5 or so years, and now we have email, sms, iOS system messages, and Android system messages all available as mainstream ways to reach out with a hook and pull someone into an experience. But despite their ubiquity and effectiveness, we still have to go out and build our content creators, data warehouse, segmentation strategies, timing strategies, frequency strategies, cron jobs, etc. to figure out what the right message is and when to deliver it. So when I first spoke to Josh Weissburg a couple years ago from a Mark Harnett introduction, I felt that same light bulb as when I first saw Optimizely.
Finally, someone was building the tool so that I did not have to build the messaging engine ever again. And not a moment too soon, with the proliferation of delivery mechanisms and interactive, it was getting harder and harder to build it yourself. In the early days, we had to worry about what people did on our web site, and what email message to send. And that is still a strong use case, but now most tech startups build interactive experiences on web sites as well as iOS and Android apps, and use all four of the channels I listed above as messaging channels. After a few interactions with Josh and co-founder Dhruv Mehta, I signed on as an advisor. In the year following, I joined Jason Portnoy at Subtraction Capital and have had the opportunity to become an investor at Outbound as well.

How Outbound works

Outbound lets a non-developer create and send messages. Often the product marketer is called upon to do this in larger companies, but in smaller companies that don’t have product marketing yet it could be the growth hacker, co-founder, product manager, designer, content marketer, copywriter or anyone who helps with distribution. No more developer required to create a new message or even the segmentation rules that trigger it. No more need to have 4 separate teams to develop and execute messaging on 4 separate channels. The point is that no matter who you are, you can use it. Setting up a campaign looks like this:

A favorite initial use case is the engagement funnel. Send this message via this channel when someone does A but does not do B, so for instance, when someone puts something in a shopping cart, but does not check out within 2 hours, send them a 10% off coupon to see if you can get them back to buy. Tracking can be wired once and the same wiring that makes tracking work is used for segmentation. Segmentation happens on the fly - no need to pass in the segments, drag and drop them any time. Tracking and segmentation can happen across different interactive media types but be used together seamlessly. So in the above example, you could send the push just to users in San Francisco with premium accounts. All the hard things are taken care of leaving the content creator to focus on getting people from point A to point B in your product:


What’s so special about this tool?

To continue the Optimizely parallel from above, here’s a review of the three benefits from finally getting this important function into the SaaS realm. First, I never have to build it again. Check. Well, I am not an operator anymore, so this is leveraged way beyond that now. This will save dozens of companies having to build a sophisticated messaging platform as I can just refer Subtraction Capital companies in. Second, Outbound can build the deep functions that would never make sense for a single company. Well, definitely for content and rule creation, Outbound already has a much simpler interface for marketers to create and send triggered messages than any company I have ever seen. For example, how many times does a non-technical person want to customize a message but doesn’t want to bug engineering to figure out how? In Outbound, they don’t need to hunt for that variable and figure out a templating language for variables; they just drag the data in from a menu right next to the editor:



And previously, as companies grew, they would often pick a main channel and take it on first. Later, they might add a second channel, for instance sending system messages to mobile app users or email messages to web site users. The marketers would of course want to test things like sending system message vs web site message but these were usually separate systems and the coordination alone would be too big a headache, much less testing strategies like following a system message with an email 3 hours later. So Outbound already covers some of the most important of these cases with their beta product, and will surely get much deeper by the time they are at official non-beta “launch product.” A personal favorite pet of mine will of course be the A-B testing access : ) Third, Outbound makes the benefits of solid messaging programs available to a broader group of tech startups instead of just the big dogs. Before Outbound, every company wanted the benefits of a really intelligent, well thought out, well created messaging program, but it was just too darned expensive. The answer it seemed was either to blast away, sending less relevance and adding to the spammy feel of our inbox that make even shameless marketers wince about using this channel, or stay silent like Facebook did until we are big enough to build a solid messaging platform. Outbound checks this box as well. I am excited to see go out into the world now. And I am anxious to see what all my distribution friends (growth hackers, marketers, product managers, designers, copy writers, etc) come up with as use cases to help their companies grow. And most of all I am glad that there is one more tool I never have to build again. Software is eating the world cheaper and cheaper, and faster and faster, and it is because of tools like this. At the end of the day, helping push that rock is the most exciting part of all.