How to Get Started with Look-Alike Modeling

Over the past ten or so years, the concept of “look-alike modeling” has pervaded ad tech to the point of becoming cliché. What’s more, it’s invoked when describing a plethora of audience modeling methodologies, confusing advertisers’ understanding of what their look-alike model actually does. In fact, in a recent State of the Industry report, eXelate published the finding that 25% of advertisers are unable to produce an accurate definition of look-alike modeling.

Despite the triteness of the name, look-alike modeling success stories abound, with recent case studies boasting lifts in conversion rates of anywhere from 30% to 1000%. Claims of success like these certainly justify further exploration into what these models are and how they’re best implemented.

So, what is it?

Put simply, look-alike modeling is the process of identifying new potential customers on the basis that they behave like, or in some other way resemble, current customers. The current set of customers, commonly referred to as the “seed pool”, is typically collected either online, via pixels, or offline, via onboarded CRM data.

With a basic definition established, I’ll offer some guidelines to consider when selecting a solution for look-alike model implementation.

Seek out solutions with a high degree of transparency

Look-alike models are predictive models (typically classification or clustering models) that share the common goal of assessing how similar each person is to a target consumer profile. While you don’t have to be an expert in predictive modeling to make use of a look-alike model, you should have a basic understanding of the model framework being utilised. Any potential look-alike modeling partner that can’t describe the predictive modeling method underlying their look-alike solution is probably best avoided. Aside from model transparency, you should get a feel for the data that’s feeding the model. Regardless of the model framework chosen, the output is worthless if the data it ingests is inaccurate, irrelevant or otherwise poorly suited to the predictive modeling task. As the adage says, garbage in, garbage out.
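To make the classification framing concrete, here is a minimal sketch of one possible look-alike model: a small logistic regression, written in plain Python, that learns to separate seed-pool members from everyone else and then scores new prospects. The attribute names, the toy data and the training settings are all assumptions made for illustration; a production solution would use far more data and a tested library.

```python
import math

# Hypothetical binary profile attributes (illustrative names only).
FEATURES = ["visited_pricing_page", "opened_newsletter", "owns_home"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Return the modeled probability that profile x resembles the seed pool."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def train(rows, labels, epochs=500, lr=0.5):
    """Fit logistic-regression weights with batch gradient descent."""
    weights = [0.0] * len(FEATURES)
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(FEATURES)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict(weights, bias, x) - y
            grad_b += err
            for j, v in enumerate(x):
                grad_w[j] += err * v
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


# Toy training data (made up): label 1 = seed-pool member, 0 = everyone else.
rows = [
    [1, 1, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
    [0, 0, 1], [1, 0, 1], [0, 0, 0], [0, 1, 1],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train(rows, labels)

# Two prospects that differ only in the owns_home attribute.
prospect_a = [1, 1, 0]
prospect_b = [1, 1, 1]
score_a = predict(weights, bias, prospect_a)
score_b = predict(weights, bias, prospect_b)
print(dict(zip(FEATURES, weights)))
print(score_a, score_b)
```

The point of the sketch is the kind of question worth asking a vendor: what plays the role of the labels, which attributes feed the model, and how a score is turned into a targeting decision.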

Minimise explicit targeting (or exclusion) of audiences, especially during the model training period

The majority of advertisers with whom I’ve worked have had detailed knowledge of who their customer is and what that customer’s digital profile looks like. While that expertise may seem like a boon from a digital marketing standpoint, it often winds up restricting look-alike model performance. The problem is that explicit targeting from the outset of a campaign potentially limits the machine learning algorithm to a subset of the data input universe. Handcuffing the model in this fashion introduces bias and limits the predictive power that can ultimately be achieved. Admittedly, there is some merit to excluding audiences that are known not to perform at the outset, rather than wasting impressions (and money) waiting for the model to draw the same conclusion. However, make sure there is ample evidence behind this decision; otherwise, it’s best to let the model “do its thing”.
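A toy illustration of the bias described above, using a made-up impression log and hypothetical audience labels: when the campaign is restricted to one audience from the start, the model never observes how the other audiences would have performed.

```python
from collections import defaultdict

# Hypothetical impression log: (audience, converted). All values are made up.
impressions = [
    ("urban", 1), ("urban", 0), ("urban", 1),
    ("suburban", 1), ("suburban", 1), ("suburban", 0),
    ("rural", 0), ("rural", 0),
]


def conversion_rates(log):
    """Observed conversion rate for each audience present in the log."""
    shown = defaultdict(int)
    converted = defaultdict(int)
    for audience, did_convert in log:
        shown[audience] += 1
        converted[audience] += did_convert
    return {audience: converted[audience] / shown[audience] for audience in shown}


# Training on everything: the model can compare all three audiences.
full_view = conversion_rates(impressions)

# Explicitly targeting "urban" from day one: the other audiences never appear.
restricted_view = conversion_rates(
    [row for row in impressions if row[0] == "urban"]
)

print(full_view)
print(restricted_view)
```

In this made-up log the suburban audience converts as well as the urban one, but the restricted campaign has no way of discovering that.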

Leverage the full data context when determining the extent to which a new user looks like your current customer

This recommendation builds on my first recommendation of knowing the model methodology. Many of the less sophisticated look-alike modeling solutions in the marketplace, especially those built into self-serve DSPs, inherently prioritise scale over accuracy. These models overstate the extent to which a given user looks like the target user by focusing only on what the two users have in common and ignoring the presence of negative predictors.

As an example, I recently implemented a look-alike model on behalf of an apartment rental site looking to drive online rental applications. The model learned very quickly that any user with a profile attribute for home ownership was very unlikely to submit a rental application, even in cases where the user displayed many attributes in common with the target consumer. The attribute for home ownership therefore received a larger weighting relative to other attributes, and impressions to users with a home ownership attribute declined. However, had the advertiser used the approach taken by many self-serve DSPs, which merely seeks attribute overlap between profiles, the advertiser would have wasted significant impressions on the poorly performing homeowner audience.
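The difference between the two approaches can be sketched in a few lines. Everything here is hypothetical: the attribute names, the target profile and the weights are invented for illustration and are not the values from the campaign described above.

```python
# Hypothetical target consumer profile for a rental site.
target_profile = {"age_25_34", "urban", "browses_listings", "pet_owner"}

# Hypothetical learned weights; the negative value marks a negative predictor.
attribute_weights = {
    "age_25_34": 0.8,
    "urban": 0.6,
    "browses_listings": 1.2,
    "pet_owner": 0.3,
    "home_owner": -3.0,
}


def overlap_score(user_attrs, target_attrs):
    """Share of target attributes the user has; everything else is ignored."""
    return len(user_attrs & target_attrs) / len(target_attrs)


def weighted_score(user_attrs, weights):
    """Sum of weights over all of the user's attributes, positive or negative."""
    return sum(weights.get(attr, 0.0) for attr in user_attrs)


renter = {"age_25_34", "urban", "browses_listings"}
homeowner = renter | {"home_owner"}

for name, attrs in [("renter", renter), ("homeowner", homeowner)]:
    print(name, overlap_score(attrs, target_profile),
          weighted_score(attrs, attribute_weights))
```

Under overlap scoring the two users are indistinguishable, because the home ownership attribute is simply not part of the comparison. Under weighted scoring the homeowner falls well below the renter, which is the behaviour that steered impressions away from homeowners in the example above.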

Let’s summarise

If you don’t know exactly what a look-alike model is, don’t sweat it; you have plenty of company. Though the concept and supporting technologies have been around for a while, look-alike modeling is often not fully understood. For those interested in exploring how look-alike models might boost the performance of online campaigns, the best practices outlined above provide a solid foundation for evaluating the various solutions offered in the marketplace. Happy prospecting!