How It Works — Boardgame Sommalier

The recommender fetches your ratings from BoardGameGeek (BGG) and uses them to find games you haven't rated yet that you're likely to enjoy. Three recommendation modes are available, each using a different strategy to measure how well a candidate game matches your taste. All modes share the common data pipeline described below.

The Common Pipeline

Steps that every mode runs, ending with mode-specific scoring

Fetch your ratings. Your rated games are retrieved from the BGG XML API (/xmlapi2/collection), including your personal score, the BGG community average, and your logged play count for each game.
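To make the fetch step concrete, here is a minimal sketch that turns a collection response into rating records. It parses a small hand-written XML sample instead of calling BGG. The element layout it assumes (`item/@objectid`, `stats/rating/@value`, `average`, `numplays`, roughly what the collection endpoint returns when requested with `stats=1`) is an assumption, and the `RatedGame` and `parse_collection` names are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Abbreviated, hand-written sample in the shape we assume the collection
# endpoint returns with stats=1 (not a captured BGG response).
SAMPLE_XML = """<items totalitems="2">
  <item objecttype="thing" objectid="13" subtype="boardgame">
    <name sortindex="1">Catan</name>
    <stats>
      <rating value="8">
        <average value="7.10"/>
        <bayesaverage value="6.90"/>
      </rating>
    </stats>
    <numplays>12</numplays>
  </item>
  <item objecttype="thing" objectid="822" subtype="boardgame">
    <name sortindex="1">Carcassonne</name>
    <stats>
      <rating value="N/A">
        <average value="7.40"/>
      </rating>
    </stats>
    <numplays>0</numplays>
  </item>
</items>"""


@dataclass
class RatedGame:
    game_id: int
    name: str
    my_rating: float      # your personal score
    community_avg: float  # BGG community average
    plays: int            # logged play count


def parse_collection(xml_text: str) -> list[RatedGame]:
    """Keep only collection items that carry a personal rating."""
    root = ET.fromstring(xml_text)
    games = []
    for item in root.iter("item"):
        rating = item.find("stats/rating")
        if rating is None or rating.get("value") in (None, "N/A"):
            continue  # in the collection but never rated
        games.append(RatedGame(
            game_id=int(item.get("objectid")),
            name=item.findtext("name", default=""),
            my_rating=float(rating.get("value")),
            community_avg=float(rating.find("average").get("value")),
            plays=int(item.findtext("numplays", default="0")),
        ))
    return games
```

Unrated items (rating value "N/A") are skipped, so only genuinely rated games feed the later steps.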

Build the candidate pool. Candidate games are gathered from BGG's "similar games" lists, fetched for a set of seed games via the internal geekitem/recs endpoint. In Content mode the seeds are your top 5 rated games; in Collaborative and Hybrid modes they are all of your liked games. BGG computes these lists from co-rating patterns across its user base.

Add the hot list. In Content and Hybrid modes, BGG's current hot games are added to the candidate pool to ensure broadly popular titles aren't missed.

Exclude known games. Any game you've already rated is removed. Games you own or have wishlisted (fetched from BGG separately) are excluded too, since there's no point recommending something you already have or already plan to get.
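The pool-building and exclusion steps can be sketched as set operations. This assumes the similar-games lists and the hot list have already been fetched into plain Python containers; `similar`, `hot`, `seeds_for_mode` and `build_candidate_pool` are hypothetical stand-ins, not the project's actual names.

```python
def seeds_for_mode(mode: str, liked_by_rating: list[int]) -> list[int]:
    """Content mode seeds from the top 5 liked games; the other modes use all of them."""
    return liked_by_rating[:5] if mode == "content" else list(liked_by_rating)


def build_candidate_pool(
    mode: str,
    liked_by_rating: list[int],      # liked game IDs, best-rated first
    similar: dict[int, list[int]],   # seed ID -> IDs on its "similar games" list
    hot: list[int],                  # current BGG hot list
    rated: set[int],
    owned: set[int],
    wishlist: set[int],
) -> set[int]:
    pool: set[int] = set()
    for seed in seeds_for_mode(mode, liked_by_rating):
        pool.update(similar.get(seed, []))
    if mode in ("content", "hybrid"):
        pool.update(hot)  # the hot list is only added in Content and Hybrid modes
    # Drop anything already rated, owned, or wishlisted.
    return pool - rated - owned - wishlist
```

For example, with two liked games whose lists are `[10, 11, 2]` and `[11, 12]`, a hot list of `[20, 10]` and game 12 already owned, Content mode yields `{10, 11, 20}` while Collaborative mode yields `{10, 11}`.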

Fetch candidate metadata. Full details are fetched for each candidate: mechanics, categories, BGG families, complexity weight, Bayes-adjusted rating, and player count. Responses are cached to disk with a 30-day TTL, so a network call is needed only for games that aren't cached yet or whose cache entry has expired.
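A 30-day disk cache of this kind could look like the sketch below. It assumes one JSON file per game ID and a caller-supplied `fetch` function; the file layout and the `cached_fetch` name are illustrative rather than taken from the project.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable, Optional

TTL_SECONDS = 30 * 24 * 60 * 60  # 30-day time-to-live


def cached_fetch(
    game_id: int,
    cache_dir: Path,
    fetch: Callable[[int], Any],
    now: Optional[float] = None,
) -> Any:
    """Return cached metadata if it is younger than the TTL, else refetch and store it."""
    now = time.time() if now is None else now
    path = cache_dir / f"{game_id}.json"
    if path.exists():
        entry = json.loads(path.read_text())
        if now - entry["fetched_at"] < TTL_SECONDS:
            return entry["data"]  # cache hit: no network call
    data = fetch(game_id)  # cache miss or expired entry
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"fetched_at": now, "data": data}))
    return data
```

The optional `now` argument only exists so expiry can be exercised without waiting 30 days.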

Score and rank. Every candidate is scored using the selected mode's algorithm. The top N are returned.
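The ranking step itself is small. A minimal sketch, assuming scores are held in a dict keyed by game ID (an illustrative choice):

```python
import heapq


def top_n(scores: dict[int, float], n: int) -> list[tuple[int, float]]:
    """Return the n highest-scoring (game_id, score) pairs, best first."""
    # heapq.nlargest avoids fully sorting a large candidate pool.
    return heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])
```

For instance, `top_n({1: 0.2, 2: 0.9, 3: 0.5}, 2)` returns `[(2, 0.9), (3, 0.5)]`.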

Content-Based Filtering (Content mode)

Recommends games with similar features to games you've rated highly

Feature representation

Each game is described by a feature vector combining three groups of binary attributes plus one continuous value:

Mechanics (~200 distinct tags on BGG) — e.g. Worker Placement, Deck Building, Area Control
Categories (~80 tags) — e.g. Fantasy, Wargame, Economic
BGG Families (thousands of tags) — highly specific groupings like "Game: Pandemic Series" or "Mechanism: Legacy Games" that cross-cut mechanics and categories
Complexity weight — BGG's community-rated difficulty, normalised to a 0–1 scale from the 1–5 original
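One way to represent such a vector is a sparse dict from feature name to value. The sketch below assumes a linear rescaling of complexity, (weight − 1) / 4, which is the natural reading of "normalised to 0–1 from 1–5" but is not spelled out above. The `mech:`/`cat:`/`fam:` key prefixes are also an illustrative choice that keeps a mechanic and a category with the same name distinct.

```python
def feature_vector(
    mechanics: list[str],
    categories: list[str],
    families: list[str],
    weight: float,
) -> dict[str, float]:
    """Sparse feature vector: binary tags plus normalised complexity."""
    vec: dict[str, float] = {}
    for prefix, tags in (("mech", mechanics), ("cat", categories), ("fam", families)):
        for tag in tags:
            vec[f"{prefix}:{tag}"] = 1.0  # binary attribute present
    vec["weight"] = (weight - 1.0) / 4.0  # BGG weight 1-5 mapped linearly onto 0-1
    return vec
```

A medium-weight (3.0) worker-placement economic game becomes `{"mech:Worker Placement": 1.0, "cat:Economic": 1.0, "weight": 0.5}`.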

IDF weighting

Rather than treating all features equally, each is weighted by its Inverse Document Frequency across the entire game corpus:

idf(feature) = log( N / count_of_games_with_feature )

"Dice Rolling" appears in roughly 60% of games and gets a low IDF weight. "Crayon Rail System" appears in under 3% and gets a high weight. This means the similarity signal responds sharply to rare, distinctive features rather than being drowned out by ubiquitous ones.
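The IDF formula can be sketched over a corpus of sparse vectors like this. One caveat shapes the design: a feature present in every game gets log(N / N) = log(1) = 0, so applying IDF to the complexity value would erase it. This sketch therefore weights only the binary tags and passes complexity through unchanged; how the real code handles it is not stated above, and the function names are hypothetical.

```python
import math
from collections import Counter

CONTINUOUS = {"weight"}  # features that are not binary tags


def compute_idf(corpus: list[dict[str, float]]) -> dict[str, float]:
    """idf(feature) = log(N / number of games that have the feature), tags only."""
    n = len(corpus)
    doc_freq = Counter(
        f for vec in corpus for f, v in vec.items() if v and f not in CONTINUOUS
    )
    return {f: math.log(n / df) for f, df in doc_freq.items()}


def apply_idf(vec: dict[str, float], idf: dict[str, float]) -> dict[str, float]:
    """Scale binary tags by their IDF; pass continuous features through unchanged."""
    return {
        f: v if f in CONTINUOUS else v * idf.get(f, 0.0)
        for f, v in vec.items()
    }
```

In a toy corpus of five games where "Dice Rolling" appears in three and "Crayon Rail System" in one, the rare tag gets log(5) ≈ 1.61 against log(5/3) ≈ 0.51 for the common one.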

Building the taste profile

Your taste profile is a weighted average of the feature vectors of all games you've rated at or above the minimum threshold (your "liked" games). Three factors shape it:

Rating weight: w = rating − (min_rating − 1). At the default threshold of 7, a 10/10 gets w = 4 and a 7/10 gets w = 1, so the 10/10 has four times the influence.
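Play count boost: × (1 + 0.5 × log(plays + 1)). Repeat plays signal genuine love beyond the initial impression, so heavily played games count for more: with a natural log, a game you've logged 50 plays of gets a multiplier of about 2.97, roughly 3× a game with no logged plays and about 2.2× one played once, at the same rating.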
Negative signal: Games rated below the threshold build a separate negative profile, normalised the same way. Your final taste vector subtracts a dampened version:

taste_vector = positive_profile − 0.3 × negative_profile

This means candidates sharing features with games you disliked are actively penalised, not merely ignored.
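The profile-building rules above can be sketched as follows. The rating weight, the 0.5 play coefficient and the 0.3 damping factor come from the text; the natural log, the equal weighting of disliked games (the text does not say how they are weighted) and the function names are assumptions.

```python
import math

Vec = dict[str, float]


def game_weight(rating: float, plays: int, min_rating: float = 7.0) -> float:
    """Rating weight times play-count boost for a liked game."""
    rating_w = rating - (min_rating - 1.0)        # 10 -> 4, 7 -> 1 at threshold 7
    play_boost = 1.0 + 0.5 * math.log(plays + 1)  # natural log; 0 plays -> 1.0
    return rating_w * play_boost


def weighted_average(items: list[tuple[Vec, float]]) -> Vec:
    """Weighted mean of sparse vectors; empty input gives an empty profile."""
    total = sum(w for _, w in items)
    profile: Vec = {}
    if total <= 0:
        return profile
    for vec, w in items:
        for f, v in vec.items():
            profile[f] = profile.get(f, 0.0) + w * v / total
    return profile


def taste_vector(
    rated: list[tuple[Vec, float, int]],  # (feature vector, your rating, plays)
    min_rating: float = 7.0,
    neg_damp: float = 0.3,
) -> Vec:
    liked = [(v, game_weight(r, p, min_rating)) for v, r, p in rated if r >= min_rating]
    # Assumption: each disliked game counts equally in the negative profile.
    disliked = [(v, 1.0) for v, r, _ in rated if r < min_rating]
    pos, neg = weighted_average(liked), weighted_average(disliked)
    # taste_vector = positive_profile - 0.3 * negative_profile
    return {f: pos.get(f, 0.0) - neg_damp * neg.get(f, 0.0) for f in pos.keys() | neg.keys()}
```

With one liked game tagged `a` and one disliked game tagged `b`, the result is `{"a": 1.0, "b": -0.3}`: the disliked feature ends up negative rather than simply absent.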

Cosine similarity

For each candidate, the angle between its feature vector and your taste vector is measured using cosine similarity:

similarity = dot(candidate, taste) / (|candidate| × |taste|)

A value of 1.0 means perfect alignment; 0 means no overlap; negative values (possible because of the negative profile) mean the candidate actively conflicts with your taste. Cosine similarity is scale-invariant — a game with many mechanics isn't automatically more similar just because its vector is longer.
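Cosine similarity over the sparse dict vectors used in the earlier sketches might look like this; returning 0.0 for an all-zero vector is an assumed convention, since the ratio is otherwise undefined.

```python
import math


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """dot(a, b) / (|a| * |b|) over sparse dict vectors."""
    dot = sum(v * b.get(f, 0.0) for f, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for an empty vector
    return dot / (norm_a * norm_b)
```

Scaling either vector leaves the result unchanged (`{"x": 1, "y": 1}` versus `{"x": 3, "y": 3}` gives 1.0), and a candidate whose only feature carries a negative taste weight scores below zero.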

Final score

score = 0.65 × similarity + 0.35 × (bayes_rating / 9)

BGG's Bayes-adjusted community rating (which shrinks scores toward the global mean to penalise games with few votes) acts as a quality signal. Without it, obscure games with perfect niche fits would dominate over well-loved, broadly appealing titles.
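The scoring formula is a one-liner. The comparison below uses made-up numbers to illustrate the point about niche fits; `content_score` is a hypothetical name.

```python
def content_score(similarity: float, bayes_rating: float) -> float:
    """Blend taste fit (65%) with BGG's Bayes-adjusted rating divided by 9 (35%)."""
    return 0.65 * similarity + 0.35 * (bayes_rating / 9.0)


# Made-up numbers: a niche near-match versus a well-loved, slightly weaker match.
niche = content_score(similarity=0.60, bayes_rating=5.5)    # about 0.604
popular = content_score(similarity=0.50, bayes_rating=8.0)  # about 0.636
```

Here the 0.1 similarity deficit costs the popular title 0.065 points, while its 2.5-point rating advantage is worth about 0.097, so it ranks above the niche match.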

Collaborative Filtering (Collaborative mode)

Recommends games that players with similar taste have rated highly

The data source

True collaborative filtering requires a matrix of user × item ratings. BGG has this internally (tens of millions of ratings from over a million users) but doesn't expose it in bulk through its public API. Instead, BGG publishes pre-computed "similar games" lists for every title on the site via the geekitem/recs endpoint.

These lists are themselves a product of collaborative filtering: BGG computes them from co-rating patterns. If many users who gave Game A a high rating also gave Game B a high rating, the two games appear in each other's similar lists. We use this signal directly rather than reconstructing the ratings matrix ourselves.

Co-occurrence scoring

In Collaborative mode, similar-game lists are fetched for every game you've liked (not just the top 5 seeds used in Content mode). Each candidate is then scored by the weighted fraction of your liked games that co-occur with it:

cf_score = Σ( weight_i for each liked game i where candidate ∈ similar_list(i) )
/ Σ( weight_i for all liked games )

Weights are the same rating × play-count values used in Content mode. A score of 1.0 would mean the candidate appears in the similar-games list of every game you like; in practice, 0.05–0.20 already represents a strong signal.
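The co-occurrence formula translates almost directly into code. In this sketch the dict layouts and the zero-total guard are assumptions; `similar` would be filled from the per-game similar-games lists.

```python
def cf_score(
    candidate: int,
    liked_weights: dict[int, float],  # liked game ID -> rating x play-count weight
    similar: dict[int, set[int]],     # liked game ID -> IDs on its similar-games list
) -> float:
    """Weighted fraction of liked games whose similar list contains the candidate."""
    total = sum(liked_weights.values())
    if total == 0:
        return 0.0
    hits = sum(w for g, w in liked_weights.items() if candidate in similar.get(g, set()))
    return hits / total
```

With liked-game weights 3, 1 and 4, a candidate listed as similar to the first two scores (3 + 1) / 8 = 0.5, and one listed only under the first scores 3 / 8 = 0.375.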

What it finds differently

Content-based filtering can miss games that look superficially different from your taste profile — different setting, different player count, unusual mechanic mix — but consistently delight the same community of players. Collaborative filtering has no concept of game features at all; it finds latent patterns in human preference that don't reduce to any single attribute.

The trade-off is that with a small collection and few liked games, the co-occurrence signal is weak. Collaborative mode works best when you have 20+ games rated at or above the threshold.

Hybrid Filtering (Hybrid mode)

Blends content and collaborative signals before applying the quality weighting

Hybrid mode combines both approaches at the similarity step. The candidate pool is expanded to include co-occurrence candidates from all liked games plus the hot games list, giving the broadest possible set of options. Scoring uses an equal blend:

similarity = 0.5 × cosine_similarity + 0.5 × cf_score
score = 0.65 × similarity + 0.35 × (bayes_rating / 9)

The content signal keeps results grounded in the features you demonstrably enjoy. The CF signal adds the community dimension — games your taste cohort loves regardless of how they're described. Together they tend to produce more varied top-10 lists than either mode alone.
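The hybrid formula combines the two signals before the same quality weighting; a minimal sketch with a hypothetical function name:

```python
def hybrid_score(cosine_sim: float, cf: float, bayes_rating: float) -> float:
    """Equal blend of content and CF similarity, then the same quality weighting."""
    similarity = 0.5 * cosine_sim + 0.5 * cf
    return 0.65 * similarity + 0.35 * (bayes_rating / 9.0)
```

One consequence may be worth checking: a cf_score of 0.05–0.20 already counts as strong, while cosine similarity can range up to 1.0, so an equal 0.5/0.5 weighting may in practice let the cosine term drive most of the differences between candidates. Rescaling the CF signal would be one option, though the text above does not mention it.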

When to Use Each Mode

Practical guidance based on your collection size and goals

Mode	Best when…	Weaker when…
Content	You have a small collection, or you want tight thematic matches — games that feel obviously related to your favourites	Your taste is broad and doesn't reduce to a consistent mechanic/category profile
Collaborative	You have 20+ liked games and want to discover titles that might surprise you thematically but fit your player community	Your collection is small — the co-occurrence signal needs enough liked games to be reliable
Hybrid	You want varied suggestions and have a reasonably sized collection; good general-purpose default	You want very tight thematic matches (use Content) or maximum surprise (use Collaborative)