Overview

Whenever your products come in variations (for example, in different colors), structuring your data to represent your inventory without duplicates can become tricky. Good examples are t-shirts in different colors or smartphones with different storage capacities (and consequently different prices).

Let’s take the example of an e-commerce website selling t-shirts and sweatshirts in a variety of models and colors. The simplest solution would be to have one record per model, each holding a list of its possible color variants. The issue with this approach is that whenever someone searches for “red t-shirt”, the engine returns every t-shirt model that has at least one variant in red. However, the thumbnail would not necessarily be red, which would be confusing.

Instead, we want to make sure that whenever someone types “red t-shirt”, the page only displays products with a red thumbnail. We also want to get a single item per model and find a clean way to display all its variants.

Dataset Example

In our inventory, we have two t-shirt models (A and B) and two models of sweatshirts (C and D). Each model comes in several colors.

In our dataset, we can represent them by creating one record for each color variant of each item. Each record specifies the type, the model, the color and the associated thumbnail. Here’s what our records look like:

[
  {
    "type": "t-shirt",
    "model": "B",
    "color": "blue",
    "thumbnail_url": "tshirt-B-blue.png"
  },
  {
    "type": "sweatshirt",
    "model": "C",
    "color": "red",
    "thumbnail_url": "sweatshirt-C-red.png"
  },
  ...
]

Going a step further, we could also add all the possible color variations for each record. This way, we could display all the variants for a single product in our front end (e.g., color swatches under the thumbnail), allowing the end user to discover them.

[
  {
    "type": "t-shirt",
    "model": "B",
    "color": "blue",
    "thumbnail_url": "tshirt-B-blue.png",
    "color_variants": ["orange", "teal", "yellow", "red", "green"]
  },
  {
    "type": "t-shirt",
    "model": "B",
    "color": "orange",
    "thumbnail_url": "tshirt-B-orange.png",
    "color_variants": ["blue", "teal", "yellow", "red", "green"]
  },
  ...
]
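
Records like these could be generated from a per-model list of colors. The following is a minimal sketch, not part of any Algolia client: the `buildVariantRecords` helper and its `$thumbnailPrefix` parameter are hypothetical names, and the thumbnail naming scheme is assumed from the sample records above.

```php
<?php
/**
 * Build one record per color variant of a model (hypothetical helper).
 * Each record lists the other available colors in "color_variants".
 */
function buildVariantRecords(string $type, string $model, array $colors, string $thumbnailPrefix): array
{
    $records = [];
    foreach ($colors as $color) {
        $records[] = [
            'type' => $type,
            'model' => $model,
            'color' => $color,
            // Assumed naming scheme: <prefix>-<model>-<color>.png
            'thumbnail_url' => "{$thumbnailPrefix}-{$model}-{$color}.png",
            // Every color except the record's own, re-indexed from 0
            'color_variants' => array_values(array_diff($colors, [$color])),
        ];
    }
    return $records;
}

$records = buildVariantRecords(
    't-shirt',
    'B',
    ['blue', 'orange', 'teal', 'yellow', 'red', 'green'],
    'tshirt'
);
```

The resulting `$records` array holds six records, one per color, ready to be sent to the index.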

With this approach, every record represents a single variation, which ensures we always display consistent data. Having one record per variation also lets you add granular custom ranking attributes, like number_of_sales. Besides, you can leverage Algolia’s distinct feature to de-duplicate models. This way, when someone searches for “t-shirt”, they only get one of each model.
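
To make the effect of de-duplication concrete, here is a small local simulation. It only illustrates the idea of keeping the best-ranked record per model; it is not how Algolia implements distinct, and the `dedupeByAttribute` function and the sample hits (including the color of model A) are made up for the example.

```php
<?php
/**
 * Keep only the first (best-ranked) hit for each value of $attribute.
 * Illustration only: the engine performs this step server-side.
 */
function dedupeByAttribute(array $rankedHits, string $attribute): array
{
    $seen = [];
    $kept = [];
    foreach ($rankedHits as $hit) {
        $key = $hit[$attribute];
        if (!isset($seen[$key])) {
            $seen[$key] = true;
            $kept[] = $hit;
        }
    }
    return $kept;
}

// Sample hits for the query "t-shirt", already sorted by ranking.
$rankedHits = [
    ['type' => 't-shirt', 'model' => 'B', 'color' => 'blue'],
    ['type' => 't-shirt', 'model' => 'B', 'color' => 'orange'],
    ['type' => 't-shirt', 'model' => 'A', 'color' => 'red'],
];

$deduped = dedupeByAttribute($rankedHits, 'model');
```

Here `$deduped` contains two hits: the blue variant of model B and the red variant of model A. The orange variant of model B is dropped because a better-ranked record of the same model was already kept.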

Using the API

At indexing time

Before de-duplicating items, we want to restrict which attributes are searchable. We don’t want to search in thumbnail_url, which may be irrelevant and add noise, nor in color_variants, because it could lead to false positives: a search for “red” would match the blue record, whose variants include red. Therefore, we can set model, type and color as searchableAttributes.

// Only search in the attributes that describe the variant itself.
$index->setSettings([
  'searchableAttributes' => [
    'model',
    'type',
    'color'
  ]
]);

To use distinct, you first need to set model as attributeForDistinct at indexing time. Only then can you set distinct to true to de-duplicate your results. Note that setting distinct at indexing time is optional: you can set it at query time instead.

$index->setSettings([
  'attributeForDistinct' => 'model',
  'distinct' => true
]);

At query time

Once attributeForDistinct is set, you can enable distinct by setting it to true. Note that you can set distinct to true or 1 interchangeably.

$results = $index->search('query', [
  'distinct' => true
]);

Using the Dashboard

You can also set your attribute for distinct and enable distinct in your Algolia dashboard.

  1. Go to your dashboard and select your index.
  2. Click on the Configuration tab.
  3. In the Searchable Attributes section, click the “Add a searchable attribute” button.
  4. Select the model, type and color attributes in the dropdown one after another.
  5. Click the Deduplication and Grouping tab, which you can find under Search behavior.
  6. Set the Distinct dropdown to true.
  7. Set the Attribute for Distinct dropdown to model.
  8. Don’t forget to save your changes!

When distinct is set to true, we get one color for each model. To control which one, you can add an attribute holding a business metric (e.g., number_of_sales) to each record and use it for custom ranking.
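
Through the API, such a setup might look like the sketch below. It assumes every record carries a numeric number_of_sales attribute, which the sample records in this guide do not include, and it reuses the `$index` object from the earlier snippets.

```php
<?php
// Rank variants by sales so that distinct keeps the best-selling
// color of each model. Assumes records have a numeric number_of_sales.
$settings = [
    'attributeForDistinct' => 'model',
    'distinct' => true,
    'customRanking' => ['desc(number_of_sales)'],
];

// With an initialized index object ($index, as in the snippets above):
// $index->setSettings($settings);
```

Custom ranking acts as a tie-breaker after the textual ranking criteria, so the best-selling color surfaces only among variants that match the query equally well.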

Additionally, we can display all available colors for each item thanks to the color_variants attribute. This way, the end user can access all possible variants from the search results without the page being crowded with too many items.
