Integrating Sitecore XM with Sitecore Search using the Ingestion API


In my last blog post I gave a quick introduction to Sitecore Search and showed how we can set up a Web Crawler to add content into our Sitecore Search index. In this post I will show how we can use the Ingestion API to actively push content into Sitecore Search and integrate this with Sitecore XM publishing as an alternative to web crawling (e.g. for pages behind login).

The Ingestion API offers endpoints where we can push content items into the Sitecore Search index. In this post we will be using the POST and PATCH endpoints to create and update index documents (a document is simply the technical term for the representation of a content item in the index), but the Ingestion API also supports PUT and DELETE.

The endpoint is simple to use: We authorize using an API key and then we push our document using the following schema:

{
    "document": 
    {
        "id": "110d559f-dea5-42ea-9c1c-8a5df7e70ef9",
        "fields": {
            "name": "Home",
            "description": "This is a home page",
            "type": "sc103xm",
            "image_url": "https://www.example.com/image.jpg",
            "url": "https://www.example.com"
        }
    }
}

The values for the fields above are examples, but it is worth noting that id is simply a string we choose to identify our document. Before pushing documents to Sitecore Search, however, we need to define the fields as attributes in our Sitecore Search domain (and we will need to push all fields marked as required):

In this post we will use the name, description, type, image_url and url – but we could use other attributes or create new ones to suit our needs.
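Since the API expects every required attribute to be present, it can be handy to check a payload before sending it. The following is only a minimal sketch: the class name RequiredAttributeCheck and the list of required attributes are assumptions for illustration, and the actual list depends on how the attributes are configured in your own domain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RequiredAttributeCheck
{
    // Hypothetical list: replace with the attributes marked as required in your own domain.
    private static readonly string[] RequiredAttributes = { "name", "type", "url" };

    // Returns the required attributes that are absent or empty in the given fields.
    public static IReadOnlyList<string> FindMissing(IDictionary<string, string> fields)
    {
        return RequiredAttributes
            .Where(key => !fields.TryGetValue(key, out var value)
                          || string.IsNullOrWhiteSpace(value))
            .ToList();
    }
}
```

A caller could skip (or log) any document where FindMissing returns a non-empty list instead of sending a request that is likely to be rejected.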

1. Setting up a source

As we saw in my previous blog post, content items are added to the Sitecore Search index using Sources. So to be able to use the Ingestion API, we first need to set up a Source of the type API Push:

You will notice that my source has been named sc103xm, and the Source Information includes an Endpoint URL, a Source ID and an API key.

Also note the Available Locales at the bottom of the screen. In a multilingual setup we could potentially push documents to different locales, but in the example below we will simply use en_us.

After setting up the source, we will be able to call the following endpoint for creating documents (please note that the base URL might differ):

POST https://discover-apse2.sitecorecloud.io/ingestion/v1/domains/[DOMAIN ID]/sources/[SOURCE ID]/entities/[ENTITY ID]/documents?locale=[locale]

And for updating documents:

PATCH https://discover-apse2.sitecorecloud.io/ingestion/v1/domains/[DOMAIN ID]/sources/[SOURCE ID]/entities/[ENTITY ID]/documents/[id]?locale=[locale]

The Domain ID can be found under Administration > Domain Settings, and the rest of the information is available from the screen above (we will be using it later). To be honest I am a bit unsure about the Entity ID, so I will simply be using content.

With these changes in place, we are ready to start pushing content into our Sitecore Search index.

2. Integrating with Sitecore XM

When integrating Sitecore Search with Sitecore XM, one option is to push content items into the Ingestion API whenever content is published. There are a number of ways to do this, but this is one suggestion:

Let us start by defining a base template in Sitecore called _Searchable:

The template will serve two purposes: First, it will work as a filter as we will only push items implementing this template into Sitecore Search. Secondly it contains the fields we will need when calling the Ingestion API.

In our code we will need a Document class to serialize and send to the Ingestion API:

namespace sc10xm.Features.Search.Models
{
    using Newtonsoft.Json;

    public class Document
    {
        [JsonProperty("document")]
        public InnerDocument InnerDocument { get; set; }
    }

    public class InnerDocument
    {
        [JsonProperty("id")]
        public string ID { get; set; }

        [JsonProperty("fields")]
        public DocumentFields Fields { get; set; }

        [JsonIgnore]
        public string Locale { get; set; }
    }

    public class DocumentFields
    {
        [JsonProperty("name")]
        public string Name { get; set; }

        [JsonProperty("description")]
        public string Description { get; set; }

        [JsonProperty("type")]
        public string Type { get; set; }

        [JsonProperty("image_url")]
        public string ImageUrl { get; set; }

        [JsonProperty("url")]
        public string Url { get; set; }
    }
}

As you can see, I am using a structure with an InnerDocument to get the schema right. I have also added the locale to the InnerDocument although we will not be pushing it as part of the payload.

To handle the URLs, I have introduced an IngestionApiHelper. In a real solution we would of course not have the base URL, Domain ID, Source ID and Entity ID hardcoded, but for readability let us keep them inside the helper:

namespace sc10xm.Features.Search.Helpers
{
    using sc10xm.Features.Search.Models;

    public static class IngestionApiHelper
    {
        private static string baseUrl = "https://discover-apse2.sitecorecloud.io/ingestion/v1/";
        private static string patchEndpoint = baseUrl + "domains/{0}/sources/{1}/entities/{2}/documents/{3}?locale={4}";
        private static string postEndpoint = baseUrl + "domains/{0}/sources/{1}/entities/{2}/documents?locale={3}";
        private static string domainId = "[DOMAIN ID]";
        private static string sourceId = "[SOURCE ID]";
        private static string entityId = "[ENTITY ID]";

        public static string GetPatchUrl(Document document)
        {
            return string.Format(
                patchEndpoint,
                domainId,
                sourceId,
                entityId,
                document.InnerDocument.ID,
                document.InnerDocument.Locale
            );
        }

        public static string GetPostUrl(Document document)
        {
            return string.Format(
                postEndpoint,
                domainId,
                sourceId,
                entityId,
                document.InnerDocument.Locale
            );
        }
    }
}
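For a real solution, one way to avoid the hardcoded values is to read them from configuration. The sketch below is only an illustration and not part of the solution above: the class IngestionSettings and the variable names SEARCH_DOMAIN_ID, SEARCH_SOURCE_ID and SEARCH_ENTITY_ID are made up for this example, and environment variables are just one possible place to store the values.

```csharp
using System;

public sealed class IngestionSettings
{
    public string DomainId { get; private set; }
    public string SourceId { get; private set; }
    public string EntityId { get; private set; }

    // Reads the settings through the supplied lookup (environment variables by default).
    // The variable names are hypothetical; pick whatever fits your deployment.
    public static IngestionSettings Load(Func<string, string> lookup = null)
    {
        if (lookup == null)
        {
            lookup = Environment.GetEnvironmentVariable;
        }

        return new IngestionSettings
        {
            DomainId = Require(lookup, "SEARCH_DOMAIN_ID"),
            SourceId = Require(lookup, "SEARCH_SOURCE_ID"),
            // This post uses "content" as the entity, so that is the fallback here.
            EntityId = lookup("SEARCH_ENTITY_ID") ?? "content"
        };
    }

    // Fails fast when a mandatory value is missing or blank.
    private static string Require(Func<string, string> lookup, string name)
    {
        var value = lookup(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException("Missing setting: " + name);
        }

        return value;
    }
}
```

Passing the lookup as a delegate keeps the class easy to test, and it could just as well be backed by Sitecore settings or another configuration source.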

Next, let us introduce some extension methods – in a real solution we might want to put this functionality into services, but to keep everything simple, here we go:

To the HttpClient we will add the following three extension methods. This will allow us to POST and PATCH documents without worrying about serialization:

namespace sc10xm.Feature.Search.Extensions
{
    using Newtonsoft.Json;
    using sc10xm.Features.Search.Models;
    using System.Net.Http;
    using System.Text;
    using System.Threading.Tasks;

    public static class HttpClientExtensions
    {
        public static Task<HttpResponseMessage> PatchAsync(
            this HttpClient httpClient,
            string requestUri,
            HttpContent content)
        {
            var request = new HttpRequestMessage(new HttpMethod("PATCH"), requestUri);
            request.Content = content;
            return httpClient.SendAsync(request);
        }

        public static Task<HttpResponseMessage> PatchDocumentAsync(
            this HttpClient httpClient,
            string requestUri,
            Document document)
        {
            var payloadJson = JsonConvert.SerializeObject(document);

            return httpClient.PatchAsync(
            requestUri,
            new StringContent(
                payloadJson,
                Encoding.UTF8,
                "application/json")
            );
        }

        public static Task<HttpResponseMessage> PostDocumentAsync(
            this HttpClient httpClient,
            string requestUri,
            Document document)
        {
            var payloadJson = JsonConvert.SerializeObject(document);

            return httpClient.PostAsync(
            requestUri,
            new StringContent(
                payloadJson,
                Encoding.UTF8,
                "application/json")
            );
        }
    }
}

We will also add this extension method to the Publisher. This will allow us to get all published items matching a predicate:

namespace sc10xm.Features.Search.Extensions
{
    using Sitecore.Data.Items;
    using Sitecore.Publishing;
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class PublisherExtensions
    {
        public static IEnumerable<Item> GetPublishedItems(
            this Publisher publisher, 
            Func<Item, bool> predicate)
        {
            var languages = publisher.Languages;
            var database = publisher.Options.SourceDatabase;
            var rootItem = publisher.Options.RootItem;

            var items = new List<Item>();

            items.AddRange(languages.Select(
                language => database.GetItem(rootItem.ID, language)).
                Where(predicate));

            if (publisher.Options.Deep)
            {
                foreach (var child in rootItem.Axes.GetDescendants())
                {
                    items.AddRange(languages.
                        Select(language => database.GetItem(child.ID, language)).
                        Where(predicate));
                }
            }

            return items;
        }
    }
}

Finally, we need a way to map an Item to a Document. I will do this in an extension method. You will notice that I am getting the full URLs for the item and image using the LinkManager and MediaManager. In a real solution this will probably have to change, as it will produce the URL for the CM server, but for now let us keep it simple:
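As a side note on the CM URL issue, one possible workaround is to swap the host of the generated URL for the public one before pushing it. This is just a sketch under assumptions: PublicUrlHelper and the origin https://www.example.com are hypothetical, the input is expected to be an absolute URL, and in a real solution you would more likely resolve the URL from the site definition of the delivery site.

```csharp
using System;

public static class PublicUrlHelper
{
    // Hypothetical public origin of the delivery site; not part of the original solution.
    private static readonly Uri PublicOrigin = new Uri("https://www.example.com");

    // Swaps scheme, host and port of an absolute URL while keeping path and query.
    public static string ToPublicUrl(string cmUrl)
    {
        var builder = new UriBuilder(cmUrl)
        {
            Scheme = PublicOrigin.Scheme,
            Host = PublicOrigin.Host,
            // -1 tells UriBuilder to leave the port out, so the scheme default applies.
            Port = PublicOrigin.IsDefaultPort ? -1 : PublicOrigin.Port
        };

        return builder.Uri.AbsoluteUri;
    }
}
```

The helper could then be applied to both the item URL and the image URL before they are assigned to the document fields.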

namespace sc10xm.Features.Search.Extensions
{
    using sc10xm.Features.Search.Models;
    using Sitecore.Data.Items;
    using Sitecore.Links.UrlBuilders;
    using Sitecore.Links;
    using Sitecore.Resources.Media;
    using Sitecore.Data.Fields;

    public static class ItemExtensions
    {
        public static Document ToDocument(this Item item, string locale)
        {
            var innerDocument = new InnerDocument();

            innerDocument.ID = item.ID.ToGuid().ToString("D");

            ImageField imageField = item.Fields["Search Image"];

            string imageUrl = null;

            if (imageField?.MediaItem != null)
            {
                var image = new MediaItem(imageField.MediaItem);
                MediaUrlBuilderOptions mediaUrlOptions = new MediaUrlBuilderOptions();
                mediaUrlOptions.AlwaysIncludeServerUrl = true;
                imageUrl = MediaManager.GetMediaUrl(image, mediaUrlOptions);
            }

            innerDocument.Fields = new DocumentFields()
            {
                Name = item.Fields["Search Title"]?.Value,
                Description = item.Fields["Search Description"]?.Value,
                Type = "sc103xm",
                ImageUrl = imageUrl,
                Url = LinkManager.GetItemUrl(
                    item,
                    new ItemUrlBuilderOptions
                    {
                        AlwaysIncludeServerUrl = true,
                        Site = Sitecore.Sites.SiteContext.GetSite("website")
                    }
                )
            };

            innerDocument.Locale = locale;

            return new Document() { InnerDocument = innerDocument};
        }
    }
}

With these steps in place we are ready for the actual event handler. For simplicity we will keep it together with the initialize pipeline processor:

namespace sc10xm.Features.Search.Pipelines
{
    using sc10xm.Feature.Search.Extensions;
    using sc10xm.Features.Search.Extensions;
    using sc10xm.Features.Search.Helpers;
    using Sitecore.Data.Items;
    using Sitecore.Data.Managers;
    using Sitecore.Events;
    using Sitecore.Pipelines;
    using Sitecore.Publishing;
    using System;
    using System.Collections.Generic;
    using System.Net;
    using System.Net.Http;

    public class InitializeSearchEvents
    {
        private static string apiKey = "[API KEY]";

        // Mapping from Sitecore languages to Sitecore Search locales
        private static Dictionary<string, string> sourceLocales = 
            new Dictionary<string, string>()
        {
            { "en", "en_us" }
        };

        public void Process(PipelineArgs args)
        {
            Event.Subscribe("publish:end", this.OnPublishEnd);
        }

        public void OnPublishEnd(object sender, EventArgs args)
        {
            var sitecoreEventArgs = args as SitecoreEventArgs;

            if (sitecoreEventArgs == null || sitecoreEventArgs.Parameters.Length < 1)
                return;

            var publisher = sitecoreEventArgs.Parameters[0] as Publisher;

            if (publisher == null)
                return;

            Func<Item, bool> predicate = (Item item) =>
            {
                if (item == null)
                    return false;
                if (item.Versions.Count < 1)
                    return false;
                if (!sourceLocales.ContainsKey(item.Language.Name))
                    return false;
                if (!TemplateManager.GetTemplate(item).InheritsFrom("_Searchable"))
                    return false;

                return true;
            };

            var items = publisher.GetPublishedItems(predicate);

            var httpClient = new HttpClient();

            httpClient.DefaultRequestHeaders.Add("Authorization", apiKey);

            foreach (var item in items)
            {
                var locale = sourceLocales[item.Language.Name];
                
                // Generate the payload
                var document = item.ToDocument(locale);

                var patchUrl = IngestionApiHelper.GetPatchUrl(document);

                var resultCode = httpClient.PatchDocumentAsync(patchUrl, document).
                    Result?.StatusCode;
                
                // If document is not found, create it instead
                if (resultCode == HttpStatusCode.BadRequest)
                {
                    var postUrl = IngestionApiHelper.GetPostUrl(document);

                    // Create the document using the POST endpoint (not the PATCH URL)
                    httpClient.PostDocumentAsync(postUrl, document);
                }
            }
        }
    }
}

You will notice that I first try to PATCH the document. If the document does not exist, I get a BadRequest status code back and POST the document instead. If you wonder why I am not using the PUT endpoint, the honest answer is that I cannot seem to get it to work.
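The PATCH-then-POST flow can also be written as a small awaitable helper, so that both requests are awaited and the final status code is returned to the caller. This is a sketch rather than a definitive implementation: IngestionUpsert and UpsertAsync are names made up for this example, and the fallback on BadRequest is based only on the behaviour I observed above.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class IngestionUpsert
{
    // Tries PATCH first and falls back to POST when the API answers 400 Bad Request,
    // which is the status observed in this post for documents that do not exist yet.
    public static async Task<HttpStatusCode> UpsertAsync(
        HttpClient httpClient, string patchUrl, string postUrl, string payloadJson)
    {
        using (var patch = new HttpRequestMessage(new HttpMethod("PATCH"), patchUrl))
        {
            patch.Content = new StringContent(payloadJson, Encoding.UTF8, "application/json");

            using (var patchResponse = await httpClient.SendAsync(patch).ConfigureAwait(false))
            {
                if (patchResponse.StatusCode != HttpStatusCode.BadRequest)
                {
                    return patchResponse.StatusCode;
                }
            }
        }

        // The document was not updated, so create it with the POST endpoint instead.
        using (var content = new StringContent(payloadJson, Encoding.UTF8, "application/json"))
        using (var postResponse = await httpClient.PostAsync(postUrl, content).ConfigureAwait(false))
        {
            return postResponse.StatusCode;
        }
    }
}
```

From a synchronous event handler like the one above, the result could be obtained with UpsertAsync(...).GetAwaiter().GetResult(), and a non-success status code would be a natural place to add logging.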

Finally, we will patch in our processor in the initialize pipeline:

<configuration>
  <sitecore>
    <pipelines>
      <initialize>
        <processor type="sc10xm.Features.Search.Pipelines.InitializeSearchEvents, sc10xm"/>
      </initialize>
    </pipelines>
  </sitecore>
</configuration>

Having set everything up (hopefully correctly), we will now go back into Sitecore. Both the Home and the Content Page items implement the _Searchable template, so I will simply try to publish the Home item with children:

As the ingestion happens asynchronously, I will give it a few moments before going back into Sitecore Search. And voila, our Content Collection has been updated:

And we can even see the details of each document:

A final note

The solution presented above is of course not production ready. And since Sitecore Search and Sitecore XM are both Sitecore products, there might even be a connector that removes the need to implement this integration ourselves. However, I hope you found the walkthrough interesting and that it gives you a good starting point for using the Ingestion API.