Database shards in Sitecore – how to get the data from them?

Based on the title – the topic seems to be pretty simple as Sitecore has prepared really great documentation about the shards but in this article I will try to describe scenario when we want to get the data from shards with direct connection to the database instead of using Sitecore’s libraries like xConnect etc.

As an example I will show you how to get the latests interactions saved into database. Usually to do that you need to know the contact id and query for the data via xConnect but not this time. I will explain everything very soon.

Documetation

Sitecore team has prepared few documetation pages describing how the data from shards is consumed.

Take a look at this article to get the overview od how the data is taken from each of the shards: https://doc.sitecore.com/en/developers/90/sitecore-experience-platform/data-extraction.html
You should definietly check it because it will allow you to understand later why the number of results can be multiplied by the number of the shards.

Take a look at this article to see how to get the contacts and their interactions from the shards:
https://doc.sitecore.com/en/developers/90/sitecore-experience-platform/extracting-contacts-and-interactions.html
Under that link you will notice that to get any interactions you must have a contact reference.

The challenge

Now, imagine that you do not have contact id or you just do not care about the contact data because you want to get the latest interactions that happened with the website – like goals or page visits. How to get that data without having contact’s reference?

The solution

To start we must use database connection – in our case it will be connection to the xdb collection database.

Then to work with the shards we are going to use two libraries:

  • Microsoft.Azure.SqlDatabase.ElasticScale.Query
  • Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement

These libraries have been developed for Azure but they still perfectly work on every local environment with shards.

Frist of all we must load the shards :

ShardMap = myShardMap;
var shardMapManager = ShardMapManagerFactory.GetSqlShardMapManager(...your connection string..., ShardMapManagerLoadPolicy.Lazy);
shardMapManager.TryGetShardMap("ContactIdShardMap", out myShardMap)

With that code we are loading shards configuration into the ‘map’ object.

Sitecore comes out of the box with three shards:

  • Device Profile Shard (DeviceProfileIdShardMap)
  • Contact Shard (ContactIdShardMap)
  • Contact Identifiers Shard (ContactIdentifiersIndexShardMap)

More about Sitecore’s shards you can check for instance here: https://doc.sitecore.com/en/developers/93/platform-administration-and-architecture/split-or-merge-xdb-collection-database-shards.html

When we have loaded shard map we can start query for the data – to connect to shard we must use sql connection string builder and then with that builder we can connect to the ‘part’s of the shard’.

var builder = new SqlConnectionStringBuilder(...your connection string...);
using (var conn = new MultiShardConnection(myShardMap.GetShards(), $"User ID={builder.UserID};Password={builder.Password};"))
{
    using (MultiShardCommand cmd = conn.CreateCommand())
    {
        cmd.CommandText = "SELECT * FROM [xdb_collection].[Interactions]";
        cmd.CommandType = CommandType.Text;
        cmd.ExecutionOptions = MultiShardExecutionOptions.IncludeShardNameColumn;
        cmd.ExecutionPolicy = MultiShardExecutionPolicy.PartialResults;
        using (MultiShardDataReader sdr = cmd.ExecuteReader()) 
        {
            while (sdr.Read())
            {
                var interactionId = sdr.GetGuid(0);
                var lastModified = sdr.GetDateTime(1);
                .........
                var campaignId = (sdr[10] as Guid?).Equals(null) ? null sdr.GetFieldValue<Guid?>(10);
                .....
            }
        }
    }
}

Why did I write about the parts of the shard when there is no loop iterating through the shards? Well, all that happens inside the Azure’s library. It means that depends on the number of the parts we will call the same query as many times as many shards we have. Because of that it will be much harder to maintain the number of results that we want to receive.

If we want to limit our results to 100 in fact the final result can be 200 because limit of 100 will be applied for every query called to every shard part. Despite that – it is just like a normal SQL query and it can be very helpful from time to time.

Summary

In general it is worth to member that this solution belongs to the ‘hacks’ category and if it is not required it is much better to use Sitecore’s library to work with xConnect models in the safe and well documented way. If you already tried to use it and it is not something that solves your issues then I hope that the concept of accessing database shards will save your life like it saved my – at least once so far. Good luck!

Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments