Database shards in Sitecore – how to get the data from them?

Based on the title – the topic seems to be pretty simple as Sitecore has prepared really great documentation about the shards but in this article I will try to describe scenario when we want to get the data from shards with direct connection to the database instead of using Sitecore’s libraries like xConnect etc.

As an example I will show you how to get the latests interactions saved into database. Usually to do that you need to know the contact id and query for the data via xConnect but not this time. I will explain everything very soon.

Documetation

Sitecore team has prepared few documetation pages describing how the data from shards is consumed.

Take a look at this article to get the overview od how the data is taken from each of the shards: https://doc.sitecore.com/en/developers/90/sitecore-experience-platform/data-extraction.html
You should definietly check it because it will allow you to understand later why the number of results can be multiplied by the number of the shards.

Take a look at this article to see how to get the contacts and their interactions from the shards:
https://doc.sitecore.com/en/developers/90/sitecore-experience-platform/extracting-contacts-and-interactions.html
Under that link you will notice that to get any interactions you must have a contact reference.

The challenge

Now, imagine that you do not have contact id or you just do not care about the contact data because you want to get the latest interactions that happened with the website – like goals or page visits. How to get that data without having contact’s reference?

The solution

To start we must use database connection – in our case it will be connection to the xdb collection database.

Then to work with the shards we are going to use two libraries:

  • Microsoft.Azure.SqlDatabase.ElasticScale.Query
  • Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement

These libraries have been developed for Azure but they still perfectly work on every local environment with shards.

Frist of all we must load the shards :

ShardMap = myShardMap;
var shardMapManager = ShardMapManagerFactory.GetSqlShardMapManager(...your connection string..., ShardMapManagerLoadPolicy.Lazy);
shardMapManager.TryGetShardMap("ContactIdShardMap", out myShardMap)

With that code we are loading shards configuration into the ‘map’ object.

Sitecore comes out of the box with three shards:

  • Device Profile Shard (DeviceProfileIdShardMap)
  • Contact Shard (ContactIdShardMap)
  • Contact Identifiers Shard (ContactIdentifiersIndexShardMap)

More about Sitecore’s shards you can check for instance here: https://doc.sitecore.com/en/developers/93/platform-administration-and-architecture/split-or-merge-xdb-collection-database-shards.html

When we have loaded shard map we can start query for the data – to connect to shard we must use sql connection string builder and then with that builder we can connect to the ‘part’s of the shard’.

var builder = new SqlConnectionStringBuilder(...your connection string...);
using (var conn = new MultiShardConnection(myShardMap.GetShards(), $"User ID={builder.UserID};Password={builder.Password};"))
{
    using (MultiShardCommand cmd = conn.CreateCommand())
    {
        cmd.CommandText = "SELECT * FROM [xdb_collection].[Interactions]";
        cmd.CommandType = CommandType.Text;
        cmd.ExecutionOptions = MultiShardExecutionOptions.IncludeShardNameColumn;
        cmd.ExecutionPolicy = MultiShardExecutionPolicy.PartialResults;
        using (MultiShardDataReader sdr = cmd.ExecuteReader()) 
        {
            while (sdr.Read())
            {
                var interactionId = sdr.GetGuid(0);
                var lastModified = sdr.GetDateTime(1);
                .........
                var campaignId = (sdr[10] as Guid?).Equals(null) ? null sdr.GetFieldValue<Guid?>(10);
                .....
            }
        }
    }
}

Why did I write about the parts of the shard when there is no loop iterating through the shards? Well, all that happens inside the Azure’s library. It means that depends on the number of the parts we will call the same query as many times as many shards we have. Because of that it will be much harder to maintain the number of results that we want to receive.

If we want to limit our results to 100 in fact the final result can be 200 because limit of 100 will be applied for every query called to every shard part. Despite that – it is just like a normal SQL query and it can be very helpful from time to time.

Summary

In general it is worth to member that this solution belongs to the ‘hacks’ category and if it is not required it is much better to use Sitecore’s library to work with xConnect models in the safe and well documented way. If you already tried to use it and it is not something that solves your issues then I hope that the concept of accessing database shards will save your life like it saved my – at least once so far. Good luck!

What to use to write Unit Tests for Sitecore?

In this post, I would like to share with you some experiences I had in my last project and give you advice on what you can use to write Unit Tests and why sometimes some of the libraries are not a good way to go.

In the most cases when the team is thinking to start writing unit tests there is an discussion about different approaches and available libraries – been there done that.

Let me tell you my story

In our case, we had the following libraries taken into consideration:

  • NUnit vs XUnit
  • NSubstitute vs Moq
  • Asserts vs FluentAssertions
  • FakeDB

We totally forgot about AutoFixture and that was our mistake!

After couple of meetings (we had several teams that had to agree something) we decided to vote and the winners were:

  • NUnit
  • Moq
  • FluentAssertions
  • FakeDB

Of course, you may ask “why” … the answer is usually simple because team members knew these three or just liked to write unit tests with the support of these libraries than others.

Is this logical – in some way yes. Is it professional – some people may say ‘no’ but this is how real-world works. Software developers tend to use libraries that they know – especially when it comes to writing unit tests – even when there are better solutions out there.

Everything was fine at the beginning…

We started to write tests, we had around 100 test cases and then we realized that we need something to generate fake Sitecore items automatically.

We started to play we AutoFixture library and then we realized that we made a mistake …

AutoFixture was working really good with all of the libraries instead of FakeDB. We were able to generate simple types like strings etc. but no items were added to the FakeDB.

After some digging in google I found out that many people had that issue and there are two repeated solutions:

  • switch into XUnit
  • get rid of FakeDB and prepare your own fake objects

For us was too late – we decided to stay with our setup.

What to use – the answer

Based on my experience I would not start writing unit tests for Sitecore with NUnit in the future.

The rest of the listed libraries seem to be pretty safe and good from my point of view.

So the answer is:

Use XUnit and whatever else you want – but not NUnit because it can block you at some point as blocked me.

Sitecore SPEAK DatePicker issue with date selection

In this post I describe how to fix/workaround the issue with DatePicker component in Sitecore SPEAK 2.

Usually, DatePicker works just fine but at some point we noticed that something wrong is happening with date selection …

I identified two issues:

  • when we click the date in calendar it selects today rather than selected date
  • when we click the date in calendar it shows that we selected the date but value is set for today’s date not one that is shown

It might be related to the fact that I had to DatePicker components on a single page. Also, it might be related to the custom format of data that we use in the DatePicker – dd/MM/yyyy.

After some checks I decided to overwrite the vanilla DatePicker javascript with changed one.

Javascript file is in the path:

/sitecore/shell/client/Business Component Library/version 2/Layouts/Renderings/Common/DatePickers/DatePicker.js

I fixed it by adding ‘global’ variable to store set value: “definedDate”

Then I use that variable in two places.

First one is formatDate function that is called usually after selection

Second one is inside getISODate where we use it to calculate correct date

That change solves the incorrect behavior of the form.

Sitecore’s Personalization under the hood

About two months ago I had a pleasure to conduct webinar about personalization but this time I focused on something that people usually skip.

I did not describe how to setup the personalization rules but I tried to explain

  • what does it mean to apply a good personalization
  • why it is beneficial to apply personalization (for customers and companies)
  • how it works under the hood (algorithms and calucations process)

If you are interested in details you can watch it here (click image – you must to enter your email in the form to see the video):

You can also check details of presentation on slideshare:

I hope you like it – if you want to discuss it, please use comments section or reach me via social media.

How to get the latest logs from Azure with Kudu

Before I will tell you how I want to describe to you why someone may need to use Kudu to get logs from Azure.

Of course, the best way to work with Sitecore logs (when it is installed as PaaS) is the Application Insights tool. But when it has to be disabled because of many different reasons we are forced to use good old files with logs.

An issue with log files is the fact that when a file is in use we cannot just download it from the server with FTP client because it is locked by IIS process. To access the latest data we must use Kudu tool – but it is not the most efficient way to find and open log file – especially when you have thousands of log files and Kudu does not allow us to sort the files by date to get that one you need.

How to find in Kudu lately updated files?

Let’s start with the opening of PowerShell tool in Kudu

Then by clicking or with command enter the directory with logs files.

cd "D:\home\site\wwwroot\App_Data\logs"

As a result, you will see the list of files with pagination – it means that in case of having thousands of files – to find the file is a nightmare. Thanks to PowerShell console, we can find the name of a file very quickly.

Run the following command to get files modified during last hour:

Get-ChildItem | Where { $_.LastWriteTime -gt (Get-Date).AddHours(-1) }

(you can run two commands at once, just use a semicolon between two of them)

As a result, you will see in a PS console a list of files names:

Now, to get the file you have to use Kudu’s URL (copy it from the browser) – get only a domain, then add the following path to it, with name of your file at the end:

/api/vfs/site/wwwroot/App_Data/logs/nameOfYourFile

In total you should receive something like this:

https://domainWithSCMinIT/api/vfs/site/wwwroot/App_Data/logs/nameOfYourFile.txt

Open it in the browser to get the latest data from the currently used log file.