Setting up Rainbow SFS for Unicorn in Sitecore

When multiple developers work on the same Sitecore solution, they need a mechanism for sharing the evolving solution across a number of local installations. For code files, mature version control systems like Git efficiently track changes and synchronize them between computers. Conflicts between changes are handled by tried-and-tested workflows that developers know and understand.

But the content that Sitecore stores in a number of SQL databases often needs to be shared as well. One option, of course, is simply to have a single SQL server used by all developers. This might work for small projects, but a shared database does not offer the same capabilities as e.g. Git: developers cannot work offline, and changes cannot be tracked easily. If conflicts arise, content will be overwritten, and the code files and the content will easily fall out of sync.

The solution for many development teams is to let the version control system handle the content by serializing shared content into ‘code’ files. This bundles code and content together and makes the version control system the single source of truth.

While Sitecore offers the ability to serialize and deserialize content, most development teams use Unicorn, a tool written specifically for sharing Sitecore content across installations via a version control system like Git.

Internally, Unicorn uses the Rainbow Serialization File System (SFS) to write selected parts of the Sitecore content tree into ‘code’ files using the yaml file format (*.yml). Content can be serialized and deserialized automatically, and the resulting yaml files can be handled like any other code. Each developer has a set of Sitecore databases running locally, but for shared content, these databases simply reflect the shared yaml files.

The Rainbow SFS is a pretty robust piece of software, and normally it runs unnoticed, keeping the serialized version of the Sitecore content consistent and in sync. However, it can be corrupted, especially when merging different sets of content. When such corruption occurs, many developers opt for a simple solution: delete all yaml files, and let Unicorn create new ones based on the content of that particular developer’s database. While this certainly creates a consistent SFS, it is also a drastic approach that usually creates a lot of unnecessary file changes when the reserialized content is pushed to the version control system. I usually go for a more isolated approach: figuring out what went wrong and mitigating that problem only.

This, however, requires a rather detailed knowledge of how Rainbow structures the SFS. So, in this post, I will explain how Rainbow creates the SFS, and present a validator I recently wrote for a Sitecore Habitat-like solution.

Mapping the Sitecore content tree to the filesystem

The Rainbow SFS keeps its yaml files in a hierarchy of folders and aims to map the Sitecore content tree as accurately as possible, using item names as file names for the yaml files:

The Sitecore content tree (left) and the resulting SFS (right):

There are, however, some key differences between a filesystem and the Sitecore content tree. Obviously, files in most filesystems cannot contain subfiles, so Rainbow uses a combination of identically named folders and files for items containing subitems. But Windows also imposes limits on the path length and reserves specific characters and file names to maintain backward compatibility. You can find a detailed list of these limitations here.

While the exact limitations imposed by Windows are somewhat complicated, in Rainbow these limitations are implemented using the following simple rules:

No complete file path can exceed 240 characters.
The characters defined in Path.GetInvalidFileNameChars will be replaced by “_” in file as well as folder names.

Besides these hardcoded rules, a number of rules are configurable:

File names exceeding the Rainbow.SFS.MaxItemNameLengthBeforeTruncation setting will be truncated. If the truncated file name ends with a space, this will be replaced by a “_”. The default value is 30.
File names included in the Rainbow.SFS.InvalidFilenames setting will be prefixed with a “_”. The default value includes the file names reserved by Windows.
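The naming rules above can be sketched in a few lines of Python. This is only an illustration and not Rainbow's actual (C#) implementation: the character set and the reserved-name list are assumed stand-ins for Path.GetInvalidFileNameChars on Windows and for the Rainbow.SFS.InvalidFilenames default, and both the order in which the rules are applied and the case-insensitive reserved-name check are my assumptions.

```python
# Illustrative sketch only; not Rainbow's actual implementation.

# Assumed stand-in for .NET's Path.GetInvalidFileNameChars() on Windows.
INVALID_CHARS = set('<>:"/\\|?*') | {chr(c) for c in range(32)}

# Assumed stand-in for the Rainbow.SFS.InvalidFilenames default (Windows reserved names).
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)

# Default of Rainbow.SFS.MaxItemNameLengthBeforeTruncation as stated in the text.
MAX_NAME_LENGTH = 30


def ordinary_file_name(item_name: str) -> str:
    """Return the 'ordinary' file name (without extension) for an item name."""
    # Replace characters that are invalid in file names with "_".
    name = "".join("_" if ch in INVALID_CHARS else ch for ch in item_name)
    # Truncate over-long names; a trailing space after truncation becomes "_".
    if len(name) > MAX_NAME_LENGTH:
        name = name[:MAX_NAME_LENGTH]
        if name.endswith(" "):
            name = name[:-1] + "_"
    # Prefix reserved file names with "_" (case-insensitive match is an assumption).
    if name.upper() in RESERVED_NAMES:
        name = "_" + name
    return name


print(ordinary_file_name("What? Why/How"))  # What_ Why_How
print(ordinary_file_name("CON"))            # _CON
```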

Safe file names

Rainbow has to handle duplicate file names, as Sitecore allows identically named siblings. The truncation of file names also risks introducing duplicates if e.g. item names differ only beyond the 30th character. Therefore, Rainbow can opt to use a safe name: the ordinary file name calculated using the rules above, suffixed with a “_” and the entire item ID without curly brackets. This means that if item1.yml already exists, an additional item1 will be serialized as:

item1_[item1 ID].yml

Notice that safe file names hence risk exceeding the Rainbow.SFS.MaxItemNameLengthBeforeTruncation setting by up to 37 characters: the “_” plus the 36-character item ID.
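As a small illustration of the safe name and the 37 extra characters, here is a Python sketch. The function name and the example ID are made up for this post; Rainbow itself is written in C#.

```python
def safe_file_name(ordinary_name: str, item_id: str) -> str:
    """Append "_" and the item ID (curly brackets removed) to the ordinary name."""
    return f"{ordinary_name}_{item_id.strip('{}')}"


# Hypothetical item ID, written the way Sitecore displays IDs.
example_id = "{0DE95AE4-41AB-4D01-9EB0-67441B7C2450}"

safe = safe_file_name("item1", example_id)
print(safe + ".yml")
# The safe name is 37 characters longer than the ordinary name:
print(len(safe) - len("item1"))  # 37
```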

Wrapped folders

If a file path exceeds the 240-character limit, Rainbow will serialize the item into a folder placed in the SFS root folder and named after the ID of the item’s parent, without brackets. This wraps the file path around, starting from the root again. Hence:

/item1/item2/item3 … /itemX/itemY

will be serialized as:

/[itemX ID]/itemY.yml

When serializing further subitems, this wrapped folder will be reused, as the parent’s file name (without extension) always forms the path for subitems. Hence:

/item1/item2/item3 … /itemX/itemY/itemZ

will be serialized as:

/[itemX ID]/itemY/itemZ.yml

If the 240-character limit is exceeded again, Rainbow will wrap the path again.
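The wrapping behaviour can be sketched roughly as follows. This is a simplified Python model under several assumptions of mine: the length check is done on the full path including the .yml extension, "/" is used as the separator for readability, and the function and parameter names are hypothetical. Rainbow's real logic lives in its C# code and may differ in the details.

```python
MAX_PATH_LENGTH = 240  # hardcoded limit described above


def sfs_path(sfs_root: str, chain: list[tuple[str, str]]) -> str:
    """Return the file path of the last item in `chain`.

    `chain` holds (file name, item ID) pairs from the SFS root item
    down to the item being serialized.
    """
    folder = sfs_root   # folder the current item's file is written to
    parent_id = None    # ID of the previous item in the chain
    path = ""
    for name, item_id in chain:
        candidate = f"{folder}/{name}.yml"
        if len(candidate) > MAX_PATH_LENGTH and parent_id is not None:
            # Wrap around: restart from the SFS root in a folder named after the parent ID.
            folder = f"{sfs_root}/{parent_id}"
            candidate = f"{folder}/{name}.yml"
        path = candidate
        # Subitems live in a folder named like the file, minus the extension.
        folder = candidate[: -len(".yml")]
        parent_id = item_id
    return path


# Twelve nested items with 26-character names and made-up IDs.
chain = [
    (f"item{i:02d}" + "x" * 20, f"00000000-0000-0000-0000-0000000000{i:02d}")
    for i in range(12)
]
print(sfs_path("c:/root1/home1", chain[:8]))   # still below the limit
print(sfs_path("c:/root1/home1", chain[:9]))   # wrapped into the parent-ID folder
print(sfs_path("c:/root1/home1", chain[:10]))  # subitem reuses the wrapped folder
```

With these example values, the ninth item is the first one that would push the path beyond 240 characters, so it ends up in a folder named after the eighth item's ID.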

Setting up the root path

Until now, we have seen how Rainbow structures its files in relation to its root path. The actual root paths are configured by Unicorn in a number of configurations. Each Unicorn configuration defines a number of SFSs with a shared physical root path defined in the targetDataStore element:

<unicorn>
    <configurations>
        <configuration name="configuration1">
            <targetDataStore physicalRootPath="c:\root1" />
        </configuration>
    </configurations>
</unicorn>

Each Unicorn configuration contains a number of predicates. In this post we will only look at the includes within the predicate element, which map parts of the Sitecore content tree to folders within the root folder.

<unicorn>
    <configurations>
        <configuration name="configuration1">
            <targetDataStore physicalRootPath="c:\root1" />
            <predicate>
                <include name="home1" database="master" path="/sitecore/content/Home1" />
                <include name="home2" database="master" path="/sitecore/content/Home2" />
            </predicate>
        </configuration>
    </configurations>
</unicorn>

Note that each include element actually specifies an isolated SFS, kept in a folder beneath the physical root path. The physical root path is therefore not the SFS root path; it is merely a container of SFSs kept together by Unicorn.

The configuration above will create two folders within the physical root and will serialize the Home1 and Home2 items into the respective folders. Hence the c:\root1\home1 folder will contain a Home1.yml file and a Home1 folder containing any subitems. Note that any wrapped folders will be placed inside the home1 folder.

Reserving characters for the SFS root path

In the example above, the physical root path is defined using an absolute path. It is, however, possible to define the physical root path using relative paths (for the different options, see the default Unicorn.config). This results in different SFS root path lengths in different environments, which can lead to inconsistent use of wrapped folders, as the 240-character limit can be reached on some installations and not on others. This can be mitigated by reserving a fixed number of characters for the SFS root path using the Rainbow.SFS.SerializationFolderPathMaxLength setting. When calculating the path length for a specific item, Rainbow will always use this setting as the SFS root folder path length, and not the actual value. If an installation uses a root path that exceeds this limit, Rainbow will throw an error, ensuring consistent use of wrapped folders.
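To illustrate why the reserved length makes wrapping independent of the checkout location, here is a small Python sketch. The value 90 for the setting, the function name and the example paths are all hypothetical, and the exact comparison Rainbow performs is an assumption on my part.

```python
MAX_PATH_LENGTH = 240      # hardcoded limit described earlier
RESERVED_ROOT_LENGTH = 90  # hypothetical Rainbow.SFS.SerializationFolderPathMaxLength value


def needs_wrapping(actual_sfs_root: str, relative_path: str) -> bool:
    """Decide whether an item path must be wrapped, using the reserved root length."""
    # A root path longer than the reserved length is treated as an error.
    if len(actual_sfs_root) > RESERVED_ROOT_LENGTH:
        raise ValueError(
            f"SFS root path is {len(actual_sfs_root)} characters; "
            f"at most {RESERVED_ROOT_LENGTH} are reserved"
        )
    # The reserved length, not the actual root length, enters the calculation.
    return RESERVED_ROOT_LENGTH + len(relative_path) > MAX_PATH_LENGTH


# The same item checked out in two different (made-up) locations.
relative = "/" + "/".join(["a" * 26] * 6) + ".yml"
print(needs_wrapping("c:/root1/home1", relative))
print(needs_wrapping("d:/work/clients/acme/src/serialization/home1", relative))
```

Both calls give the same answer, because only the reserved length and the relative path take part in the comparison.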

Validating the SFS

I recently wrote a validator based on the Habitat project. The idea behind the validator is quite simple: Given a solution based on the Habitat project structure, it will validate the SFS created by each project.

The validator is available at GitHub: https://github.com/kristofferkjeldby/YamlValidator

Note that, as I wanted to keep the validator as independent of the actual project as possible, I opted not to read the Unicorn configuration files and based my validation on the serialized files alone. The name of the validator might also be a bit misleading, as the yaml files themselves are not really being validated. My main concern is the structure of the SFS, that is, the structure created by sets of yaml files.

The Habitat solution contains a number of projects ordered within the Helix layers Foundation, Feature and Project. Each project contains a single Unicorn configuration containing a number of includes within the predicate element.

This structure is parsed into a hierarchy of the following classes before the actual validation takes place:

Solution → SfsFilesystem
Project (Unicorn configuration) → SfsProject
Predicate include (Actual SFS) → SfsPredicate
Yaml file → SfsItem

The tool then runs a number of analyzers on the parsed content tree:

SerializationFolderAnalyzer: This analyzer simply checks that no SFS root path exceeds the Rainbow.SFS.SerializationFolderPathMaxLength setting. It is mostly a sanity check, as further validation of e.g. file names and paths will depend on this.

PredicateRootAnalyzer: This analyzer determines the Sitecore root item of each predicate include. As I am not reading the Unicorn configurations, I have to resolve this item by looking at the root yaml file within each include’s folder. While this is possible, I admit that this is probably not the most elegant solution, and if you choose to use my validator on an actual project, you might want to resolve these root items by looking at the actual Unicorn configurations instead.

DuplicateIdAnalyser: The entire SfsFilesystem is checked for duplicate items, that is, yaml files with identical Sitecore IDs. While technically each SfsPredicate contains an isolated SFS, duplicate IDs across SfsPredicates indicate that the same item is being serialized by multiple Unicorn predicate includes, which is probably not intentional. In any case, duplicate IDs within a single SfsPredicate are a serious corruption that will prevent Unicorn from deserializing the SFS.

PredicateCollisionAnalyzer: This analyzer checks whether multiple predicate includes serialize the same items. Again, while this is technically not a problem, it is most likely not intentional, as the actual items being used will be determined by the order in which the Unicorn configurations are deserialized.

DuplicateItemPathsAnalyser: While duplicate item names are supported by Sitecore, they are generally something we avoid. This analyzer will add a warning to items having the same path in Sitecore, as it might indicate that Unicorn actually stores two versions of the same item with different Sitecore IDs. If your project uses duplicate item names, you could disable this analyzer.
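The duplicate checks described above boil down to grouping items by a key and reporting groups with more than one member. The following Python sketch shows the idea only. It is not the validator's code (which is C#): the Item class is a hypothetical stand-in for SfsItem, the error/warning split simply mirrors the reasoning above, and comparing IDs and paths case-insensitively is my assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for the validator's SfsItem."""
    item_id: str
    item_path: str
    predicate: str  # name of the predicate include (SFS) the yaml file belongs to


def group_duplicates(items, key):
    """Group items by `key` and keep only groups with more than one item."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return {k: group for k, group in groups.items() if len(group) > 1}


def duplicate_ids(items):
    """Map each duplicated ID to 'error' (within one SFS) or 'warning' (across SFSs)."""
    result = {}
    for item_id, group in group_duplicates(items, lambda i: i.item_id.lower()).items():
        predicates = [i.predicate for i in group]
        same_sfs = len(set(predicates)) < len(predicates)
        result[item_id] = "error" if same_sfs else "warning"
    return result


def duplicate_paths(items):
    """Return Sitecore paths that are used by items with different IDs."""
    groups = group_duplicates(items, lambda i: i.item_path.lower())
    return sorted(p for p, g in groups.items() if len({i.item_id.lower() for i in g}) > 1)


items = [
    Item("1111", "/sitecore/content/Home1", "home1"),
    Item("1111", "/sitecore/content/Home1", "home2"),       # same ID in another SFS
    Item("2222", "/sitecore/content/Home1/Page", "home1"),
    Item("3333", "/sitecore/content/Home1/Page", "home1"),  # same path, different ID
    Item("2222", "/sitecore/content/Home1/Other", "home1"), # same ID within one SFS
]
print(duplicate_ids(items))    # {'1111': 'warning', '2222': 'error'}
print(duplicate_paths(items))  # ['/sitecore/content/home1/page']
```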

FilePathAnalyzer: This analyzer validates the structure of each SfsPredicate by doing a mock serialization and matching the resulting file names against the actual file names. As the SfsPredicate stores its SfsItems in a flat list, it first parses the items into a tree structure (detecting sparse trees) and then calculates the expected file names, including safe file names and wrapped folders. The trickiest part is the safe file names: given a number of colliding file names, which one should use the ordinary file name, and which should use the safe file name? When Unicorn creates an SFS from scratch (e.g. via the reserialization option in the Unicorn control panel), the first item serialized will always use the ordinary file name, and later items will use the safe file name if their ordinary file name collides with an already serialized file. If an item is later serialized again, its file name (ordinary or safe) will be maintained, even if the item using the ordinary file name has been removed. This can result in a set of safe file names without the ordinary file name actually being used.

This left me with a choice: should my mock serialization mimic a serialization from scratch, or a serialization of individual items given an already existing SFS? I opted for the latter and reused the logic Rainbow applies when serializing single items. I believe that this logic (YamlValidator.Analyzers.FilePathAnalyzer.ResolveExpectedFilePath) is the most interesting/confusing part of the validator, and of the Rainbow SFS logic in general, as it explains why a serialization of an entire SFS from scratch and the serialization of individual files can result in different sets of file names.
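The order dependence is easiest to see in a toy model. The Python sketch below is not the ResolveExpectedFilePath logic itself. It is a deliberately simplified model of the behaviour described above, with a hypothetical serialize function, a dictionary standing in for the folder on disk, and shortened made-up IDs.

```python
def serialize(files: dict[str, str], item_id: str, ordinary_name: str) -> str:
    """Serialize one item into `files` (file name -> item ID) and return its file name."""
    # An item that is already serialized keeps its current file name.
    for file_name, owner in files.items():
        if owner == item_id:
            return file_name
    # Otherwise use the ordinary name if it is free, else fall back to the safe name.
    file_name = ordinary_name if ordinary_name not in files else f"{ordinary_name}_{item_id}"
    files[file_name] = item_id
    return file_name


# Two siblings share the name "item1".
incremental: dict[str, str] = {}
serialize(incremental, "id-a", "item1")  # first item takes "item1"
serialize(incremental, "id-b", "item1")  # collides, takes the safe name "item1_id-b"
del incremental["item1"]                 # the first item is later removed
serialize(incremental, "id-b", "item1")  # the second item keeps its safe name

# Reserializing from scratch with only the second item left.
from_scratch: dict[str, str] = {}
serialize(from_scratch, "id-b", "item1")

print(sorted(incremental))   # ['item1_id-b']
print(sorted(from_scratch))  # ['item1']
```

The same content thus yields two different, equally valid sets of file names, depending on the history of the SFS.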

Some final thoughts

I hope that this tour-de-force into the inner workings of the Rainbow SFS has been enlightening.
Above all, you should notice these three points:

As long as your SFS root path(s) are kept below the limit imposed by the Rainbow.SFS.SerializationFolderPathMaxLength, you and your teammates can place your project in any location locally without creating inconsistencies in the SFS structure.
The SFS can become unhealthy even if Unicorn does actually manage to serialize and deserialize the items. Yaml files can use incorrect file names, and multiple SFSs can contain the same items, giving rise to subtle errors during serialization and deserialization.
The logic behind the usage of safe file names depends on any existing yaml files; it can be unpredictable and depends on the order in which the items within an SFS are serialized.

The yaml validator is provided as-is with no strings attached.