Translating content in Sitecore

November 17, 2021, Kristoffer Brinch Kjeldby

 

We often build large multilingual websites on Sitecore. These solutions often involve translating large amounts of content, done by us, the client or a translation agency. Finding a good routine for translating content can be challenging, and while translating directly in Sitecore is an option, professional translation often relies on dedicated translation software where content is imported and translated in a more systematic way.

To facilitate this, Sitecore includes a feature for exporting and importing language-specific content via the Control Panel. Unfortunately, the feature is somewhat underdeveloped and includes some oddities. Also, the native file format used by Sitecore is fixed and may not be supported by third-party translation tools.

In this post I will first take a look at the format used by Sitecore and show you how language files can be handled and transformed programmatically. Then I will show you how the language file format can be completely overridden directly in Sitecore.

The Sitecore language file

The first thing you need to know is that a Sitecore language file is simply an XML file containing a list of field values. Each field is represented by an XML node, called a phrase, with the value for each language listed as a child element:

 

Example:

<phrase path="/sitecore/content/Home" key="Home" itemid="{22222222-2222-2222-2222-222222222222}" fieldid="Title" updated="20190101T010101Z">
    <da>Titel</da>
    <en>Title</en>
</phrase>

 

The path, key, itemid and updated attributes all refer to the item containing the field, and the field is identified by the fieldid attribute. Each language is included as an element containing the raw value of the field as stored by Sitecore. Shared fields and fields with default or standard values are not included in the language file.

Exporting language files

The second thing you need to know is that you export a language file from the Control Panel recursively from a single root node. It is not possible to export content from different parts of the content tree into a single language file, although you can manually combine language files by simply copying all the phrases from one file to another. The order of the phrases has no significance, and you can remove some phrases and keep others.
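The manual copy-and-paste described above can also be scripted. The following is a minimal sketch, assuming the files use the native format shown earlier (a sitecore root element containing phrase elements); the class and method names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LanguageFileMerger
{
    // Copies every <phrase> element from the given files
    // into a single new <sitecore> root element.
    public static XDocument Merge(IEnumerable<XDocument> files)
    {
        var root = new XElement("sitecore");
        foreach (var file in files)
        {
            if (file.Root == null)
                continue;
            // new XElement(phrase) creates a deep copy,
            // so the source documents are left untouched.
            root.Add(
                file.Root.Elements("phrase")
                    .Select(phrase => new XElement(phrase))
            );
        }
        return new XDocument(root);
    }
}
```

Since the order of the phrases has no significance, the sketch simply appends them file by file. It does not check whether the same item and field occur in more than one file.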

The list of language elements for each phrase includes only the languages chosen during export. If an item does not exist in all the exported languages, its phrases will only include the languages where it actually exists. The same happens if one language version uses the standard value and another does not. So, it is entirely possible to have a language file where phrases with e.g. Danish and English are intermixed with phrases containing only English.
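To get an overview of such gaps before sending a file off for translation, you could list the phrases that lack a given language. This is only a sketch under the assumption that the file follows the native format shown earlier; LanguageCoverage and MissingLanguage are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LanguageCoverage
{
    // Returns "path/fieldid" for every phrase that has
    // no child element for the given language.
    public static List<string> MissingLanguage(
        XDocument file,
        string language
    )
    {
        if (file.Root == null)
            return new List<string>();
        return file.Root.Elements("phrase")
            .Where(phrase => phrase.Element(language) == null)
            .Select(
                phrase =>
                (string)phrase.Attribute("path") + "/" +
                (string)phrase.Attribute("fieldid"))
            .ToList();
    }
}
```

Calling MissingLanguage(file, "da") on the mixed file described above would return the phrases that only exist in English.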

Importing language files

When you try to import a language file containing multiple languages you may come across a rather confusing oddity: During import, Sitecore presents you with a list of languages to import. But this list is compiled from the first phrase in the language file and does not consider other phrases in the file.

This means that you sometimes have to move a phrase, in the language you wish to import, to the top of the file in order to get Sitecore to recognize the language as present in the language file.
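Moving a phrase by hand gets tedious for large files, so this step can be automated. The sketch below assumes the native format shown earlier and moves the first phrase that contains all the requested languages to the top of the file; the class and method names are hypothetical.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class LanguageFileFixer
{
    // Moves the first phrase containing all the given languages
    // to the top of the file. Returns false if no such phrase exists.
    public static bool PromoteLanguages(
        XDocument file,
        params string[] languages
    )
    {
        var root = file.Root;
        if (root == null)
            return false;
        var phrase = root.Elements("phrase")
            .FirstOrDefault(
                candidate =>
                languages.All(
                    language =>
                    candidate.Element(language) != null));
        if (phrase == null)
            return false;
        // Detach the phrase and re-insert it as the first child.
        phrase.Remove();
        root.AddFirst(phrase);
        return true;
    }
}
```

If the method returns false, no single phrase contains all the requested languages, and you may have to import the languages one at a time.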

Limitations of the language file format

The language file format has no notion of versions and only contains the latest version of an item’s fields. Sitecore does not use the updated attribute to identify a specific version but simply updates the most recent version.

Also, as you might notice, the language file identifies fields by field name and not field ID (despite the name of the fieldid attribute). This means that the language file format does not support duplicate field names on an item, whereas other parts of Sitecore do (although this is generally not recommended). In fact, duplicate field names will mess up the language file. So, if you have two fields named Title on the Home item, they will be exported as:

Example:

<phrase path="/sitecore/content/Home" key="Home" itemid="{22222222-2222-2222-2222-222222222222}" fieldid="Title" updated="20190101T010101Z">
    <da>Titel 1</da>
    <da>Titel 2</da>
</phrase>

The two separate fields are combined into a single phrase, and although the order of the language elements is fixed, it is not necessarily the order of the fields. If you try to import such a phrase, the first Title field is updated twice while the second is left untouched. So even if you keep the language elements sorted, the import will not work.

Handling language files programmatically

But how can you make it work? We have developed a suite of utilities to handle language files by e.g. combining, editing and filtering files. A phrase is represented by this simple class:

Example:

using System.Collections.Generic;

public class Phrase
{
    public Phrase()
    {
        // Maps language name (e.g. "da") to the raw field value
        Languages = new Dictionary<string, string>();
    }

    public string Updated { get; set; }
    public string FieldId { get; set; }
    public string ItemId { get; set; }
    public string Key { get; set; }
    public string Path { get; set; }
    public IDictionary<string, string> Languages { get; set; }

    public override string ToString()
    {
        return $"{Path}/{FieldId}";
    }
}

A language file is simply a List<Phrase>. To load a language file, we import the XML file and generate a list of phrases (using System.Xml.Linq):

Example:

// Requires System.Collections.Generic, System.IO and System.Xml.Linq
public static List<Phrase> Read(
    Stream stream,
    out IList<string> warnings
)
{
    List<Phrase> phrases = new List<Phrase>();
    XDocument document = XDocument.Load(new StreamReader(stream));
    warnings = new List<string>();
    foreach (XElement phraseElement in document.Root.Elements())
    {
        Phrase phrase = new Phrase
        {
            Path = phraseElement.Attribute("path").Value,
            Key = phraseElement.Attribute("key").Value,
            ItemId = phraseElement.Attribute("itemid").Value,
            FieldId = phraseElement.Attribute("fieldid").Value,
            Updated = phraseElement.Attribute("updated").Value,
        };
        foreach (var languageElement in phraseElement.Elements())
        {
            var name = languageElement.Name.LocalName;
            if (phrase.Languages.ContainsKey(name))
                // Duplicate language element: keep the first value
                warnings.Add(
                    $"{phrase} in {name} exists in multiple versions. " +
                    "Choosing first .."
                );
            else
                phrase.Languages.Add(name, languageElement.Value);
        }
        phrases.Add(phrase);
    }
    return phrases;
}

As you can see, it will generate a warning if you encounter duplicate language elements and choose the first one. But as described above, you would probably not use language files with duplicate language elements as importing such files might not update the field intended.


To save a list of phrases, use the following method:

Example:

public static string Write(IEnumerable<Phrase> phrases)
{
    XElement rootElement = new XElement("sitecore");
    foreach (var phrase in phrases)
    {
        XElement phraseElement = new XElement("phrase");
        phraseElement.Add(new XAttribute("path", phrase.Path));
        phraseElement.Add(new XAttribute("key", phrase.Key));
        phraseElement.Add(new XAttribute("itemid", phrase.ItemId));
        phraseElement.Add(new XAttribute("fieldid", phrase.FieldId));
        phraseElement.Add(new XAttribute("updated", phrase.Updated));
        foreach (var language in phrase.Languages)
            phraseElement.Add(
                new XElement(language.Key, language.Value));

        rootElement.Add(phraseElement);
    }

    XDocument document = new XDocument(rootElement);
    return document.ToString();
}

 

Using the code above, we often choose to export and import using Sitecore’s native format and transform the files post-export to facilitate translation in e.g. Excel or dedicated translation software. This adds additional steps to the translation routine but keeps the specifics of the translation format (which may change) out of the solution.
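As an illustration of such a post-export transformation, the sketch below flattens a native language file into CSV that can be opened in Excel. The column layout (itemid, fieldid, source language, target language) and the names LanguageFileCsv and ToCsv are assumptions made for this example, not a format that Sitecore or any particular translation tool prescribes.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class LanguageFileCsv
{
    // Writes one CSV row per phrase: itemid, fieldid,
    // source-language value and target-language value.
    public static string ToCsv(
        XDocument file,
        string sourceLanguage,
        string targetLanguage
    )
    {
        var csv = new StringBuilder();
        var header = new[]
        {
            "itemid", "fieldid", sourceLanguage, targetLanguage
        };
        csv.AppendLine(string.Join(",", header.Select(Escape)));
        var phrases = file.Root?.Elements("phrase")
            ?? Enumerable.Empty<XElement>();
        foreach (var phrase in phrases)
        {
            var cells = new[]
            {
                (string)phrase.Attribute("itemid"),
                (string)phrase.Attribute("fieldid"),
                // The cast yields null when the language is missing
                (string)phrase.Element(sourceLanguage),
                (string)phrase.Element(targetLanguage)
            };
            csv.AppendLine(string.Join(",", cells.Select(Escape)));
        }
        return csv.ToString();
    }

    // Quotes every cell and doubles embedded quotes,
    // so commas and line breaks in field values survive.
    private static string Escape(string value)
    {
        value = value ?? string.Empty;
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}
```

Keeping itemid and fieldid in each row means a translated sheet can later be mapped back to phrases in the native format before import.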

Overriding the language file format

It is, however, possible to override the logic behind the language file format directly in Sitecore and export/import in an entirely different format directly from Sitecore. Unfortunately, this involves some rather heavy patching of Sitecore, which is another reason why we prefer to transform the files post-export.

The export and import functionalities are implemented in two nested classes (called Exporter and Importer) inside:

Example:

Sitecore.Shell.Applications.Globalization.ExportLanguage.ExportLanguageForm, Sitecore.Client

and

Example:

Sitecore.Shell.Applications.Globalization.ImportLanguage.ImportLanguageForm, Sitecore.Client

In reality, we only need to replace the two nested classes, but this turns out to be rather tricky: the nested classes cannot be overridden, and their parent classes instantiate them directly with the new keyword. So, to replace the two nested classes we need to override the instantiation in the parent classes as well. A simple example of this could be:

Example:

namespace Globalization
{
    using System;
    using System.Linq;
    using Sitecore;
    using Sitecore.Data.Items;
    using Sitecore.Globalization;
    using Sitecore.IO;
    using Sitecore.Jobs;
    using Sitecore.Text;
    using Sitecore.Shell.Applications.Globalization.ExportLanguage;
    public class CustomExportLanguageForm : ExportLanguageForm
    {
        protected override Job StartExportLanguageJob(
            ListString languages,
            Item root
        )
        {
            var exporter = new CustomExporter(
                root,
                languages.Select(Language.Parse),
                FileUtil.MapPath(GetFilename(LanguageFile.Value))
            );
            var options = new JobOptions(
                "ExportLanguage", 
                "ExportLanguage", 
                Context.Site.Name, 
                exporter, 
                "Run"
            );
            options.ContextUser = Context.User;
            options.AfterLife = TimeSpan.FromMinutes(1.0);
            return JobManager.Start(options);
        }
    }
}

And the ImportLanguageForm:

Example:

namespace Globalization
{
    using Sitecore.Jobs;
    using Sitecore.Web.UI.HtmlControls;
    using Sitecore.Web.UI.Sheer;
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Web;
    using Sitecore;
    using Sitecore.Configuration;
    using Sitecore.Globalization;
    using Sitecore.Shell.Applications.Globalization.ImportLanguage;
    public class CustomImportLanguageForm : ImportLanguageForm
    {
        protected new void StartImport()
        {
            var importer = new CustomImporter(
                Factory.GetDatabase(Databases.SelectedItem.Value),
                GetLanguages(),
                LanguageFile.Value
            );
            var options = new JobOptions(
                "ImportLanguage",
                "ImportLanguage",
                Client.Site.Name,
                importer,
                "Run"
            );
            options.ContextUser = Context.User;
            options.AfterLife = TimeSpan.FromMinutes(1.0);
            options.WriteToLog = false;
            Registry.SetString(
                "/Current_User/Import Languages/File",
                LanguageFile.Value
            );
            Context.ClientPage.ServerProperties["handle"] =
                JobManager.Start(options).Handle.ToString();
            SheerResponse.Timer("CheckStatus", 500);
        }
        private IEnumerable<Language> GetLanguages()
        {
            return HttpContext.Current.Request.Form.AllKeys
                .Where(
                    key =>
                    !string.IsNullOrEmpty(key) &&
                    key.StartsWith("SelectedLanguage"))
                .SelectMany(
                    key =>
                    // GetValues can return null; use an empty array instead
                    (HttpContext.Current.Request.Form.GetValues(key)
                        ?? new string[0])
                    .Select(Language.Parse))
                .ToList();
        }
    }
}

The point here is not to change the implementation of the language forms, but to replace the Exporter and Importer instantiation with a custom implementation. I have, however, changed the constructors of the Importer and Exporter slightly to keep everything clean, but this is not strictly necessary. In real-world scenarios it would probably be a good idea to instantiate the exporter/importer via a factory method or by dependency injection, thus making the code above completely generic.

To use the patched language forms, we need to change the code-behind for the export and import controls defined in

Example:

sitecore\shell\Applications\Globalization\ExportLanguage\ExportLanguage.xml

and

Example:

sitecore\shell\Applications\Globalization\ImportLanguage\ImportLanguage.xml

We are now ready to create a new Exporter and Importer and thereby override Sitecore’s native language file format.

Implementing a custom exporter and importer

To get you going, I have created a custom exporter and importer, which can be used with the language forms above and which shows how Sitecore’s native language file format can be overridden to better suit the needs of your projects.

In this exporter and importer, I am not radically changing the language file format (and much of the code below is actually taken from Sitecore’s implementation), but I have addressed one of my own grievances with the native format by using the field ID to identify fields:

Example:

namespace Globalization
{
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Xml;
    using Sitecore;
    using Sitecore.ContentSearch.Utilities;
    using Sitecore.Data;
    using Sitecore.Data.Fields;
    using Sitecore.Data.Items;
    using Sitecore.Diagnostics;
    using Sitecore.Globalization;
    using Sitecore.Jobs;
    public class CustomExporter
    {
        Item root;
        IEnumerable<Language> languages;
        string filename;
        public CustomExporter(
            Item root, 
            IEnumerable<Language> languages,
            string filename
        )
        {
            this.root = root;
            this.languages = languages;
            this.filename = filename;
        }
        public void Run()
        {
            Job job = Context.Job;
            try
            {
                using (XmlTextWriter writer =
                          new XmlTextWriter(
                              filename,
                              Encoding.UTF8)
                )
                {
                    writer.Formatting = Formatting.Indented;
                    writer.WriteStartElement("sitecore");
                    ExportItem(job, writer, root);
                    writer.WriteEndElement();
                    Log.Audit(
                        this,
                        "Export languages: {0}, to: {1}",
                        StringUtil.Join(languages, ", "),
                        filename);
                }
            }
            catch (Exception ex)
            {
                job.Status.Failed = true;
                job.Status.Messages.Add(ex.ToString());
            }
            job.Status.State = JobState.Finished;
        }
        private void ExportItem(
            Job job,
            XmlTextWriter writer,
            Item item
        )
        {
            item.GetLanguageFields(languages).
            Where(
                languageFields =>
                languageFields.Value.Any()
            ).
            ForEach(
                languageFields =>
                {
                    writer.WriteStartElement("phrase");
                    writer.WriteAttributeString(
                        "path", item.Paths.Path);
                    writer.WriteAttributeString("key", item.Name);
                    writer.WriteAttributeString(
                        "itemid", item.ID.ToString());
                    writer.WriteAttributeString(
                        "fieldid", languageFields.Key.ToString());
                    writer.WriteAttributeString(
                        "fieldname", languageFields.Value[0].Name);
                    writer.WriteAttributeString(
                        "updated",
                        DateUtil.ToIsoDate(item.Statistics.Updated));
                    languageFields.Value.ForEach(
                        languageField =>
                        writer.WriteElementString(
                            languageField.Language.Name,
                            languageField.Value)
                    );
                    writer.WriteEndElement();
                }
            );
            ++job.Status.Processed;
            item.Children.ForEach(
                child => ExportItem(job, writer, child)
            );
        }
    }

    public static class ItemExtensions
    {
        public static Dictionary<ID, List<Field>> GetLanguageFields(
            this Item item,
            IEnumerable<Language> languages)
        {
            var result = new Dictionary<ID, List<Field>>();
            languages.ForEach(
                language =>
                {
                    item.Database.GetItem(item.ID, language).Fields.
                    Where(field => field.ShouldBeTranslated).
                    ForEach(
                        languageField =>
                        {
                            if (!result.ContainsKey(languageField.ID))
                                result.Add(
                                    languageField.ID,
                                    new List<Field>());
                            result[languageField.ID].Add(languageField);
                        }
                    );
                }
            );
            return result;
        }
    }
}

Notice that I have added a fieldname attribute as a reference to make the XML more human-readable.

The Importer is pretty straightforward:

Example:

namespace Globalization
{
    using Sitecore.Data;
    using Sitecore.Data.Fields;
    using Sitecore.Data.Items;
    using Sitecore.Globalization;
    using Sitecore.Jobs;
    using Sitecore.Xml;
    using System.Collections.Generic;
    using System.Linq;
    using System.Xml;
    using Sitecore;
    public class CustomImporter
    {
        Database database;
        string filename;
        IEnumerable<Language> languages;
        public CustomImporter(
            Database database,
            IEnumerable<Language> languages,
            string filename
        )
        {
            this.database = database;
            this.languages = languages;
            this.filename = filename;
        }
        public void Run()
        {
            Job job = Context.Job;
            if (job == null)
                return;
            XmlNodeList phrases =
                XmlUtil.LoadXmlFile(filename).
                SelectNodes("/sitecore/phrase");
            if (phrases == null)
                return;
            job.Status.Total = phrases.Count;
            foreach (XmlNode phrase in phrases)
            {
                ImportPhrase(phrase);
                ++job.Status.Processed;
            }
            job.Status.State = JobState.Finished;
        }
        public void ImportPhrase(XmlNode phrase)
        {
            var itemid = 
                ID.Parse(XmlUtil.GetAttribute("itemid", phrase));
            var fieldid = 
                ID.Parse(XmlUtil.GetAttribute("fieldid", phrase));
            Item item = database.GetItem(itemid);
            if (item == null)
                return;
            foreach (XmlNode childNode in phrase.ChildNodes)
            {
                var language = Language.Parse(childNode.LocalName);
                if (languages.Contains(language))
                {
                    Item languageItem = 
                        item.Database.GetItem(item.ID, language);
                    Field field = languageItem?.Fields[fieldid];
                    if (field != null && field.ShouldBeTranslated)
                    {
                        string value = XmlUtil.GetValue(childNode);
                        if (field.GetValue(false, false) != value)
                        {
                            languageItem.Editing.BeginEdit();
                            field.SetValue(value, true);
                            languageItem.Editing.EndEdit();
                        }
                    }
                }
            }
        }
    }
}

As you can see, once you have patched the language forms it is actually pretty simple to provide your own implementation of a language file format. Notice that I am using the built-in property ShouldBeTranslated to determine which fields to include in the language file. This excludes most of the system fields but may be adjusted to suit your specific needs.

Translating content is often a rather complex task, and developing a good routine is essential to getting the job done properly. I hope this post has clarified how Sitecore can support this routine. And while language files may not be a silver bullet, they are certainly an option that we use a lot.

Please note that the code examples have been written using Sitecore 9.0.2.