Friday, January 11, 2019

Translating Text in Sitecore using Microsoft Cognitive Services and PowerShell


Microsoft Cognitive Services is an impressive set of powerful, easy-to-integrate APIs for developers. With AI at its core, Cognitive Services strives to solve business problems using Vision, Speech, Knowledge, Search, and Language service offerings. A recently announced simplified key mechanism makes it easier than ever to start using Cognitive Services.

Creating a Cognitive Service in Azure itself is dead simple. It takes less than 5 minutes to spin up a Cognitive Service resource, and then you're ready to develop. The above video will help get you started.☝

So, what can we do with these services in Sitecore?

Cognitive Services Is Not That New

We've seen folks across the Sitecore community utilising this technology since its inception in 2016 - specifically the Computer Vision API. Examples range from bulk image alt tagging using Computer Vision and PowerShell and an automatic image alt tag generation module to a fully integrated Cognitive Services module from the infamous Mark Stiles.

These are really great adaptations of these APIs, but I wanted to explore these services myself.

Translator Text API

The Translator Text API is easy to integrate in your applications, websites, tools, and solutions. It allows you to add multi-language user experiences in more than 60 languages, and can be used on any hardware platform with any operating system for text-to-text language translation.
The Translator Text API is part of the Azure Cognitive Services API collection of machine learning and AI algorithms in the cloud, and is readily consumable in your development projects.
What is Translator Text API?

Being completely obsessed with PowerShell, I figured I'd give it a go. For my demo, I want to allow content authors to right-click on a Sitecore item, select a Translate Text Fields PowerShell script, configure some options in a dialog, and click Continue.



The script should create a new language version in the selected target language and translate all of the selected text fields.

Pricing

Before starting, it should be noted that the Free tier for translations is 2 million characters per month. 
Since we're keeping this integration simple, it's not likely we'll hit the limit.



Let's Script It

Building the dialog is pretty straightforward. Get the context item, grab the applicable languages and fields (Single-Line and Multi-Line only for simplicity and API restriction avoidance), and set the dialog options:
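The embedded script is not reproduced in this text, so here is a minimal sketch of what such a dialog could look like, not the exact original. It assumes the script runs inside Sitecore PowerShell Extensions (SPE), where Read-Variable is available. The helper name New-TranslateDialogParameters and the dialog titles are hypothetical; the variable names match the Translate-Item call shown further down.

```powershell
# Hypothetical helper: builds Read-Variable parameter definitions from plain name lists.
function New-TranslateDialogParameters {
    param(
        [string[]]$LanguageNames,
        [string[]]$FieldNames
    )

    # Read-Variable accepts an ordered dictionary of label = value pairs as Options.
    $languageOptions = [ordered]@{}
    foreach ($name in $LanguageNames) { $languageOptions[$name] = $name }

    $fieldOptions = [ordered]@{}
    foreach ($name in $FieldNames) { $fieldOptions[$name] = $name }

    @(
        @{ Name = "selectedFromLanguage"; Title = "Translate from"; Options = $languageOptions; Editor = "combo" },
        @{ Name = "selectedToLanguage"; Title = "Translate to"; Options = $languageOptions; Editor = "combo" },
        @{ Name = "fieldSelectionArray"; Title = "Fields to translate"; Options = $fieldOptions; Editor = "checklist" }
    )
}

# This part only runs inside SPE, where Read-Variable exists.
if (Get-Command Read-Variable -ErrorAction SilentlyContinue) {
    # In a context menu script the current location is the item that was right-clicked.
    $item = Get-Item -Path .

    $languageNames = [Sitecore.Data.Managers.LanguageManager]::GetLanguages($item.Database) |
        ForEach-Object { $_.Name }

    # Load all template fields, then keep only plain text fields and skip system (__) fields.
    $item.Fields.ReadAll()
    $textTypes = @("Single-Line Text", "Multi-Line Text")
    $fieldNames = $item.Fields |
        Where-Object { ($textTypes -contains $_.Type) -and (-not $_.Name.StartsWith("__")) } |
        ForEach-Object { $_.Name }

    $dialog = @{
        Parameters   = New-TranslateDialogParameters -LanguageNames $languageNames -FieldNames $fieldNames
        Title        = "Translate Text Fields"
        Description  = "Choose the languages and the fields to translate."
        OkButtonName = "Continue"
        Width        = 500
        Height       = 480
    }
    $result = Read-Variable @dialog

    # Stop quietly if the author cancelled the dialog.
    if ($result -ne "ok") { exit }
}
```

After the author clicks Continue, $selectedFromLanguage, $selectedToLanguage and $fieldSelectionArray should hold the selections. Keeping the parameter building in its own function means it can be checked outside Sitecore.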

We'll need a function that we can call to process the item after obtaining the selections from the dialog.  We expect to call it like this:

Translate-Item -TargetItem $item -FromLang $selectedFromLanguage -ToLang $selectedToLanguage -FieldsList $fieldSelectionArray

The function should issue an authentication token from the Cognitive Service, generate a new language version for the item in the target language, and call another function to translate the original field value to the target language:
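A rough sketch of how Translate-Item could be put together, under a few assumptions: it runs inside Sitecore, Issue-CognitiveApiToken is defined in the same session and can be called without arguments, and the translation function is called Translate-Text (a hypothetical name and parameter set, since the original function is not shown here).

```powershell
function Translate-Item {
    param(
        $TargetItem,              # Sitecore item picked in the Content Editor
        [string]$FromLang,        # source language name, e.g. "en"
        [string]$ToLang,          # target language name, e.g. "de"
        [string[]]$FieldsList     # names of the fields to translate
    )

    # Request one token up front and reuse it for every field.
    $token = Issue-CognitiveApiToken

    $database = $TargetItem.Database
    $sourceVersion = $database.GetItem($TargetItem.ID, [Sitecore.Globalization.Language]::Parse($FromLang))
    $targetLanguageItem = $database.GetItem($TargetItem.ID, [Sitecore.Globalization.Language]::Parse($ToLang))

    # AddVersion returns the newly created version in the target language.
    $newVersion = $targetLanguageItem.Versions.AddVersion()

    $newVersion.Editing.BeginEdit()
    try {
        foreach ($fieldName in $FieldsList) {
            $original = $sourceVersion[$fieldName]

            # Nothing to translate for empty fields.
            if ([string]::IsNullOrWhiteSpace($original)) { continue }

            # Translate-Text is a hypothetical name for the translation function.
            $newVersion[$fieldName] = Translate-Text -Text $original -FromLang $FromLang -ToLang $ToLang -Token $token
        }
        $newVersion.Editing.EndEdit() | Out-Null
    }
    catch {
        # Discard partial edits if any translation call fails.
        $newVersion.Editing.CancelEdit()
        throw
    }

    $newVersion
}
```

Note that this sketch always adds a new version in the target language, even if one already exists. A real script would probably want to check for an existing version first.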

The Issue-CognitiveApiToken call in the above function requests the authentication token that we need to send with each request to the translation API.
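As a sketch, the token request can be a single POST to the Cognitive Services token endpoint with the subscription key in the Ocp-Apim-Subscription-Key header. The parameter names and the TRANSLATOR_TEXT_KEY environment variable are assumptions made for this example, and the URL shown is the global endpoint; a resource created in a specific region may need a regional URL instead.

```powershell
function Issue-CognitiveApiToken {
    param(
        # Assumption: the key is read from an environment variable rather than hard-coded.
        [string]$SubscriptionKey = $env:TRANSLATOR_TEXT_KEY,
        # Global token endpoint; regional resources may need a region-specific URL.
        [string]$TokenEndpoint = "https://api.cognitive.microsoft.com/sts/v1.0/issueToken"
    )

    if ([string]::IsNullOrWhiteSpace($SubscriptionKey)) {
        throw "No Cognitive Services subscription key was supplied."
    }

    $headers = @{ "Ocp-Apim-Subscription-Key" = $SubscriptionKey }

    # The service answers with the token as plain text in the response body.
    Invoke-RestMethod -Method Post -Uri $TokenEndpoint -Headers $headers
}
```

The token is only valid for a short time (about 10 minutes), so it makes sense to request it once per script run rather than once per field.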

We can send a request to the translation API with the issued token in a new function that will be responsible for processing the actual translation:
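Something along these lines should work against version 3.0 of the Translator Text API, which takes a JSON array of objects with a Text property and returns the translations in the same order. The function names Translate-Text and New-TranslateRequestUri are hypothetical, and the sketch assumes the language codes passed in are ones the API accepts (Sitecore language names with a region may need trimming first).

```powershell
# Hypothetical helper: builds the request URL for a single from/to language pair.
function New-TranslateRequestUri {
    param(
        [string]$FromLang,
        [string]$ToLang,
        [string]$Endpoint = "https://api.cognitive.microsofttranslator.com/translate"
    )

    "{0}?api-version=3.0&from={1}&to={2}" -f $Endpoint,
        [uri]::EscapeDataString($FromLang),
        [uri]::EscapeDataString($ToLang)
}

function Translate-Text {
    param(
        [string]$Text,
        [string]$FromLang,
        [string]$ToLang,
        [string]$Token
    )

    $uri = New-TranslateRequestUri -FromLang $FromLang -ToLang $ToLang
    $headers = @{ Authorization = "Bearer $Token" }

    # The API expects a JSON array, so wrap the single text in an array explicitly.
    $json = ConvertTo-Json -InputObject @(@{ Text = $Text })

    # Send the body as UTF-8 bytes so non-ASCII characters survive the round trip.
    $body = [Text.Encoding]::UTF8.GetBytes($json)

    $response = Invoke-RestMethod -Method Post -Uri $uri -Headers $headers `
        -ContentType "application/json; charset=utf-8" -Body $body

    # One input text and one target language, so take the first translation.
    @($response)[0].translations[0].text
}
```

With a valid token, a call such as Translate-Text -Text "Hello" -FromLang "en" -ToLang "de" -Token $token should return the translated string.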

Putting It All Together

Now that we've got commands to interact with the Translation API, we can build out the contextual PowerShell script to provide some options for our content author.

Adding this script to one of the Context Menu (PowerShell Script Library) folders (e.g. /sitecore/system/Modules/PowerShell/Script Library/SPE/Core/Platform/Content Editor/Context Menu) will allow the script to be executed contextually.


Final Result


I can see this being a good use case for Dictionary items specifically, since the API rejects content over a certain length and Dictionary values tend to be short. However, this is simply a demo of what is possible here (definitely not production-ready 🤪).



Feel free to take and use any part of the script here and build something cool!

2 comments:

  1. Thanks for this!

    It got my brain thinking, and our team uses AWS. If you want, you can basically replace the Azure areas with the following one line (with your own accessKey and secretKey).

    $fieldValue | ConvertTo-TRNTargetLanguage -SourceLanguageCode $fromLang.Substring(0,2) -TargetLanguageCode $toLang.Substring(0,2) -AccessKey $accessKey -SecretKey $secretKey

  2. This post was super helpful! We wanted to use DeepL as our API service and it was really easy to swap that in place of the MS API. I'm not sure if you ran into this also, but DeepL encodes everything using UTF8, but doesn't include the charset in the ContentType response headers. Powershell's Invoke-WebRequest seems to default to ISO-8859-1. I had to add logic to convert the response back to bytes and decode it as UTF8 and then I was getting the extended ASCII responses.
    e.g.
    $jsonUTF8 = [Text.Encoding]::UTF8.GetString(
    [Text.Encoding]::GetEncoding(28591).GetBytes($result.Content)
    ) | ConvertFrom-Json
