Transcription
Introduction
This guide explains the overall process for executing speech transcription with Limecraft Flow.
The outline of the process is as follows:
- Upload audiovisual media clips into the Limecraft Flow platform such that they are available for enrichment by processes such as transcription or subtitling;
- Start the automated transcription workflow on one or more clips in the platform;
- Follow up on the status of the transcription workflow until completion before requesting the results;
- Retrieve the transcript now attached to the transcribed media clips;
- Optionally customize the transcription process with the use of custom dictionaries and text alignment.
In addition to these steps, we also describe the statuses a clip can have regarding transcription, to aid in automated workflows and to guide the Limecraft Flow UI in properly displaying externally modified transcripts. To close off this chapter, we also list the API call to update a transcript from a third-party system.
Upload audiovisual media clips into the Limecraft Flow platform
Before transcription workflows can be run, audiovisual material needs to be uploaded to the Limecraft platform. Both media with audio and video or audio-only clips can be uploaded and processed for speech transcription.
The various ways of creating clips are described in a dedicated documentation section.
Start the automated transcription workflow
Once a clip has been ingested successfully, it can be used for further enrichment, including transcription.
Starting the speech transcription workflow is done using this call:
POST /production/{prId}/mo/{moId}/service/transcript
Start a transcription generation process and use a specific engine depending on the body of the query.
Details
Description
If nothing is specified, the production-defined default engine will be used.
Parameters
Path Parameters
Name | Description | Required | Type |
---|---|---|---|
prId | ID of the production. | ✔ | Long |
moId | ID of the media object. | ✔ | Long |
Body Parameters
Name | Description | Required | Type |
---|---|---|---|
body | Transcript request object. | ✘ | TranscriptRequest |
TranscriptRequest
Field Name | Required | Type | Description | Format |
---|---|---|---|---|
align | ✘ | Boolean | Run transcription in alignment mode, in which the alignInput will become the transcript. | |
alignInput | ✘ | String | Text to use for transcription alignment. | |
dictionaryId | ✘ | Long | Id of the dictionary to use during transcription. | int64 |
language | ✘ | String | Language code to use for transcription. The code has to be supported by the speechEngine. | |
numberOfSpeakers | ✘ | Long | How many speakers are expected. Usage depends on the speechEngine. | int64 |
redo | ✘ | Boolean | Run again, even if the workflow already ran in this context. | |
redoSingleTask | ✘ | Boolean | | |
skipActiveWorkflowTest | ✘ | Boolean | | |
speechEngine | ✘ | String | Which speech engine should be used for transcription. | Enum: VOLDEMORT, KALDI, VOLDEMORT2, VOLDEMORT3, VOLDEMORT4, VOLDEMORT5 |
subtitle | ✘ | Boolean | After the transcript is generated, also create a Subtitle annotation from it. | |
subtitlePresetId | ✘ | String | When subtitle is true, create subtitles using this subtitle preset. | |
subtitlingConfiguration | ✘ | subtitlingConfiguration | | |
transcriptConfiguration | ✘ | transcriptConfiguration | | |
waitForWorkflow | ✘ | Boolean | | |
Return Type
MediaObjectWorkflowReport
Field Name | Required | Type | Description | Format |
---|---|---|---|---|
adminOnly | ✘ | Boolean | | |
audioAnalyzerCompleted | ✘ | Date | | date-time |
created | ✘ | Date | The time when this resource was created | date-time |
createdBy | ✘ | String | The request or process that created this resource | |
createdByShareId | ✘ | Long | | int64 |
createdBySharedUserId | ✘ | Long | | int64 |
creatorId | ✘ | Long | The id of the user who created this resource | int64 |
duration | ✘ | Double | | double |
errorReports | ✘ | List of TaskReport | | |
extra | ✘ | Object | | |
funnel | ✘ | String | | |
id | ✔ | Long | The id of this resource | int64 |
label | ✘ | String | User-friendly label of the workflow | |
lastUpdated | ✘ | Date | The time when this resource was last updated | date-time |
mediaAnalyzerCompleted | ✘ | Date | | date-time |
mediaObjectId | ✘ | Long | | int64 |
modifiedBy | ✔ | String | The request or process responsible for the last update of this resource | |
objectType | ✘ | String | The data model type or class name of this resource | |
productionId | ✘ | Long | | int64 |
publishedFiles | ✘ | List of object | Files generated by the workflow, which can be downloaded. | |
removeFromQuota | ✘ | Boolean | | |
requiredRights | ✘ | List of ProductionPermission | | |
size | ✘ | Long | | int64 |
startupParameters | ✘ | Object | | |
status | ✘ | String | | Enum: Inited, Started, Completed, Error, Cancelled, Paused, CompletedPending, ErrorPending, WaitForCallback, Scheduled |
successFul | ✘ | Boolean | | |
target | ✘ | String | | |
taskReports | ✘ | List of TaskReport | | |
transcoder1Completed | ✘ | Date | | date-time |
transcoder2Completed | ✘ | Date | | date-time |
variables | ✘ | Object | | |
version | ✔ | Long | The version of this resource, used for Optimistic Locking | int64 |
workflowCompleted | ✘ | Date | When did the workflow complete? | date-time |
workflowFailed | ✘ | Date | When did the workflow fail? | date-time |
workflowId | ✘ | String | The id of the workflow. This can be used to retrieve the workflow status. | |
workflowStarted | ✘ | Date | When was the workflow started? | date-time |
workflowTask | ✘ | String | | |
workflowType | ✘ | String | | Enum: INGEST, SPEECH, IPPAMEXPORT, IPPAMSYNC, MOIEXPORT, REMOTE_SPEECH, VOLDEMORT_SPEECH, TRANSCODE, AUDIOANALYZE, EXPORT_VWFLOW, FEATURE_EXTRACTION, BLACK_FRAME, STON_APPROVE, AAF_EXPORT, FCP_EXPORT, VOLDEMORT_SPEECH_2, KALDI_SPEECH, SUBTITLING, MIGRATE, INDEX, BACKUP, VOLDEMORT_SPEECH_3, VOLDEMORT_SPEECH_4, VOLDEMORT_SPEECH_5, TRANSLATION, INDEX_SWITCH, SIMPLEINGEST, CLONE, UPDATE_CATEGORY, WEBHOOK, SETKEEPER_ATTACH, PDF_EXPORT, SHOT_DETECTION, EXPORT, REMOTE_HELLO_WORLD, CUSTOM, UNKNOWN, CHANGE_AUDIO_LAYOUT, WORKSPACE_BOOTSTRAP, MEDIA_TRANSFER_COMPLETE, MEDIA_TRANSFER_FAILED, DELIVERY_REQUEST_SUBMISSION_CLIP_PROBED, DELIVERY_REQUEST_SUBMISSION, ADVANCED_SUBTITLE, TRANSCRIPTION_SUMMARIZE |
Content Type
- application/json
Responses
Code | Description | Datatype |
---|---|---|
200 | The request was successful. | |
400 | The language is not supported. | |
403 | The user needs START_TRANSCRIPTION_WORKFLOW rights. | |
404 | The production or media object was not found. | |
409 | The workflow was already in progress. | |
The body of this request is a TranscriptRequest JSON object. Its language parameter should be the language code for the language the clip audio is in. For example:
{
"language": "en",
"redo": true
}
The redo parameter can be used as a safeguard to prevent duplicate transcription workflows. If redo is false (which is the default), starting the workflow will fail if a transcription workflow was started before.
To learn which language codes are supported, refer to Speech Engines and supported features.
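As an illustration, the sketch below sends such a TranscriptRequest with Python and the requests library. The base URL, the authentication header and the prId/moId values are placeholders rather than part of the API description above, so substitute the values for your environment.

import requests

BASE_URL = "https://flow.example.com/api"            # placeholder, use your Limecraft Flow endpoint
AUTH_HEADERS = {"Authorization": "Bearer <token>"}   # placeholder authentication
PR_ID = 123                                          # placeholder production id (prId)
MO_ID = 456                                          # placeholder media object id (moId)

transcript_request = {
    "language": "en",  # must be supported by the selected speech engine
    "redo": True,      # allow re-running even if a transcription workflow already ran
}

response = requests.post(
    f"{BASE_URL}/production/{PR_ID}/mo/{MO_ID}/service/transcript",
    json=transcript_request,
    headers=AUTH_HEADERS,
)
response.raise_for_status()
report = response.json()  # a MediaObjectWorkflowReport
print("Started workflow:", report.get("workflowId"))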
Another way to create a transcript automatically is through Translation, which is discussed on its own page.
Follow up on the status of the transcription workflow
The automated transcription process is modeled like any other Limecraft Flow platform workflow, such as the other enrichment and media processing workflows in our system. As such, the workflow API can be used to track its progress until completion or failure.
The call mentioned above returns a MediaObjectWorkflowReport. Its workflowId field gives you a reference to the workflow that was started. Once this workflow completes, the TranscriptAnnotations will have been created. To learn how to wait for a workflow to complete, see this section.
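As a rough sketch, the loop below waits for a workflow to reach a terminal status. The get_status callable is a stand-in for the actual workflow status call (see the section referenced above); the terminal status names are taken from the MediaObjectWorkflowReport status enum.

import time

# Terminal values from the MediaObjectWorkflowReport status enum.
TERMINAL_STATUSES = {"Completed", "Error", "Cancelled"}

def wait_for_workflow(get_status, workflow_id, poll_interval=10.0, timeout=3600.0):
    """Poll a workflow until it reaches a terminal status or the timeout expires.

    get_status is a callable supplied by you; it should perform the workflow
    status call and return the current status string for workflow_id.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = get_status(workflow_id)
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"Workflow {workflow_id} did not reach a terminal status in time")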
Retrieve the transcript
Speech transcription workflows that complete successfully deliver speech transcription results and attach them to the respective clip as multiple TranscriptAnnotation objects.
Retrieving TranscriptAnnotations is done using the query call to List all the annotations of a MediaObject with the appropriate parameters:
GET /production/{prId}/mo/{moId}/an/query?offset=0&rows=1000&sort=start ASC&fq=language:"en"&fq=funnel:TranscriptAnnotation
Query Parameter | Description |
---|---|
fq=funnel:TranscriptAnnotation | We only want to retrieve TranscriptAnnotations, so we add a filter query on funnel. |
fq=language:"en" | Another filter query is set on the language field, to only retrieve the TranscriptAnnotations in this particular language. |
offset=0&rows=1000 | Keep in mind that the annotation endpoint uses paging to deliver the full set of data from the API. As such, use proper paging parameters to ensure that a complete set of transcript elements is returned (in one or more API calls). |
sort=start ASC | Sorting the results with increasing start times will return the transcription annotations chronologically. |
The result of this call will be a sorted list of TranscriptAnnotations.
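The sketch below shows how the paging could be handled with Python and the requests library. The base URL, authentication header, prId/moId values and the shape of the response envelope are assumptions; adapt them to your environment and to the actual response of the query endpoint.

import requests

BASE_URL = "https://flow.example.com/api"            # placeholder, use your Limecraft Flow endpoint
AUTH_HEADERS = {"Authorization": "Bearer <token>"}   # placeholder authentication
PR_ID, MO_ID = 123, 456                              # placeholder production and media object ids

def fetch_transcript_annotations(language="en", page_size=1000):
    """Collect all TranscriptAnnotations for one language, page by page."""
    annotations, offset = [], 0
    while True:
        response = requests.get(
            f"{BASE_URL}/production/{PR_ID}/mo/{MO_ID}/an/query",
            params={
                "offset": offset,
                "rows": page_size,
                "sort": "start ASC",
                "fq": [f'language:"{language}"', "funnel:TranscriptAnnotation"],
            },
            headers=AUTH_HEADERS,
        )
        response.raise_for_status()
        page = response.json()
        # Assumption: the response is (or contains) a list of annotations;
        # adjust the extraction below to the actual response envelope.
        items = page if isinstance(page, list) else page.get("results", [])
        annotations.extend(items)
        if len(items) < page_size:
            return annotations
        offset += page_size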
Customize the transcription process
Use a different speech transcription engine
The Limecraft Flow platform supports multiple transcription engines. The default transcription engine is Speechmatics, unless your production workspace is configured otherwise.
Users with access to an enterprise plan can also use one of the other speech transcription engines we support: Google Speech, Vocapia and Kaldi. The speechEngine parameter is used to choose the engine:
speechEngine | Description |
---|---|
voldemort2 | Vocapia |
voldemort3 | Speechmatics. This is usually the default engine, unless your production workspace is configured differently. |
voldemort4 | Google Speech |
For example, the following request starts a transcription with Vocapia:
{
"speechEngine": "voldemort2",
"language": "en",
"redo": true,
"redoSingleTask": true
}
It is important to note that not all speech engines support the same languages and feature set! Refer to Speech Engines and supported features to learn more.
Custom Dictionaries
Apart from specifying the language and speech engine to be used for transcription, our platform also supports the use of custom dictionaries to help return more accurate speech transcription results.
Documentation on how to create and maintain dictionaries is available in the relevant documentation section.
To use a custom dictionary when running the transcription process, specify the dictionaryId parameter as part of the JSON request body of the transcription call, as follows:
{
"force": true,
"dictionaryId": 27,
"language": "fr",
"align": false,
"subtitle": false
}
Custom dictionaries can currently only be used with the Speechmatics ASR backend. Refer to Speech Engines and supported features to learn more.
Alignment of existing transcripts
Our platform also provides functionality for ‘alignment’ of pre-existing transcripts. In this case, non-timed input text is given per-word timings and speaker assignments, and the result is returned in the same transcript format as regular audio transcription produces.
Alignment can be initiated by sending the following JSON body to the transcription call, with align: true and the input text placed in the alignInput field:
{
"force": true,
"language": "en",
"align": true,
"alignInput": "Look at this clock. When the bell rings, we can see it as well as hear it."
}
The text in alignInput should conform to certain requirements for optimal results:
- UTF-8 encoded plain text (no markup, no timecodes, …)
- Text should be in the same language as the audio of the clip
- Only spoken text (no time codes, no speakers)
- One sentence on each line (with punctuation marks).
If you have speaker info available, you can put it in the alignInput like this:
SPEAKER: ILSA
But what about us?
SPEAKER: RICK
We'll always have Paris. We didn't have, we, we lost it until you came to Casablanca. We got it back last night.
SPEAKER: ILSA
When I said I would never leave you.
Transcript alignment is currently available with the Speechmatics and Vocapia ASR backends. Refer to Speech Engines and supported features to learn more.
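As a sketch, the snippet below builds an alignInput string from a speaker-labelled script (a list of (speaker, sentence) tuples is assumed as the starting point) and assembles the alignment request body, which can then be sent to the POST .../service/transcript call shown earlier.

script = [
    ("ILSA", "But what about us?"),
    ("RICK", "We'll always have Paris."),
    ("ILSA", "When I said I would never leave you."),
]

lines = []
for speaker, sentence in script:
    lines.append(f"SPEAKER: {speaker}")
    lines.append(sentence)  # one sentence per line, plain UTF-8 text, no timecodes

align_request = {
    "language": "en",
    "align": True,
    "alignInput": "\n".join(lines),
}
print(align_request)  # send this body to the transcription call shown earlier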
Transcription status of a clip
The MediaObjectAnnotation of the clip has a field transcriptionStatuses which contains the transcription status for each language for that clip.
Note that transcriptionStatuses is used to populate the language selector in the transcriber application of Flow-UI. If the status isn’t set, it won’t be shown in Flow-UI!
Example:
{
"transcriptionStatuses": {
"en": "AUTOMATIC_COMPLETED",
"fr": "EDITING"
}
}
The keys of the transcriptionStatuses map are the language codes. The values of the transcriptionStatuses map are any of the following:
status | Description |
---|---|
NOT_STARTED | The transcription for this version has not started. Same as if the key wouldn’t exist. |
AUTOMATIC_STARTED | The automatic transcription process is busy. |
AUTOMATIC_FAILED | The automatic transcription was started but has failed. |
AUTOMATIC_COMPLETED | The automatic transcription has completed successfully. |
EDITING | Editing has started. This could come after AUTOMATIC_COMPLETED. |
COMPLETED | The transcript editing for this version has completed. Editing won’t be possible in Flow-UI unless the status is changed to EDITING. |
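For illustration, the helper below interprets an already-retrieved transcriptionStatuses map when deciding whether automatic transcription should be (re)started for a language. The decision rule is an assumption for this sketch, not something prescribed by the API.

def needs_transcription(media_object_annotation: dict, language: str) -> bool:
    """Return True when no usable transcript exists yet for the given language."""
    statuses = media_object_annotation.get("transcriptionStatuses", {})
    status = statuses.get(language, "NOT_STARTED")
    # AUTOMATIC_STARTED means a workflow is still busy; the remaining statuses
    # indicate an automatic or manually edited transcript is already available.
    return status in {"NOT_STARTED", "AUTOMATIC_FAILED"}

# Usage with the example above:
clip = {"transcriptionStatuses": {"en": "AUTOMATIC_COMPLETED", "fr": "EDITING"}}
assert needs_transcription(clip, "nl") is True
assert needs_transcription(clip, "en") is False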
Edit the transcript manually
POST /production/{prId}/mo/{moId}/an
This call is used to create an annotation and tie it to the specific media object.
Details
Parameters
Path Parameters
Name | Description | Required | Type |
---|---|---|---|
prId | ID of the production. | ✔ | Long |
moId | ID of the media object. | ✔ | Long |
Body Parameters
Name | Description | Required | Type |
---|---|---|---|
body | Annotation object. | ✘ | Object |
Return Type
Annotation
Field Name | Required | Type | Description | Format |
---|---|---|---|---|
annotationProductionId | ✘ | Long | | int64 |
clipMetadata | ✘ | ClipMetadata | | |
created | ✘ | Date | The time when this resource was created | date-time |
createdBy | ✘ | String | The request or process that created this resource | |
createdByShareId | ✘ | Long | | int64 |
createdBySharedUserId | ✘ | Long | | int64 |
creatorId | ✘ | Long | The id of the user who created this resource | int64 |
crossProduction | ✘ | Boolean | | |
customFields | ✘ | CustomFields | | |
deleted | ✘ | Date | | date-time |
description | ✘ | String | Textual contents of the Annotation | |
end | ✘ | Long | The frame range described by the annotation runs up to end, but not including it. Should be less than or equal to the amount of frames the MediaObject has. | int64 |
funnel | ✘ | String | Describes how the Annotation should be interpreted by the client application. Can be thought of as a subtype. | |
id | ✔ | Long | The id of this resource | int64 |
includeTranslatedTo | ✘ | Boolean | | |
includesFrom | ✘ | Set of string | | |
keyframeFrames | ✘ | Long | | int64 |
label | ✘ | String | | |
language | ✘ | String | | |
lastUpdated | ✘ | Date | The time when this resource was last updated | date-time |
mediaObject | ✘ | MediaObject | | |
mediaObjectId | ✘ | Long | | int64 |
modifiedBy | ✔ | String | The request or process responsible for the last update of this resource | |
objectType | ✘ | String | The data model type or class name of this resource | |
origin | ✘ | String | | |
productionId | ✘ | Long | | int64 |
rating | ✘ | Double | | double |
relatedToId | ✘ | Long | | int64 |
securityClasses | ✘ | Set of string | | Enum: |
source | ✘ | String | | |
spatial | ✘ | String | Link the Annotation to a specific part of the video or image frame. A Media Fragments Spatial Dimension description string is expected. | |
start | ✘ | Long | First frame of the Annotation. 0 is the first frame of the clip. The start frame is included in the frame range the annotation describes. | int64 |
systemFields | ✘ | CustomFields | | |
tags | ✘ | Set of string | | |
translatedFromId | ✘ | Long | | int64 |
translatedToIds | ✘ | Set of long | | int64 |
version | ✔ | Long | The version of this resource, used for Optimistic Locking | int64 |
Content Type
- application/json
Responses
Code | Description | Datatype |
---|---|---|
201 | The request was successful. | |
403 | The user needs LIBRARY_UPDATE_METADATA, LOG_EDIT, SUBTITLE_EDIT, or TRANSCRIBER_EDIT rights depending on the type of the annotation. | |
404 | The production or MediaObject was not found. | |
422 | The annotation does not validate against the annotation restrictions. For example annotation.start > annotation.end. | |
|
Body example
{
"start": 0,
"end": 6,
"funnel": "TranscriptAnnotation",
"language": "en",
"source": "PostMan",
"label": "PostMan Generated",
"type": "TRANSCRIBER",
"speaker": "F1",
"objectType": "TranscriptAnnotation",
"structuredDescription": {
"confidence": 0.9935,
"parts": [
{
"start": 0,
"duration": 2,
"word": "Hi, ",
"confidence": 1,
"speaker": "F1",
"type": "LEX"
},
{
"start": 2,
"duration": 3,
"word": "I'm ",
"confidence": 1,
"speaker": "F1",
"type": "LEX"
},
{
"start": 5,
"duration": 3,
"word": "Amy",
"confidence": 0.95,
"speaker": "F1",
"type": "LEX"
}
],
"language": "en",
"gender": "F"
}
}
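For illustration, the sketch below posts a minimal annotation of this shape with Python and the requests library. The base URL, authentication header and prId/moId values are placeholders, and the annotation content is a trimmed-down version of the body example above.

import requests

BASE_URL = "https://flow.example.com/api"            # placeholder, use your Limecraft Flow endpoint
AUTH_HEADERS = {"Authorization": "Bearer <token>"}   # placeholder authentication
PR_ID, MO_ID = 123, 456                              # placeholder production and media object ids

annotation = {
    "start": 0,
    "end": 6,
    "funnel": "TranscriptAnnotation",
    "objectType": "TranscriptAnnotation",
    "language": "en",
    "structuredDescription": {
        "confidence": 0.9935,
        "language": "en",
        "parts": [
            {"start": 0, "duration": 2, "word": "Hi, ", "confidence": 1, "speaker": "F1", "type": "LEX"},
            {"start": 2, "duration": 3, "word": "I'm ", "confidence": 1, "speaker": "F1", "type": "LEX"},
        ],
    },
}

response = requests.post(
    f"{BASE_URL}/production/{PR_ID}/mo/{MO_ID}/an",
    json=annotation,
    headers=AUTH_HEADERS,
)
response.raise_for_status()
print("Created annotation", response.json().get("id"))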