TranscriptAnnotation
A TranscriptAnnotation is used by the transcription engine and the Limecraft Flow UI Transcriber application. It represents a single paragraph of spoken text, spoken by a single speaker, where each single word is timecoded to the exact frames where it is spoken out loud.
objectType, funnel, type
The TranscriptAnnotation is an extension of StructuredAnnotation. So all properties of a StructuredAnnotation are also available on the TranscriptAnnotation.
The objectType
will be "TranscriptAnnotation"
, the type
of the StructuredAnnotation
will be "TRANSCRIBER"
, and the funnel will be "TranscriptAnnotation"
.
Example
The example below illustrates a TranscriptAnnotation
representing two sentences of text.
{
"id": 727548,
"objectType": "TranscriptAnnotation",
"funnel": "TranscriptAnnotation",
"start": 2566,
"end": 2737,
"speaker": "S1",
"includesFrom": [],
"label": "Automatic Transcription Fri Oct 29 2021 08:00:52 GMT+0000 (Coordinated Universal Time)",
"language": "en",
"mediaObjectId": 470672,
"source": "automatic_transcription",
"type": "TRANSCRIBER",
"version": 1,
"structuredDescription": {
"confidence": 0.9612068965517244,
"language": "en",
"parts": [
{
"start": 2566,
"duration": 8,
"word": "This ",
"confidence": 0,
"speaker": "S1",
"type": "LEX",
"id": "727548-2566",
"isStartInterpolated": false,
"isEndInterpolated": false
},
{
"start": 2574,
"duration": 17,
"word": "blade ",
"confidence": 0,
"speaker": "S1",
"type": "LEX",
"id": "727548-2574",
"isStartInterpolated": false,
"isEndInterpolated": false
},
{
"start": 2591,
"duration": 6,
"word": "has ",
"confidence": 0,
"speaker": "S1",
"type": "LEX",
"id": "727548-2591",
"isStartInterpolated": false,
"isEndInterpolated": false
},
{
"start": 2597,
"duration": 2,
"word": "a ",
"confidence": 0,
"speaker": "S1",
"type": "LEX",
"id": "727548-2597",
"isStartInterpolated": false,
"isEndInterpolated": false
},
{
"start": 2599,
"duration": 9,
"word": "dark ",
"confidence": 0,
"speaker": "S1",
"type": "LEX",
"id": "727548-2599",
"isStartInterpolated": false,
"isEndInterpolated": false
},
{
"start": 2608,
"duration": 14,
"word": "past. ",
"confidence": 0,
"speaker": "S1",
"type": "LEX",
"id": "727548-2608",
"isStartInterpolated": false,
"isEndInterpolated": false
},
{
"start": 2675,
"duration": 5,
"word": "It ",
"confidence": 0,
"speaker": "S1",
"type": "LEX",
"id": "727548-2675",
"isStartInterpolated": false,
"isEndInterpolated": false
},
{
"start": 2680,
"duration": 3,
"word": "has ",
"confidence": 0,
"speaker": "S1",
"type": "LEX",
"id": "727548-2680",
"isStartInterpolated": false,
"isEndInterpolated": false
},
{
"start": 2683,
"duration": 14,
"word": "shed ",
"confidence": 0,
"speaker": "S1",
"type": "LEX",
"id": "727548-2683",
"isStartInterpolated": false,
"isEndInterpolated": false
},
{
"start": 2697,
"duration": 12,
"word": "much ",
"confidence": 0,
"speaker": "S1",
"type": "LEX",
"id": "727548-2697",
"isStartInterpolated": false,
"isEndInterpolated": false
},
{
"start": 2714,
"duration": 11,
"word": "innocent ",
"confidence": 0,
"speaker": "S1",
"type": "LEX",
"id": "727548-2714",
"isStartInterpolated": false,
"isEndInterpolated": false
},
{
"start": 2728,
"duration": 9,
"word": "blood. ",
"confidence": 0,
"speaker": "S1",
"type": "LEX",
"id": "727548-2728",
"isStartInterpolated": false,
"isEndInterpolated": false
}
]
}
}
Timecoded words in parts
The structuredDescription.parts
array is the core of a TranscriptAnnotation
. It contains an object for each word of the text:
{
"start": 2566,
"duration": 8,
"word": "This ",
"speaker": "S1",
"confidence": 0,
"type": "LEX",
"id": "727548-2566",
"isInterpolation": false,
"isStartInterpolated": false,
"isEndInterpolated": false
}
Field | Description |
---|---|
start |
Start frame of the part. Should be ≥ the |
duration |
Duration, in frames, of the part. So, |
word |
A string containing the word being spoken. Spaces should be included in the Note that it can also contain multiple words in a single string if there is no need to timecode individual words. So, |
confidence |
A value between 0 and 1 indicating how confident we are about the text in |
speaker |
A string indicating the name of the speaker. |
type |
What type of word this is. This could be filled in by the automatic transcription. |
id |
An optional client side identifier for the part. |
isInterpolation |
Deprecated. Use |
isStartInterpolated |
|
isEndInterpolated |
Like |
The parts in the array are not necessarily sorted chronologically! It is the client’s responsibility to sort them by start .
|
The start
and duration
of the parts should be chosen so that the parts fall are fully within the start
and end
range of the TranscriptAnnotation
. There can also be no overlap between parts. There can be gaps though (indicating silence).
Part Timing Interpolation: isStartInterpolated, isEndInterpolated
In the Limecraft Flow UI Transcriber application, it is possible to set the text between a start and end timecode.
This will create a part
for each word, but it will have to interpolate some of the timecodes. The start
of the first part ("Demo"
) is known, and the intended end (so, start+duration
) of the last part is also known. All other 'boundaries' (start and end of parts) have to be interpolated in between. This is usually done by assuming the same length for each word. When a boundary is interpolated like this, it will set isStartInterpolated
or isEndInterpolated
to true
. This information is important, so we know the boundaries are not known for sure.
[
{
"start": 3675,
"duration": 16,
"type": "LEX",
"word": "Demo",
"isStartInterpolated": false,
"isEndInterpolated": true
},
{
"start": 3691,
"duration": 16,
"type": "LEX",
"word": " of",
"isStartInterpolated": true,
"isEndInterpolated": true
},
{
"start": 3707,
"duration": 17,
"type": "LEX",
"word": " interpolation.",
"isStartInterpolated": true,
"isEndInterpolated": false
},
{
"start": 4952,
"duration": 8,
"word": "Hey, ",
"confidence": 0.98,
"speaker": "S2",
"type": "LEX",
"id": "727549-4952",
"isStartInterpolated": false,
"isEndInterpolated": false
},
{
"start": 4966,
"duration": 9,
"word": "she's ",
"confidence": 0.53,
"speaker": "S2",
"type": "LEX",
"id": "727549-4966",
"isStartInterpolated": false,
"isEndInterpolated": false
}
]
If edit operations are done afterward, interpolation will always happen between 'trusted' boundaries which have is[Start|End]Interpolated
set to false
.
Language
All transcript annotations having the same value for language
of a MediaObject are assumed to represent a single transcript. In the Limecraft Flow UI, these will be shown as part of the same transcript.
Multiple transcripts can exist, as long as they have distinct values for the language
field.
language is a property of the Annotation. It might also appear on the structuredDescription , which is set by the automatic transcription engine. When talking about the language of an annotation, we are always talking about the property on the Annotation (not the one in structuredDescription ).
|
See Annotation for information on the possible values for the language.
Speaker
The speaker
string indicates who is talking. It should uniquely identify one of the participants in the transcribed conversations.
Property Reference
Field Name | Required | Type | Description | Format |
---|---|---|---|---|
annotationProductionId |
✘ |
Long |
int64 |
|
clipMetadata |
✘ |
ClipMetadata |
||
created |
✘ |
Date |
The time when this resource was created |
date-time |
createdBy |
✘ |
String |
The request or process that created this resource |
|
createdByShareId |
✘ |
Long |
int64 |
|
createdBySharedUserId |
✘ |
Long |
int64 |
|
creatorId |
✘ |
Long |
The id of the user who created this resource |
int64 |
crossProduction |
✘ |
Boolean |
||
customFields |
✘ |
CustomFields |
||
deleted |
✘ |
Date |
date-time |
|
description |
✘ |
String |
Textual contents of the Annotation |
|
end |
✘ |
Long |
The frame range described by the annotation runs up to end, but not including it. Should be less than or equal to the amount of frames the MediaObject has. |
int64 |
funnel |
✘ |
String |
Describes how the Annotation should be interpreted by the client application. Can be thought of as a subtype. |
|
genericType |
✘ |
String |
||
id |
✔ |
Long |
The id of this resource |
int64 |
includeTranslatedTo |
✘ |
Boolean |
||
includesFrom |
✘ |
Set of string |
||
keyframeFrames |
✘ |
Long |
int64 |
|
label |
✘ |
String |
||
language |
✘ |
String |
||
lastUpdated |
✘ |
Date |
The time when this resource was last updated |
date-time |
mediaObject |
✘ |
MediaObject |
||
mediaObjectId |
✘ |
Long |
int64 |
|
modifiedBy |
✔ |
String |
The request or process responsible for the last update of this resource |
|
objectType |
✘ |
String |
The data model type or class name of this resource |
|
origin |
✘ |
String |
||
productionId |
✘ |
Long |
int64 |
|
rating |
✘ |
Double |
double |
|
relatedToId |
✘ |
Long |
int64 |
|
securityClasses |
✘ |
Set of string |
Enum: |
|
source |
✘ |
String |
||
spatial |
✘ |
String |
Link the Annotation to a specific part of the video or image frame. A Media Fragments Spatial Dimension description string is expected. |
|
speaker |
✘ |
String |
||
start |
✘ |
Long |
First frame of the Annotation. 0 is the first frame of the clip. The start frame is included in the frame range the annotation describes. |
int64 |
structuredDescription |
✘ |
Object |
||
systemFields |
✘ |
CustomFields |
||
tags |
✘ |
Set of string |
||
translatedFromId |
✘ |
Long |
int64 |
|
translatedToIds |
✘ |
Set of long |
int64 |
|
type |
✘ |
String |
Enum: MINIME, EBU_CORE, TRANSCRIBER, TECHNICAL_DETAILS, GENERIC, SUBTITLE, SHOT, |
|
version |
✔ |
Long |
The version of this resource, used for Optimistic Locking |
int64 |