TranscriptAnnotation

A TranscriptAnnotation is used by the transcription engine and the Limecraft Flow UI Transcriber application. It represents a single paragraph of text spoken by one speaker, in which each word is timecoded to the exact frames where it is spoken.

objectType, funnel, type

The TranscriptAnnotation is an extension of StructuredAnnotation, so all properties of a StructuredAnnotation are also available on the TranscriptAnnotation.

The objectType will be "TranscriptAnnotation", the type of the StructuredAnnotation will be "TRANSCRIBER", and the funnel will be "TranscriptAnnotation".
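A client can use these three values to recognise a TranscriptAnnotation among other annotations. The sketch below assumes the annotation has already been parsed from JSON into a Python dict; the function name `is_transcript_annotation` is hypothetical and not part of the Limecraft Flow API.

```python
def is_transcript_annotation(annotation: dict) -> bool:
    """Return True if the parsed annotation carries the three identifying values."""
    return (
        annotation.get("objectType") == "TranscriptAnnotation"
        and annotation.get("type") == "TRANSCRIBER"
        and annotation.get("funnel") == "TranscriptAnnotation"
    )
```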

Example

The example below illustrates a TranscriptAnnotation representing two sentences of text.

{
    "id": 727548,
    "objectType": "TranscriptAnnotation",
    "funnel": "TranscriptAnnotation",
    "start": 2566,
    "end": 2737,
    "speaker": "S1",
    "includesFrom": [],
    "label": "Automatic Transcription Fri Oct 29 2021 08:00:52 GMT+0000 (Coordinated Universal Time)",
    "language": "en",
    "mediaObjectId": 470672,
    "source": "automatic_transcription",
    "type": "TRANSCRIBER",
    "version": 1,
    "structuredDescription": {
        "confidence": 0.9612068965517244,
        "language": "en",
        "parts": [
            {
                "start": 2566,
                "duration": 8,
                "word": "This ",
                "confidence": 0,
                "speaker": "S1",
                "type": "LEX",
                "id": "727548-2566",
                "isStartInterpolated": false,
                "isEndInterpolated": false
            },
            {
                "start": 2574,
                "duration": 17,
                "word": "blade ",
                "confidence": 0,
                "speaker": "S1",
                "type": "LEX",
                "id": "727548-2574",
                "isStartInterpolated": false,
                "isEndInterpolated": false
            },
            {
                "start": 2591,
                "duration": 6,
                "word": "has ",
                "confidence": 0,
                "speaker": "S1",
                "type": "LEX",
                "id": "727548-2591",
                "isStartInterpolated": false,
                "isEndInterpolated": false
            },
            {
                "start": 2597,
                "duration": 2,
                "word": "a ",
                "confidence": 0,
                "speaker": "S1",
                "type": "LEX",
                "id": "727548-2597",
                "isStartInterpolated": false,
                "isEndInterpolated": false
            },
            {
                "start": 2599,
                "duration": 9,
                "word": "dark ",
                "confidence": 0,
                "speaker": "S1",
                "type": "LEX",
                "id": "727548-2599",
                "isStartInterpolated": false,
                "isEndInterpolated": false
            },
            {
                "start": 2608,
                "duration": 14,
                "word": "past. ",
                "confidence": 0,
                "speaker": "S1",
                "type": "LEX",
                "id": "727548-2608",
                "isStartInterpolated": false,
                "isEndInterpolated": false
            },
            {
                "start": 2675,
                "duration": 5,
                "word": "It ",
                "confidence": 0,
                "speaker": "S1",
                "type": "LEX",
                "id": "727548-2675",
                "isStartInterpolated": false,
                "isEndInterpolated": false
            },
            {
                "start": 2680,
                "duration": 3,
                "word": "has ",
                "confidence": 0,
                "speaker": "S1",
                "type": "LEX",
                "id": "727548-2680",
                "isStartInterpolated": false,
                "isEndInterpolated": false
            },
            {
                "start": 2683,
                "duration": 14,
                "word": "shed ",
                "confidence": 0,
                "speaker": "S1",
                "type": "LEX",
                "id": "727548-2683",
                "isStartInterpolated": false,
                "isEndInterpolated": false
            },
            {
                "start": 2697,
                "duration": 12,
                "word": "much ",
                "confidence": 0,
                "speaker": "S1",
                "type": "LEX",
                "id": "727548-2697",
                "isStartInterpolated": false,
                "isEndInterpolated": false
            },
            {
                "start": 2714,
                "duration": 11,
                "word": "innocent ",
                "confidence": 0,
                "speaker": "S1",
                "type": "LEX",
                "id": "727548-2714",
                "isStartInterpolated": false,
                "isEndInterpolated": false
            },
            {
                "start": 2728,
                "duration": 9,
                "word": "blood. ",
                "confidence": 0,
                "speaker": "S1",
                "type": "LEX",
                "id": "727548-2728",
                "isStartInterpolated": false,
                "isEndInterpolated": false
            }
        ]
    }
}

Timecoded words in parts

The structuredDescription.parts array is the core of a TranscriptAnnotation. It contains an object for each word of the text:

{
    "start": 2566,
    "duration": 8,
    "word": "This ",
    "speaker": "S1",
    "confidence": 0,
    "type": "LEX",
    "id": "727548-2566",
    "isInterpolation": false,
    "isStartInterpolated": false,
    "isEndInterpolated": false
}
Field Description

start

Start frame of the part. Should be ≥ the start of the TranscriptAnnotation.

duration

Duration, in frames, of the part. So, start+duration gives the 'end' of the part. The duration and start should be chosen so that duration+start is ≤ the end of the TranscriptAnnotation.

word

A string containing the word being spoken. Spaces should be included in the word string, because they are not added automatically between words.

Note that it can also contain multiple words in a single string if there is no need to timecode individual words. So, "This is some text " would also be a valid value. On the other hand, it could also contain only part of a word, if there is a need to timecode individual parts of a word.

confidence

A value between 0 and 1 indicating how confident we are that the text in word is what was actually spoken. This might be filled in by the automatic transcription.

speaker

A string indicating the name of the speaker.

type

What type of word this is. This could be filled in by the automatic transcription.

id

An optional client side identifier for the part.

isInterpolation

Deprecated. Use isStartInterpolated / isEndInterpolated instead.

isStartInterpolated

true if the start boundary is not known for sure, but interpolated between other boundaries. false if the start value is set by either the automatic transcription engine, or by the user, or assumed to be correct for another reason.

isEndInterpolated

Like isStartInterpolated, but for the end of the part (so, for start+duration).

The parts in the array are not necessarily sorted chronologically! It is the client’s responsibility to sort them by start.
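Because the parts may arrive unsorted and each word string carries its own spacing, the plain text of a paragraph can be recovered by sorting on start and concatenating word without a separator. The following is a minimal sketch, assuming the annotation has been parsed into a Python dict; `parts_to_text` is a hypothetical helper name.

```python
def parts_to_text(annotation: dict) -> str:
    """Rebuild the paragraph text from the timecoded parts."""
    parts = annotation.get("structuredDescription", {}).get("parts", [])
    # The server does not guarantee chronological order, so sort by start frame.
    ordered = sorted(parts, key=lambda part: part["start"])
    # Spaces are already part of each word string, so join without a separator.
    return "".join(part["word"] for part in ordered)


annotation = {
    "structuredDescription": {
        "parts": [
            {"start": 2574, "duration": 17, "word": "blade "},
            {"start": 2566, "duration": 8, "word": "This "},
            {"start": 2591, "duration": 6, "word": "has "},
        ]
    }
}
print(repr(parts_to_text(annotation)))  # 'This blade has '
```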

The start and duration of the parts should be chosen so that the parts fall fully within the start and end range of the TranscriptAnnotation. Parts must not overlap. Gaps between parts are allowed, though (indicating silence).
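The range and overlap rules can be checked on the client before saving. The sketch below is one possible check, not the validation the server performs; the helper names `check_parts` and `find_gaps` are hypothetical, and the input is assumed to be a parsed annotation dict.

```python
def sorted_parts(annotation: dict) -> list[dict]:
    """Return the parts ordered by start frame."""
    return sorted(annotation["structuredDescription"]["parts"], key=lambda p: p["start"])


def check_parts(annotation: dict) -> list[str]:
    """Return a list of problems; an empty list means the parts look consistent."""
    problems = []
    previous_end = None
    for part in sorted_parts(annotation):
        start = part["start"]
        end = start + part["duration"]
        if start < annotation["start"] or end > annotation["end"]:
            problems.append(f"part at frame {start} falls outside the annotation range")
        if previous_end is not None and start < previous_end:
            problems.append(f"part at frame {start} overlaps the previous part")
        previous_end = end if previous_end is None else max(previous_end, end)
    return problems


def find_gaps(annotation: dict) -> list[tuple[int, int]]:
    """Return (gap_start, gap_end) frame pairs between consecutive parts."""
    gaps = []
    previous_end = None
    for part in sorted_parts(annotation):
        if previous_end is not None and part["start"] > previous_end:
            gaps.append((previous_end, part["start"]))
        end = part["start"] + part["duration"]
        previous_end = end if previous_end is None else max(previous_end, end)
    return gaps


example = {
    "start": 2566,
    "end": 2737,
    "structuredDescription": {
        "parts": [
            {"start": 2675, "duration": 5, "word": "It "},
            {"start": 2608, "duration": 14, "word": "past. "},
        ]
    },
}
print(check_parts(example))  # []
print(find_gaps(example))    # [(2622, 2675)]
```

The gap reported here is the pause between the two sentences of the example annotation shown earlier.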

Part Timing Interpolation: isStartInterpolated, isEndInterpolated

In the Limecraft Flow UI Transcriber application, it is possible to set the text between a start and end timecode.

Interpolation in the Transcriber application

The application creates a part for each word, but it has to interpolate some of the timecodes. In the example below, the start of the first part ("Demo") is known, and the intended end (so, start+duration) of the last part is also known. All other 'boundaries' (start and end of parts) have to be interpolated in between. This is usually done by assuming the same length for each word. When a boundary is interpolated like this, isStartInterpolated or isEndInterpolated is set to true on the part. This information is important, because it records that the boundary is not known for sure.

[
    {
        "start": 3675,
        "duration": 16,
        "type": "LEX",
        "word": "Demo",
        "isStartInterpolated": false,
        "isEndInterpolated": true
    },
    {
        "start": 3691,
        "duration": 16,
        "type": "LEX",
        "word": " of",
        "isStartInterpolated": true,
        "isEndInterpolated": true
    },
    {
        "start": 3707,
        "duration": 17,
        "type": "LEX",
        "word": " interpolation.",
        "isStartInterpolated": true,
        "isEndInterpolated": false
    },
    {
        "start": 4952,
        "duration": 8,
        "word": "Hey, ",
        "confidence": 0.98,
        "speaker": "S2",
        "type": "LEX",
        "id": "727549-4952",
        "isStartInterpolated": false,
        "isEndInterpolated": false
    },
    {
        "start": 4966,
        "duration": 9,
        "word": "she's ",
        "confidence": 0.53,
        "speaker": "S2",
        "type": "LEX",
        "id": "727549-4966",
        "isStartInterpolated": false,
        "isEndInterpolated": false
    }
]

If edit operations are done afterward, interpolation will always happen between 'trusted' boundaries which have is[Start|End]Interpolated set to false.
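The equal-length interpolation described above can be sketched as follows. This is an illustration under stated assumptions, not the Transcriber application's actual code: the function name `interpolate_parts`, the whitespace-based word splitting, and the integer rounding are all assumptions, and the "LEX" type is simply copied from the example.

```python
import re


def interpolate_parts(text: str, start: int, end: int) -> list[dict]:
    """Split text into words and spread them evenly over the frames from start to end."""
    # Keep the whitespace in front of each word attached to that word.
    words = re.findall(r"\s*\S+", text)
    if not words:
        return []
    count = len(words)
    total = end - start
    # Boundary i sits at an equal share of the range, rounded down to a whole frame.
    boundaries = [start + (total * i) // count for i in range(count + 1)]
    parts = []
    for i, word in enumerate(words):
        parts.append({
            "start": boundaries[i],
            "duration": boundaries[i + 1] - boundaries[i],
            "type": "LEX",
            "word": word,
            # Only the outer boundaries were given by the user; the others are interpolated.
            "isStartInterpolated": i != 0,
            "isEndInterpolated": i != count - 1,
        })
    return parts


for part in interpolate_parts("Demo of interpolation.", 3675, 3724):
    print(part["start"], part["duration"], repr(part["word"]))
```

For the range 3675 to 3724 this yields durations of 16, 16 and 17 frames, which matches the first three parts of the example above. The rounding the application really uses may differ for other inputs.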

Language

All transcript annotations of a MediaObject that have the same value for language are assumed to represent a single transcript. In the Limecraft Flow UI, these will be shown as part of the same transcript.

Multiple transcripts can exist, as long as they have distinct values for the language field.

language is a property of the Annotation. It might also appear on the structuredDescription, where it is set by the automatic transcription engine. When talking about the language of an annotation, we are always talking about the property on the Annotation (not the one in structuredDescription).

See Annotation for information on the possible values for the language.
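A client that has fetched all annotations of one MediaObject could split them into transcripts as sketched below. The function name `group_transcripts` is hypothetical, the input is assumed to be a list of parsed annotation dicts, and ordering the paragraphs by start is an assumption made for display purposes.

```python
from collections import defaultdict


def group_transcripts(annotations: list[dict]) -> dict:
    """Group the TranscriptAnnotations of one MediaObject by their top-level language."""
    transcripts = defaultdict(list)
    for annotation in annotations:
        if annotation.get("funnel") != "TranscriptAnnotation":
            continue  # skip other kinds of annotations
        # Use the Annotation's own language, not structuredDescription.language.
        transcripts[annotation.get("language")].append(annotation)
    for paragraphs in transcripts.values():
        paragraphs.sort(key=lambda a: a["start"])
    return dict(transcripts)
```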

Speaker

The speaker string indicates who is talking. It should uniquely identify one of the participants in the transcribed conversations.
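Since each paragraph belongs to one speaker, the speaker string can serve as a grouping key. The sketch below totals the annotated frames per speaker; `frames_per_speaker` is a hypothetical helper, and it assumes parsed annotation dicts with the start, end and speaker fields shown in the example.

```python
def frames_per_speaker(annotations: list[dict]) -> dict:
    """Sum the length in frames of all paragraphs, per speaker string."""
    totals = {}
    for annotation in annotations:
        speaker = annotation.get("speaker")
        # end is exclusive, so end - start is the paragraph length in frames.
        length = annotation["end"] - annotation["start"]
        totals[speaker] = totals.get(speaker, 0) + length
    return totals
```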

Property Reference

| Field Name | Required | Type | Description | Format |
|---|---|---|---|---|
| annotationProductionId | | Long | | int64 |
| clipMetadata | | ClipMetadata | | |
| created | | Date | The time when this resource was created | date-time |
| createdBy | | String | The request or process that created this resource | |
| createdByShareId | | Long | | int64 |
| createdBySharedUserId | | Long | | int64 |
| creatorId | | Long | The id of the user who created this resource | int64 |
| crossProduction | | Boolean | | |
| customFields | | CustomFields | | |
| deleted | | Date | | date-time |
| description | | String | Textual contents of the Annotation | |
| end | | Long | The frame range described by the annotation runs up to end, but not including it. Should be less than or equal to the amount of frames the MediaObject has. | int64 |
| funnel | | String | Describes how the Annotation should be interpreted by the client application. Can be thought of as a subtype. | |
| genericType | | String | | |
| id | | Long | The id of this resource | int64 |
| includeTranslatedTo | | Boolean | | |
| includesFrom | | Set of string | | |
| keyframeFrames | | Long | | int64 |
| label | | String | | |
| language | | String | | |
| lastUpdated | | Date | The time when this resource was last updated | date-time |
| mediaObject | | MediaObject | | |
| mediaObjectId | | Long | | int64 |
| modifiedBy | | String | The request or process responsible for the last update of this resource | |
| objectType | | String | The data model type or class name of this resource | |
| origin | | String | | |
| productionId | | Long | | int64 |
| rating | | Double | | double |
| relatedToId | | Long | | int64 |
| securityClasses | | Set of string | | Enum: |
| source | | String | | |
| spatial | | String | Link the Annotation to a specific part of the video or image frame. A Media Fragments Spatial Dimension description string is expected. | |
| speaker | | String | | |
| start | | Long | First frame of the Annotation. 0 is the first frame of the clip. The start frame is included in the frame range the annotation describes. | int64 |
| structuredDescription | | Object | | |
| systemFields | | CustomFields | | |
| tags | | Set of string | | |
| translatedFromId | | Long | | int64 |
| translatedToIds | | Set of long | | int64 |
| type | | String | | Enum: MINIME, EBU_CORE, TRANSCRIBER, TECHNICAL_DETAILS, GENERIC, SUBTITLE, SHOT |
| version | | Long | The version of this resource, used for Optimistic Locking | int64 |