Dictionaries

The Dictionary feature lets you supply a list of words with each transcription job. This helps when a specific word is not recognised during transcription, typically because it is not in the vocabulary for that language, for example a company or person’s name. Adding such words improves the likelihood that they appear in the output.

The soundsLike feature extends this by letting you specify alternative pronunciations, which aids recognition when the pronunciation is not obvious.

Prior to using this feature, consider the following:

  • soundsLike is an optional setting, recommended when the pronunciation of the word is not obvious or when the word can be pronounced in multiple ways; providing only the content value is valid

  • soundsLike only works with the main script for that language

    • Japanese (ja) soundsLike only supports full width Hiragana or Katakana

You can specify up to 1000 words or phrases (per job) in your dictionary.
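The constraints above can be checked on the client before a job is submitted. The following is a minimal sketch, not part of the API: the names `validate_dictionary` and `is_full_width_kana` are hypothetical, entries are assumed to be dicts with a `content` key and an optional `soundsLike` list, and "full width Hiragana or Katakana" is assumed to mean the Unicode Hiragana (U+3040 to U+309F) and Katakana (U+30A0 to U+30FF) blocks.

```python
MAX_ENTRIES = 1000  # per-job limit stated in the text above


def is_full_width_kana(text):
    """True if every character lies in the Hiragana or Katakana Unicode block.

    Half-width Katakana (U+FF65 to U+FF9F) falls outside this range and is rejected.
    """
    return bool(text) and all("\u3040" <= ch <= "\u30ff" for ch in text)


def validate_dictionary(entries, language=None):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(entries) > MAX_ENTRIES:
        problems.append(f"{len(entries)} entries exceed the limit of {MAX_ENTRIES}")
    for index, entry in enumerate(entries):
        if not entry.get("content"):
            problems.append(f"entry {index}: missing content")
        # soundsLike is optional; providing only content is valid.
        for alias in entry.get("soundsLike", []):
            if language == "ja" and not is_full_width_kana(alias):
                problems.append(f"entry {index}: soundsLike {alias!r} is not full-width kana")
    return problems
```

Calling `validate_dictionary(entries, language="ja")` before submitting a job gives a list of messages to show to the user; the server remains the authority on what is accepted.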

Property Reference

See Dictionary.

API

DictionaryEntries are part of the Dictionary as a JSON array property. By default, the DictionaryEntries of a Dictionary are not returned. To include them when requesting a Dictionary, use the Extended view, either through the jsonView=Extended query parameter or through the X-JSON-VIEW: Extended header.
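As a rough illustration of the two ways to request the Extended view, the sketch below builds (but does not send) both variants of a GET request with Python's standard library. The host `https://example.invalid` and the ids are placeholders, and `build_dictionary_request` is a hypothetical helper; only the path layout, the `jsonView=Extended` parameter, and the `X-JSON-VIEW: Extended` header come from this page. Authentication is omitted.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://example.invalid"  # placeholder host (assumption)


def build_dictionary_request(production_id, dictionary_id, use_header=False):
    """Build a GET request for one dictionary that asks for the Extended view."""
    url = f"{BASE_URL}/api/production/{production_id}/dictionary/{dictionary_id}"
    if use_header:
        # Variant 1: select the Extended view through the request header.
        return Request(url, headers={"X-JSON-VIEW": "Extended"}, method="GET")
    # Variant 2: select the Extended view through the query string.
    query = urlencode({"jsonView": "Extended"})
    return Request(f"{url}?{query}", method="GET")


by_query = build_dictionary_request(123456789, 123456789)
by_header = build_dictionary_request(123456789, 123456789, use_header=True)
print(by_query.full_url)
```

Either request object can then be passed to `urllib.request.urlopen` once real credentials and a real host are in place.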
  • list all dictionaries in a production: [GET] /api/production/123456789/dictionary

  • get a specific dictionary [GET] /api/production/123456789/dictionary/123456789

  • create a dictionary: [POST] /api/production/123456789/dictionary - with {"label": "My dictionary"}

  • delete a dictionary: [DELETE] /api/production/123456789/dictionary/123456789

  • update a dictionary: [PUT] /api/production/123456789/dictionary/123456789

  • patch a dictionary: [PATCH] /api/production/123456789/dictionary/123456789

  • import a text based file as entries: [POST] /api/production/123456789/dictionary/123456789/entry/import

    • this will start an import workflow which will parse the given text file and add/replace the entries of the dictionary

    • as input, the id of a file resource must be given; the file resource needs to be created before starting this workflow

    • request parameters:

      • format:

        • type: csv | json

        • options: e.g. csv parser options

          • separator

          • strict

          • escape

          • quote

      • action: replace | add | merge

        • replace: removes all existing entries and adds the current given entries;

        • add: keeps the existing entries and adds the current given entries;

        • merge: keeps the existing entries and adds the current given entries if the entry (based on the content property) does not exist yet;

      • fileResourceId: the id of the file resource to be used as input for the import.

  • export a text based file containing the entries of the dictionary: [POST] /api/production/123456789/dictionary/123456789/entry/export

    • this will start an export workflow which will generate a text file containing all entries of the dictionary

    • the file can be downloaded as a published file of the workflow

    • parameters include:

      • format:

        • type: csv | json

        • options:

          • separator
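The three import actions (replace, add, merge) differ only in how incoming entries are combined with existing ones. The sketch below models that behaviour on plain Python lists as described in the import parameters; it illustrates the semantics and is not the server's implementation. `apply_import_action` is a hypothetical name, and skipping duplicates within the incoming list itself during `merge` is an assumption.

```python
def apply_import_action(existing, incoming, action):
    """Combine dictionary entries the way the import actions are described."""
    if action == "replace":
        # Remove all existing entries and keep only the incoming ones.
        return list(incoming)
    if action == "add":
        # Keep the existing entries and append every incoming entry.
        return list(existing) + list(incoming)
    if action == "merge":
        # Keep the existing entries; append an incoming entry only if no
        # entry with the same content is present yet.
        seen = {entry["content"] for entry in existing}
        merged = list(existing)
        for entry in incoming:
            if entry["content"] not in seen:
                merged.append(entry)
                seen.add(entry["content"])  # assumption: also skip repeats within incoming
        return merged
    raise ValueError(f"unknown action: {action!r}")


existing = [{"content": "Limecraft"}, {"content": "Ugh"}]
incoming = [{"content": "Ugh"}, {"content": "Slurp"}]

for action in ("replace", "add", "merge"):
    result = apply_import_action(existing, incoming, action)
    print(action, [entry["content"] for entry in result])
```

With these inputs, add keeps both copies of "Ugh", while merge keeps a single one.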

The format of the import is currently limited to csv or json, with the following constraints:

The format for json is the following:

[
    {
        "content": "Limecraft",
        "metadata": {
            "custom": 1
        },
        "id": "my personal id",
        "soundsLike": ["laaimcraft"]
    },
    {
        "content": "Ugh",
        "metadata": {
            "custom": 4
        },
        "id": "my personal id 4",
        "soundsLike": []
    },
    {
        "content": "Slurp",
        "metadata": {
            "custom": 5
        },
        "id": "my personal id 5",
        "soundsLike": []
    },
    {
        "content": "Joink",
        "metadata": {
            "custom": 6
        },
        "id": "my personal id 6",
        "soundsLike": []
    },
    {
        "content": "Smurf",
        "metadata": {
            "custom": 7
        },
        "id": "my personal id 7",
        "soundsLike": []
    }
]
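An import file of this shape can be produced with any JSON serializer. The following sketch uses Python's `json` module; `make_entry` and `to_import_json` are hypothetical helpers, and treating `id` and `metadata` as optional is an assumption (the CSV format only carries `content`).

```python
import json


def make_entry(content, sounds_like=(), entry_id=None, metadata=None):
    """Build one entry dict using the property names from the example above."""
    entry = {"content": content, "soundsLike": list(sounds_like)}
    if entry_id is not None:
        entry["id"] = entry_id
    if metadata is not None:
        entry["metadata"] = metadata
    return entry


def to_import_json(entries):
    """Serialize entries as a JSON array; ensure_ascii=False keeps non-ASCII words readable."""
    return json.dumps(entries, indent=4, ensure_ascii=False)


entries = [
    make_entry("Limecraft", ["laaimcraft"], "my personal id", {"custom": 1}),
    make_entry("Ugh", entry_id="my personal id 4", metadata={"custom": 4}),
]
document = to_import_json(entries)
print(document)
```

The resulting text would then be uploaded as a file resource, whose id is passed as fileResourceId to the import call.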

The csv import must have the following format:

content
Limecraft
Ugh
Slurp
Joink
Smurf

Here the first line is the header, which names the property the column maps to.
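A file in this format can be written with Python's `csv` module, as sketched below. Only the single `content` column shown above is emitted; whether further columns are accepted is not stated here. `to_import_csv` is a hypothetical helper, and its `separator` argument is assumed to correspond to the `separator` format option of the import call.

```python
import csv
import io


def to_import_csv(words, separator=","):
    """Return CSV text with a 'content' header row followed by one word per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=separator, lineterminator="\n")
    writer.writerow(["content"])  # header: names the property the column maps to
    for word in words:
        writer.writerow([word])  # values containing the separator are quoted automatically
    return buffer.getvalue()


csv_text = to_import_csv(["Limecraft", "Ugh", "Slurp", "Joink", "Smurf"])
print(csv_text, end="")
```

Using the `csv` module rather than joining strings by hand means that phrases containing the separator or quote characters are escaped consistently.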

The following calls are not strictly needed, but provide an alternative way of managing the DictionaryEntries of a Dictionary:

  • list dictionary entries: [GET] /api/production/123456789/dictionary/123456789/entry

  • create a dictionary entry: [POST] /api/production/123456789/dictionary/123456789/entry - with {"word": "MyWord"}

Usage

Dictionaries can be used when starting a transcription workflow, see ./index.adoc#customDictionaries.