3rd Translation Inference Across Dictionaries (TIAD) shared task

[LAST UPDATED: 17/12/2019]

Gold standard

To build the gold standard, we extracted translations from manually compiled pairs of K Dictionaries (KD), particularly its Global series. The coverage of KD is not the same as that of Apertium. To allow comparisons, we built the gold standard from the subset of KD that is covered by Apertium, i.e., those KD translations for which the source and target terms are present in both the Apertium RDF source and target lexicons. The gold standard remained hidden to participants. Graphically, for the FR-PT pair:

Evaluation process

For each system results file, and per language pair, we will:
1. Remove duplicated translations (some systems might produce duplicated rows, i.e., identical source and target words, POS and confidence degree).
2. Filter out translations for which the source entry is not present in the gold standard (otherwise we cannot assess whether the translation is correct or not). Let’s call systemGS the subset of translations that pass this filter.
3. Remove from systemGS the translations with a confidence degree under a given threshold. In principle, the threshold used will be the one reported by participants as the optimal one during the training/preparation phase.
4. Compute the coverage of the system (i.e., how many entries in the source language were translated?) with respect to the gold standard. Graphically, for the source language:

5. Compute precision as P = (#correct translations in systemGS) / |systemGS|
6. Compute recall as R = (#correct translations in systemGS) / |GS|, where GS is the set of translations in the gold standard for a given language pair.
7. Compute F-measure as F = 2*P*R / (P+R)
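
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the official evaluation script. The `Row` type, the `evaluate` function and the toy FR-PT data are hypothetical names introduced here. It also assumes that the gold standard is available as a set of (source, target, POS) triples, that a translation counts as correct when its triple appears in that set, and that rows differing only in confidence are counted once.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One line of a system results file (hypothetical layout)."""
    source: str
    target: str
    pos: str
    confidence: float


def evaluate(rows, gold, threshold):
    """Score one language pair.

    rows: iterable of Row produced by a system.
    gold: set of (source, target, pos) triples (assumed representation).
    threshold: minimum confidence degree for a row to be kept.
    """
    # Step 1: remove exact duplicate rows.
    unique_rows = set(rows)

    # Step 2: keep rows whose source entry occurs in the gold standard.
    gold_sources = {source for source, _, _ in gold}
    system_gs = {r for r in unique_rows if r.source in gold_sources}

    # Step 3: remove rows whose confidence is under the threshold.
    system_gs = {r for r in system_gs if r.confidence >= threshold}

    # Assumption: compare on (source, target, pos) so that the same
    # translation reported with two confidences is counted once.
    predicted = {(r.source, r.target, r.pos) for r in system_gs}

    # Step 4: coverage = translated gold source entries / all gold source entries.
    covered = {source for source, _, _ in predicted}
    coverage = len(covered) / len(gold_sources) if gold_sources else 0.0

    # Steps 5-7: precision, recall and F-measure.
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0

    return {
        "coverage": coverage,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy data, invented for illustration only.
gold = {
    ("chat", "gato", "noun"),
    ("chien", "cão", "noun"),
    ("maison", "casa", "noun"),
    ("livre", "livro", "noun"),
}
rows = [
    Row("chat", "gato", "noun", 0.9),
    Row("chat", "gato", "noun", 0.9),     # exact duplicate, dropped in step 1
    Row("chat", "cão", "noun", 0.8),      # wrong translation
    Row("chien", "cão", "noun", 0.7),
    Row("maison", "casa", "noun", 0.2),   # under the threshold, dropped in step 3
    Row("arbre", "árvore", "noun", 0.95), # source not in gold, dropped in step 2
]
result = evaluate(rows, gold, threshold=0.5)
print(result)
```

With this toy input, three translations survive the filters and two of them are correct, so P = 2/3, R = 2/4 = 0.5, F = 4/7 (about 0.571), and the coverage is 2 of 4 source entries, i.e. 0.5.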



Evaluation results