LUIZ Slovene Term Extraction Demo

Enter a filename, or click on the browse button to choose one:


This demo takes a lemmatised and part-of-speech tagged Slovene corpus as input and extracts multi-word terminological units. Please use the ToTaLe analyser to pre-process your corpus.

Alternatively, if you upload a txt, odt, pdf or doc file (UTF-8 is assumed), the file will be converted to text and sent to the ToTaLe analyser for lemmatisation and part-of-speech tagging. Please check the ToTaLe terms of use before you use this function.

Results are displayed on a separate page in a two-column table. The first column contains the lemmatised version of a multi-word term (e.g. "bojna enota kopenskih sil" becomes "bojen enota kopenski sila"); the second column contains the canonical term (in the nominative case), if this form was present in the uploaded corpus.
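
To illustrate the table layout described above, here is a minimal Python sketch. It is not the code behind this demo: the function name build_term_table, the record fields (forms, lemmas, head_case) and the second sample term are assumptions made up for this illustration, and the real ToTaLe output format may look quite different.

```python
def build_term_table(occurrences):
    """Group term occurrences by lemma sequence.

    Returns a list of (lemmatised term, canonical term) rows. The
    canonical term is the first nominative surface form seen in the
    input, or an empty string if no nominative form occurred.
    """
    table = {}  # dicts keep insertion order in Python 3.7+
    for occ in occurrences:
        lemma_key = " ".join(occ["lemmas"])
        table.setdefault(lemma_key, "")
        # Hypothetical field: grammatical case of the head noun.
        if occ["head_case"] == "nominative" and not table[lemma_key]:
            table[lemma_key] = " ".join(occ["forms"])
    return list(table.items())


# Illustrative input only; the record layout is an assumption.
occurrences = [
    {"forms": ["bojno", "enoto", "kopenskih", "sil"],
     "lemmas": ["bojen", "enota", "kopenski", "sila"],
     "head_case": "accusative"},
    {"forms": ["bojna", "enota", "kopenskih", "sil"],
     "lemmas": ["bojen", "enota", "kopenski", "sila"],
     "head_case": "nominative"},
    {"forms": ["kopenske", "vojske"],
     "lemmas": ["kopenski", "vojska"],
     "head_case": "genitive"},
]

rows = build_term_table(occurrences)
for lemmatised, canonical in rows:
    print(f"{lemmatised}\t{canonical}")
```

In this sketch the second term has an empty second column, because its nominative form does not occur in the sample input.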

This demo was developed by Špela Vintar and Jan Jona Javoršek.

Please send us your comments, questions and bug reports: spela.vintar at, jona.javorsek at