Developed at the University of Lisbon by NLX/FCUL and CLUL
The CINTIL online concordancer (beta version) is a freely available online concordancing service that supports research use of the CINTIL Corpus. It was developed and is maintained at the University of Lisbon by the NLX-Natural Language and Speech Group of the Department of Informatics, in cooperation with the REPORT Group of CLUL-Centro de Linguística da Universidade de Lisboa.
The CINTIL concordancer allows generic patterns to be used to specify the occurrences to be retrieved. This makes it possible to uncover highly complex linguistic structures and to use the service as a powerful research tool.
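The idea of retrieving occurrences through generic patterns can be illustrated with a minimal Python sketch. It is not the concordancer's actual query language or implementation: the `Token` record, the `concordance` function, and the tag names `DET`, `NOUN`, `ADJ` and `VERB` are all hypothetical and chosen only for illustration. A pattern here is simply a sequence of tests, one per consecutive token.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Token:
    form: str   # surface word as it appears in the text
    lemma: str  # base form
    pos: str    # part-of-speech tag (hypothetical tag names, not the CINTIL tagset)


# A pattern is a sequence of predicates, one for each consecutive token.
Pattern = Sequence[Callable[[Token], bool]]


def concordance(tokens: List[Token], pattern: Pattern, width: int = 2) -> List[str]:
    """Return one keyword-in-context line for every match of the pattern."""
    n = len(pattern)
    if n == 0:
        return []  # an empty pattern matches nothing in this sketch
    lines = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(test(tok) for test, tok in zip(pattern, window)):
            left = " ".join(t.form for t in tokens[max(0, i - width):i])
            hit = " ".join(t.form for t in window)
            right = " ".join(t.form for t in tokens[i + n:i + n + width])
            lines.append(f"{left} [{hit}] {right}".strip())
    return lines


# Toy annotated sentence; the annotations are assumptions made for the example.
SENTENCE = [
    Token("o", "o", "DET"),
    Token("gato", "gato", "NOUN"),
    Token("preto", "preto", "ADJ"),
    Token("dorme", "dormir", "VERB"),
]

# Pattern: any noun immediately followed by any adjective.
NOUN_ADJ = [lambda t: t.pos == "NOUN", lambda t: t.pos == "ADJ"]

if __name__ == "__main__":
    for line in concordance(SENTENCE, NOUN_ADJ):
        print(line)
```

Because each position in the pattern is an arbitrary test, the same function can match on the word form, the lemma, the tag, or any combination of them, which is what makes pattern-based retrieval more expressive than a plain string search.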
How to use the concordancer?
You may also be interested in using the companion tools.
CINTIL-Corpus Internacional do Português is a linguistically interpreted corpus of Portuguese. At present it is composed of 1 million annotated tokens, each of which has been verified by human expert annotators. The annotation comprises information on part-of-speech, lemma and inflection for the open classes, multi-word expressions pertaining to the class of adverbs and to the closed POS classes, and multi-word proper names (for named entity recognition).
This corpus is being developed and maintained at the University of Lisbon by the REPORT Group of CLUL-Centro de Linguística da Universidade de Lisboa in cooperation with NLX-Natural Language and Speech Group of the Department of Informatics. It was the first of its class to be developed for Portuguese in terms of the combined dimensions of size, depth of linguistic information, range of domains and sources, and level of accuracy. The present version is the most recent outcome of an ongoing and long-term endeavour to continuously enlarge and refine this corpus along all these dimensions, with the purpose of providing an enhanced resource for the research on the Linguistics of Portuguese and the development of language technology.
What is in the corpus?
The CINTIL Corpus received several contributions:
Contact us using the following email address: 'cintil' concatenated with 'at' concatenated with 'di.fc.ul.pt'.
CINTIL is an ongoing endeavour to develop a corpus with increasingly enhanced accuracy. If, after checking the assumptions under which the current version was produced, you have detected something that deserves to be improved, please let us know.
Please note that this is not an online linguistic help-desk service, and questions unrelated to the CINTIL Corpus will not be answered.
Barreto, Florbela, António Branco, Eduardo Ferreira, Amália Mendes, Maria Fernanda Nascimento, Filipe Nunes and João Silva, 2006, "Open Resources and Tools for the Shallow Processing of Portuguese", Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC2006), Genoa, Italy.
Branco, António and João Silva, 2006, "LX-Suite: Shallow Processing Tools for Portuguese", Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL2006), Trento, Italy, pp. 179-182.
Barreto, Florbela, António Branco, Eduardo Ferreira, Amália Mendes, Fernanda Bacelar Nascimento, Filipe Nunes and João Silva, 2006, "Linguistic Resources and Software for Shallow Processing", In Actas do XXI Encontro Anual da Associação Portuguesa de Linguística, Lisbon, Portugal.
The work leading to the CINTIL Corpus was partly supported by FCT-Fundação para a Ciência e Tecnologia under the grant POSI/PLP/47058/2002 for the project TagShare.
We are very grateful to Adam Przepiórkowski and his team at IPIPAN, the Institute of Computer Science of the Polish Academy of Sciences, Warsaw, for their support in adapting Poliqarp to the Portuguese language and to the CINTIL constraints.