Developed at the University of Lisbon by NLX/FCUL and CLUL

Introduction

CINTIL online concordancer

The CINTIL online concordancer (beta version) is a freely available online concordancing service that supports research use of the CINTIL Corpus. It was developed and is maintained at the University of Lisbon by the NLX-Natural Language and Speech Group of the Department of Informatics, in cooperation with the REPORT Group of CLUL-Centro de Linguística da Universidade de Lisboa.

The CINTIL concordancer allows generic patterns to be used to specify the occurrences to be retrieved. This makes it possible to uncover linguistic structures of high complexity and to use the service as a powerful research tool.
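To illustrate what retrieving occurrences by a generic pattern means, here is a minimal sketch in Python. It is not the CINTIL concordancer's query language or implementation: the Token fields, the tag names (DET, NOUN, ADJ, VERB), the sample sentence and the function names are all assumptions made for this example. A pattern is a sequence of constraints, one per consecutive token, and each constraint maps an attribute to a regular expression.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    # Hypothetical annotation layers; the real CINTIL tagset and format differ.
    form: str
    lemma: str
    pos: str


def matches(token, constraint):
    """True if every attribute regex in the constraint fully matches the token."""
    return all(
        re.fullmatch(regex, getattr(token, attr)) is not None
        for attr, regex in constraint.items()
    )


def concordance(tokens, pattern, context=2):
    """Return (left context, hit, right context) for each run of consecutive
    tokens that satisfies the pattern, in keyword-in-context style."""
    hits = []
    n = len(pattern)
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(matches(t, c) for t, c in zip(window, pattern)):
            left = " ".join(t.form for t in tokens[max(0, i - context):i])
            hit = " ".join(t.form for t in window)
            right = " ".join(t.form for t in tokens[i + n:i + n + context])
            hits.append((left, hit, right))
    return hits


# Invented sample sentence with invented tags: "O gato preto viu um sofá velho"
SAMPLE = [
    Token("O", "o", "DET"),
    Token("gato", "gato", "NOUN"),
    Token("preto", "preto", "ADJ"),
    Token("viu", "ver", "VERB"),
    Token("um", "um", "DET"),
    Token("sofá", "sofá", "NOUN"),
    Token("velho", "velho", "ADJ"),
]

# Pattern: a noun immediately followed by an adjective.
NOUN_ADJ = [{"pos": "NOUN"}, {"pos": "ADJ"}]

for left, hit, right in concordance(SAMPLE, NOUN_ADJ, context=1):
    print(f"{left:>10} [{hit}] {right}")
```

Because constraints refer to annotation layers rather than surface strings, a single pattern such as `[{"lemma": "ver"}]` would retrieve every inflected form of a verb, which is the kind of generalisation an annotated corpus makes possible.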

How to use the concordancer?

You may also be interested in using the companion tools.

CINTIL corpus

CINTIL-Corpus Internacional do Português is a linguistically interpreted corpus of Portuguese. At present it is composed of 1 million annotated tokens, each of which has been verified by human expert annotators. The annotation comprises information on part-of-speech, lemma and inflection for open classes, multi-word expressions pertaining to the class of adverbs and to the closed POS classes, and multi-word proper names (for named entity recognition).

This corpus is being developed and maintained at the University of Lisbon by the REPORT Group of CLUL-Centro de Linguística da Universidade de Lisboa in cooperation with the NLX-Natural Language and Speech Group of the Department of Informatics. It was the first of its class to be developed for Portuguese in terms of the combined dimensions of size, depth of linguistic information, range of domains and sources, and level of accuracy. The present version is the most recent outcome of an ongoing, long-term endeavour to enlarge and refine this corpus along all these dimensions, with the purpose of providing an enhanced resource for research on the linguistics of Portuguese and for the development of language technology.

What is in the corpus?

Acquiring CINTIL

The CINTIL corpus is released through ELDA-Evaluation and Language Resources Distribution Agency. Details are provided here.

Authorship

The CINTIL Corpus received several contributions:

Raw text, and previous versions
REPORT Group of the CLUL-Centro de Linguística da Universidade de Lisboa
Present version
The CINTIL Corpus was developed between March 2004 and December 2006, under the coordination of António Branco (FCUL-Faculdade de Ciências da Universidade de Lisboa) and Maria Fernanda Bacelar do Nascimento (CLUL-Centro de Linguística da Universidade de Lisboa), by a team including Sandra Antunes (CLUL), Florbela Barreto (CLUL), José Bettencourt Gonçalves (CLUL), João Silva (FCUL), Amália Mendes (CLUL) and Filipe Nunes (FCUL), partly in the scope of the TagShare Project, funded by FCT-Fundação para a Ciência e Tecnologia under the research contract POSI/PLP/47058/2002.

Contact us

CINTIL is an ongoing endeavour to develop a corpus with increasingly enhanced accuracy. If you have checked the underlying assumptions under which the current version was produced here and have still detected something that deserves to be improved, please let us know.

Please note that this is not an online linguistic help-desk service, and questions unrelated to the CINTIL Corpus will not be answered.

White Papers

Barreto, Florbela, António Branco, Eduardo Ferreira, Amália Mendes, Maria Fernanda Nascimento, Filipe Nunes and João Silva, 2006, "Open Resources and Tools for the Shallow Processing of Portuguese", Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC2006), Genoa, Italy.

Branco, António and João Silva, 2006, "LX-Suite: Shallow Processing Tools for Portuguese", Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL2006), Trento, Italy, pp.179-182.

Barreto, Florbela, António Branco, Eduardo Ferreira, Amália Mendes, Fernanda Bacelar Nascimento, Filipe Nunes and João Silva, 2006, "Linguistic Resources and Software for Shallow Processing", Actas do XXI Encontro Anual da Associação Portuguesa de Linguística, Lisbon, Portugal.

Acknowledgments

The work leading to the CINTIL Corpus was partly supported by FCT-Fundação para a Ciência e Tecnologia under the grant POSI/PLP/47058/2002 for the project TagShare.

We are very grateful to Adam Przepiórkowski and his team, from IPIPAN, the Institute of Computer Science of the Polish Academy of Sciences, Warsaw, for their support in adapting Poliqarp to the Portuguese language and to CINTIL constraints.