New corpus of tweets in CoMeRe

A new corpus of tweets about the intermittents du spectacle has just been added to the CoMeRe corpus repository (hosted by Ortolang):

Longhi, J., Borzic, B., Alkhouli, A. (2016). #Intermittent: constitution d’un corpus lié à un événement discursif controversé. In Chanier, T. (ed.), Banque de corpus CoMeRe. : Nancy.


The #Intermittent corpus gathers tweets from 215 accounts identified as interested in the issue of the intermittents (contract or temporary workers in the entertainment industry). These Twitter accounts (twittos in French) yielded 586,239 tweets; the corpus consists of the 10,876 of those 586,239 tweets that contain the hashtag « intermittent ». The corpus has been converted to the TEI format within the framework of the CoMeRe project (Communication médiée par les réseaux, network-mediated communication). The CoMeRe project aims to gather corpora that represent the forms of communication in French on networks (Internet, phone, etc.), all structured and documented in the same way and distributed in open access for research purposes. The CoMeRe project has received the support of ORTOLANG (the French equivalent of DARIAH) and of the national consortium Written-Corpus (‘Corpus-écrits’), a subsection of Huma-Num.
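The selection step described above (keeping only the tweets that carry the hashtag) can be illustrated with a minimal Python sketch. This is not the pipeline used by the corpus authors: the `text` field, the function name `filter_by_hashtag`, and the matching rule (exact hashtag, case-insensitive, plural forms excluded) are all assumptions made for illustration.

```python
import re

# Assumed matching rule: the exact hashtag "#intermittent", case-insensitive.
# The trailing \b rejects longer hashtags such as "#intermittents".
HASHTAG = re.compile(r"#intermittent\b", re.IGNORECASE)


def filter_by_hashtag(tweets):
    """Return only the tweets whose text contains the hashtag.

    `tweets` is assumed to be a list of dicts with a "text" key
    (a hypothetical structure, not the actual corpus format).
    """
    return [t for t in tweets if HASHTAG.search(t["text"])]


# Tiny made-up sample, for demonstration only.
sample = [
    {"id": 1, "text": "Soutien aux #Intermittent du spectacle"},
    {"id": 2, "text": "Rien à voir avec le sujet"},
    {"id": 3, "text": "#intermittents en grève"},
]

selected = filter_by_hashtag(sample)
selected_ids = [t["id"] for t in selected]
```

Whether plural or otherwise extended hashtags should count is a design decision; relaxing the pattern (for example dropping the `\b`) would widen the selection.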

Articles: presentation of CoMeRe and processing of tweets

Here are two articles, written by CoMeRe members, available as preprints in the HAL open archive:

  • Chanier, T., Poudat, C., Sagot, B., Antoniadis, B., Wigham, C.R., Hriba, L., Longhi, J. & Seddah, D. (to appear, 2014). « The CoMeRe corpus for French: structuring and annotating heterogeneous CMC genres ». Journal of Language Technology and Computational Linguistics (JLCL). Special issue: « Building And Annotating Corpora Of Computer-Mediated Discourse: Issues and Challenges at the Interface of Corpus and Computational Linguistics » (ed. by Michael Beißwenger, Nelleke Oostdijk, Angelika Storrer & Henk van den Heuvel).
  • Djemili, S., Longhi, J., Marinica, C., Kotzinos, D., Sarfati, G.-E. (to appear, 2014). « What does Twitter have to say about ideology? ». « NLP 4 CMC: Natural Language Processing for Computer-Mediated Communication / Social Media », pre-conference workshop at Konvens2014, Germany.