Research project DIGO

On this page we will keep you posted on developments in our research project digo, whose name is the Spanish acronym for “objective characterization of the general difficulty of originals”.

December 2010
digo aims to develop a computer application (Cody, henceforth) to analyze original and translated text excerpts in English, French, and Spanish, in order to predict their difficulty. Of course, reading and comprehension difficulty can only be determined grosso modo, through computations on a series of discrete textual parameters. Our starting point is the set of parameters that have been shown to influence reading and comprehension difficulty, such as word frequency, sentence length and complexity, and proposition count, to name but a few. Difficulty indexes will be the outcome of statistical analyses relating text values (independent variables) to the results of testing the reading speed and comprehension (dependent variables) of more than 100 subjects, who will have been profiled in four ways: demographics, sociolinguistics, psychometrics, and the folk theories they hold. Additional experiments will be carried out to improve external validity.
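To make the idea of discrete textual parameters concrete, here is a minimal Python sketch that computes a few of them for a short excerpt. It is only an illustration and not Cody's actual method: the function name `text_parameters`, the toy frequency list, and the naive sentence splitting are all assumptions made for this example.

```python
import re
from statistics import mean

# Hypothetical toy frequency list (occurrences per million words).
# A real system would load a large lexical database for each language.
TOY_FREQUENCIES = {"the": 60000.0, "cat": 40.0, "sat": 30.0, "on": 7000.0, "mat": 5.0}


def text_parameters(text, frequencies, unknown_frequency=1.0):
    """Return a few discrete textual parameters for a text excerpt."""
    # Naive sentence split: runs of ., ! or ? end a sentence.
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    # Words are runs of letters (accented letters included), lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    return {
        "sentence_count": len(sentences),
        "word_count": len(words),
        "mean_sentence_length": len(words) / len(sentences),
        # Words missing from the list get a low default frequency.
        "mean_word_frequency": mean(
            frequencies.get(w, unknown_frequency) for w in words
        ),
    }


params = text_parameters("The cat sat on the mat. The cat sat.", TOY_FREQUENCIES)
```

Values like these would play the role of the independent variables; parameters such as syntactic complexity or proposition count need proper linguistic analysis and are left out of the sketch.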

Once Cody is finished, it will be made available to users through an Internet-accessible interface. Users will be able to enter text excerpts and receive results in two complementary ways: (a) by means of three indexes of lexical, syntactic, and textual difficulty, which will range between 1 and 100; (b) by ranking the new text within a text bank of 200 texts in that language for each parameter. We hope that results will be reliable and precise enough for Cody to find applications in teaching (by associating text difficulty levels with training/learning stages), empirical research (to reduce relevant variables in experimental texts), and the market (as another criterion to compute translation prices, instead of applying current impressionistic rules of thumb to text difficulty).
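The two output modes can be illustrated with a short Python sketch. It assumes, purely for illustration, that an index is obtained by linearly rescaling a raw parameter value onto the range of values found in the text bank; the project does not state how Cody will derive its indexes, and the function names `scaled_index` and `rank_in_bank` are hypothetical.

```python
from bisect import bisect_left


def scaled_index(raw, bank_values):
    """Map a raw parameter value onto a 1-100 scale using the bank's range.

    Assumption for this sketch: linear rescaling, with values outside the
    bank's range clamped to its lowest or highest value.
    """
    low, high = min(bank_values), max(bank_values)
    if high == low:
        raise ValueError("bank values must not all be identical")
    clamped = min(max(raw, low), high)
    return round(1 + 99 * (clamped - low) / (high - low))


def rank_in_bank(raw, bank_values):
    """Return the 1-based position the new text would take among the bank
    texts when they are sorted from lowest to highest value."""
    return bisect_left(sorted(bank_values), raw) + 1


# Toy bank of four values; the planned bank would hold 200 texts per language.
bank = [10.0, 20.0, 30.0, 40.0]
index = scaled_index(40.0, bank)    # highest value in the bank
position = rank_in_bank(25.0, bank)  # falls between the 2nd and 3rd texts
```

A rank is easier to interpret against known texts, while an index lets texts be compared without consulting the bank, which may be why the two are offered as complementary views.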

The most salient features of digo may be that (1) it is market-oriented, and does not focus on teaching; (2) it targets adult readers, and not children; and (3) it will attempt to level out text difficulty across three languages, which probably makes it the first truly multilingual project of its kind. Its end product, Cody, will also (4) compute cohesion and aspects of mental processing difficulty, unlike most readability formulas. It should also (5) better accommodate texts of different types and genres, for it will not combine heterogeneous criteria; (6) be easy to use and yield results that laypeople can easily understand, as opposed to other approaches; and (7) be scalable, so that it can be updated and enlarged with more languages and improved functions in the future.

digo is a three-year research program, made possible by the Spanish Ministry of Science & Innovation grant No. FFI2010-15724, within the R+D National Program. Our goal for the first year is to build the basic blocks of the system: developing ad hoc computing tools, compiling both the lexical and the text databases, designing Cody, and drafting and pilot-testing the tests for subjects. We will use the second year to develop Cody, improve text tagging, and test subjects. In the third year, we will carry out additional experiments to improve external validity; we will also analyze data, carry out statistical operations, and develop the analytical modules in Cody.

As an interdisciplinary project, digo is a joint effort of three research groups and several independent researchers and professionals. The groups are LLT, on Spanish language & linguistics; PETRA, on cognitive translatology; and TIP, on natural language processing. This is the full list of researchers:

Area      Researcher                       City        Institution  Group
PI        Ricardo Muñoz Martín             Las Palmas  ulpgc        PETRA
software  Francisco J. Carreras Riudavets  Las Palmas  ulpgc        TIP
          Zenón Hernández Figueroa         Las Palmas  ulpgc        TIP
          Gustavo Rodríguez Rodríguez      Las Palmas  ulpgc        TIP
          José Ignacio Perea Sardón        Granada     equus        PETRA
subjects  Mª Lluïsa Presas Corbella        Barcelona   uab          PETRA
          Alicia Bolaños Medina            Las Palmas  ulpgc        PETRA
          María Castro Arce                Leipzig     ialt         PETRA
          Álvaro Marín García              Las Palmas  -            PETRA
          Celia Martín de León             Las Palmas  ulpgc        PETRA
          Juan Luis Núñez Alonso           Las Palmas  ulpgc        -
Spanish   Marina Díaz Peralta              Las Palmas  ulpgc        LLT
          Tomás Conde Ruano                Vitoria     upv          PETRA
          Ana Mª García Álvarez            Las Palmas  ulpgc        PETRA
          Mª Jesús García Domínguez        Las Palmas  ulpgc        LLT
          Gracia Piñero Piñero             Las Palmas  ulpgc        LLT
          Mª José Reyes Díaz               Las Palmas  ulpgc        ALET
French    Susana Cruces Colado             Vigo        uvigo        PETRA
          Agustín Darías Marrero           Las Palmas  ulpgc        -
          Robert Neal Baxter               Vigo        uvigo        -
          Angélica Pajarín Canales         Granada     ugr          -
          Paz Orois Fernández              Vigo        uvigo        -
English   Goretti García Morales           Las Palmas  ulpgc        TLMD
          Luis Alonso Bacigalupe           Vigo        uvigo        -
          José Jorge Amigo Extremera       Las Palmas  ulpgc        PETRA
          Gisela Marcelo Wirnitzer         Las Palmas  ulpgc        -
          Rubén Rodríguez de la Fuente     Madrid      -            -
          Mª Jesús Rodríguez Medina        Las Palmas  ulpgc        ESS

