Research project CÓDIGO
On this page we will keep you posted on developments in our research project Código, the Spanish acronym for “objective characterization of the general difficulty of originals”.
Código aims to develop a computer application (Cody, henceforth) that analyzes original and translated text excerpts in English, French, and Spanish in order to predict their difficulty. Of course, reading and comprehension difficulty can only be determined grosso modo, through computations on a series of discrete textual parameters. Our starting point is the set of parameters that have been shown to influence reading and comprehension difficulty, such as word frequency, sentence length and complexity, and proposition count, to name but a few. Difficulty indexes will be the outcome of statistical analyses relating text values (independent variables) to the results of testing reading speed and comprehension (dependent variables) in more than 100 subjects, who will have been profiled along four dimensions: demographics, sociolinguistics, psychometrics, and the folk theories they hold. Additional experiments will be carried out to improve external validity.
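To make the idea of discrete textual parameters more concrete, here is a minimal sketch of how two of the parameters named above (sentence length and word frequency) might be computed for an excerpt. It is an illustration only, not Cody's actual method: the function names, the naive sentence splitter, the toy frequency table, and the default value for unknown words are all assumptions introduced here.

```python
import re
from statistics import mean

# Hypothetical toy frequency table (occurrences per million words).
# A real system would draw on a large lexical database instead.
TOY_FREQUENCIES = {"the": 50000.0, "cat": 40.0, "sat": 30.0, "on": 7000.0, "mat": 5.0}


def split_sentences(text):
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Return lowercased word tokens (accented letters are matched too)."""
    return re.findall(r"\w+", sentence.lower())


def text_parameters(text, frequencies, unknown_frequency=1.0):
    """Compute a few simple textual parameters for one excerpt."""
    sentences = [tokenize(s) for s in split_sentences(text)]
    sentences = [s for s in sentences if s]  # drop sentences with no words
    words = [w for s in sentences for w in s]
    if not words:
        raise ValueError("text contains no words")
    return {
        "sentence_count": len(sentences),
        "word_count": len(words),
        "mean_sentence_length": len(words) / len(sentences),
        # Words missing from the table get an assumed low frequency.
        "mean_word_frequency": mean(
            frequencies.get(w, unknown_frequency) for w in words
        ),
    }


params = text_parameters("The cat sat on the mat. The cat sat.", TOY_FREQUENCIES)
```

Values such as these would play the role of the independent variables in the statistical analyses; parameters like syntactic complexity or proposition count would need proper linguistic tooling and are left out of this sketch.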
Once Cody is finished, it will be made available to users through an Internet-accessible interface. Users will be able to enter text excerpts and receive results in two complementary ways: (a) as three indexes of lexical, syntactic, and textual difficulty, each ranging between 1 and 100; and (b) as a ranking of the new text within a bank of 200 texts in that language for each parameter. We hope the results will be reliable and precise enough for Cody to find applications in teaching (by associating text difficulty levels with training/learning stages), empirical research (to reduce relevant variables in experimental texts), and the market (as another criterion to compute translation prices, instead of applying current impressionistic rules of thumb to text difficulty).
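The two kinds of output described above can be sketched as follows. This is only an illustration under stated assumptions: the project does not specify how raw scores are mapped onto the 1 to 100 range or how ties are handled, so the linear scaling, the clamping, and the function names below are hypothetical.

```python
from bisect import bisect_left


def scale_to_index(value, lowest, highest):
    """Map a raw score linearly onto 1..100, clamping out-of-range values.

    Linear scaling is an assumption made for this sketch.
    """
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    fraction = (value - lowest) / (highest - lowest)
    fraction = min(1.0, max(0.0, fraction))
    return round(1 + 99 * fraction)


def rank_in_bank(value, bank_values):
    """Return the 1-based position of a new text among the bank's values,
    counting from the lowest value (rank 1) upwards."""
    ordered = sorted(bank_values)
    return bisect_left(ordered, value) + 1


# Hypothetical raw scores for one parameter in a (tiny) text bank.
bank = [10.0, 15.0, 20.0]
new_text_rank = rank_in_bank(12.0, bank)
new_text_index = scale_to_index(12.0, min(bank), max(bank))
```

With a bank of 200 texts, the rank of a new text would fall between 1 and 201, one rank per parameter.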
The most salient features of Código may be that (1) it is market-oriented and does not focus on teaching; (2) it targets adult readers rather than children; and (3) it will attempt to level out text difficulty across three languages, which probably makes it the first truly multilingual project of its kind. Its end product, Cody, will (4) compute cohesion and aspects of mental processing difficulty, unlike most readability formulas; (5) better accommodate texts of different types and genres, for it will not combine heterogeneous criteria; (6) be easy to use and yield results that laypeople can easily understand, unlike other approaches; and (7) be scalable, so that it can be updated and extended with more languages and improved functions in the future.
Código is a three-year research program, made possible by the Spanish Ministry of Science & Innovation grant No. FFI2010-15724, within the R+D National Program. Our goal for the first year is to build the basic blocks of the system: developing ad hoc computing tools, compiling both the lexical and the text databases, designing Cody, and drafting and pilot-testing the tests for subjects. We will use the second year to develop Cody, improve text tagging, and test subjects. In the third year, we will carry out additional experiments to improve external validity, analyze the data and run the statistical operations, and develop the analytical modules in Cody.
As an interdisciplinary project, Código is a joint effort of three research groups and several independent researchers and professionals. The groups are LLT, on Spanish language & linguistics; PETRA, on cognitive translatology; and TIP, on natural language processing. This is the full list of researchers:
Copyleft 2011