They should be formatted as two files of text lines sharing an index, a la:
  stem_index|stem
and then:
  word|stem_index|POS_index
If a word doesn't have a stem (say, conjunctions and pronouns), it should be included as it is. Is there such a thing? If not, how would you suggest building it with available resources?
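To make the layout concrete, here is a minimal Python sketch of what I have in mind. It assumes the (word, stem, POS) triples already come from somewhere, such as a stemmer plus a tagger. The function name build_tables, the sample words and the POS labels are made up for illustration. Treating a stemless word as its own stem entry is only one possible reading of "included as it is".

```python
def build_tables(entries):
    """Build the two line-oriented tables from (word, stem_or_None, pos) triples.

    Returns (stem_lines, word_lines, pos_index), where pos_index maps each
    POS label to the integer used in the word lines.
    """
    stems = {}  # stem -> stem_index, in order of first appearance
    tags = {}   # POS label -> POS_index, in order of first appearance
    word_lines = []
    for word, stem, pos in entries:
        # Assumption: a word without a stem is stored as its own stem entry.
        key = stem if stem is not None else word
        stem_idx = stems.setdefault(key, len(stems))
        pos_idx = tags.setdefault(pos, len(tags))
        word_lines.append(f"{word}|{stem_idx}|{pos_idx}")
    # Dicts keep insertion order, so the stem lines come out sorted by index.
    stem_lines = [f"{idx}|{stem}" for stem, idx in stems.items()]
    return stem_lines, word_lines, tags


# Hypothetical sample input, for illustration only.
sample = [
    ("running", "run", "VERB"),
    ("runs", "run", "VERB"),
    ("and", None, "CONJ"),
    ("she", None, "PRON"),
]

stem_lines, word_lines, pos_index = build_tables(sample)
print("\n".join(stem_lines))
print("\n".join(word_lines))
```

Writing stem_lines and word_lines to two text files would then give the pair of files sharing stem_index as the common key.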
lbrtchx