"The difficulties in data collection for a variety of individual and profiles, as well as the ubiquity of deleted content, not recoverable to us, forces us to adopt a dual approach, and to build two corpora."
"In both cases, due to the limitations in data collection and available material, the quantity of training material is imbalanced between authors, a potential problem in machine learning."
Collection
[
|
...
]