Extending the single words-based document model: a comparison of bigrams and 2-itemsets