Statistical methods for corpus exploitation