Task-Based Evaluation of Anaphora Resolution: The Case of Summarization