Can computers learn from humans to see better? Inferring scene semantics from viewers' eye movements