How much competence is there in performance? Assessing the distributional hypothesis in word bigrams / Seltmann, J.; Ducceschi, L.; Herbelot, A. - Vol. 2481 (2019). (Paper presented at the 6th Italian Conference on Computational Linguistics, CLiC-it 2019, held in Italy in 2019.)
How much competence is there in performance? Assessing the distributional hypothesis in word bigrams
Ducceschi, L.; Herbelot, A.
2019
Abstract
The field of Distributional Semantics (DS) is built on the ‘distributional hypothesis’, which states that meaning can be recovered from statistical information in observable language. It is notable, however, that the computations necessary to obtain ‘good’ DS representations are often very involved, implying that if meaning is derivable from linguistic data, it is not directly encoded in it. This prompts fundamental questions about language acquisition: if we regard text data as linguistic performance, what kind of ‘innate’ mechanisms must operate over that data to reach competence? In other words, how much of semantic acquisition is truly data-driven, and what must be hard-coded in a system’s architecture? In this paper, we introduce a new methodology to pull those questions apart. We use state-of-the-art computational models to investigate the amount and nature of the transformations required to perform particular semantic tasks. We apply that methodology to one of the simplest structures in language, the word bigram, giving insights into the specific contribution of that linguistic component.
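To make the abstract's premise concrete, the sketch below shows the kind of transformation the ‘distributional hypothesis’ presupposes: raw bigram co-occurrence counts are not usable as-is and must be reweighted (here with PPMI) before they behave like meaning representations. This is a minimal illustration under assumed choices (a toy corpus, immediate right-neighbour contexts, a hypothetical `ppmi` helper), not the method actually used in the paper.

```python
# Minimal sketch: distributional word vectors from bigram statistics.
# Illustrative only -- not the paper's methodology.
from collections import Counter
import math

corpus = "the cat sat on the mat the dog sat on the rug".split()

# Bigram co-occurrence counts: each word's context is its right neighbour.
pair_counts = Counter(zip(corpus, corpus[1:]))
word_counts = Counter(corpus)
total_pairs = sum(pair_counts.values())

def ppmi(w, c):
    """Positive pointwise mutual information of word w with context c."""
    p_wc = pair_counts[(w, c)] / total_pairs
    if p_wc == 0:
        return 0.0
    p_w = word_counts[w] / len(corpus)
    p_c = word_counts[c] / len(corpus)
    return max(0.0, math.log2(p_wc / (p_w * p_c)))

vocab = sorted(word_counts)
# Each word's vector: its PPMI association with every possible context word.
vectors = {w: [ppmi(w, c) for c in vocab] for w in vocab}
print(vectors["sat"])
```

Even this toy pipeline requires non-trivial machinery (counting, normalisation, a log-based reweighting scheme) that is supplied by the system rather than the data, which is precisely the competence/performance question the abstract raises.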
File | Size | Format | Access
---|---|---|---
2019_how_much_competence_in_performance.pdf (publisher's version, Creative Commons licence) | 376.29 kB | Adobe PDF | open access