Elliott Ash and Stephen Hansen in VoxEU:
Public awareness of the ability of large language models (LLMs) to perform complex tasks and generate life-like interactions has exploded in the past year. This has led to immense media attention and policy debate about their impact on the economy and on society (Ilzetzki and Jain 2023). In this column, we focus on the potential role of LLMs as a research tool for economists working with text data. We first lay out some key concepts needed to understand how such models are built, and then pose four key questions that we believe need to be addressed when considering the integration of LLMs into the research process. For further details, we refer readers to our recent review article on text algorithms in economics (Ash and Hansen 2023).
Large language models are essentially predictive models for sequential data. Given an input sequence of text, they can be trained either to predict randomly masked words – akin to filling in gaps in a time series – or to predict the most likely next word, mirroring the process of forecasting the next data point in a time series. Consider predicting the word that underlies [MASK] in the following sentences:
- As a leading firm in the [MASK] sector, we hire highly skilled software engineers.
- As a leading firm in the [MASK] sector, we hire highly skilled petroleum engineers.
Most humans would predict that the word underlying [MASK] in the first sentence relates to technology, and that the word underlying [MASK] in the second relates to energy. The key words informing these predictions are ‘software’ and ‘petroleum’, respectively. Humans intuitively know which words in a sequence they must pay attention to in order to make accurate predictions. Moreover, these words may lie relatively far from the word to be predicted, as in the examples above.
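To make the masked-prediction setup concrete, here is a minimal Python sketch that queries an off-the-shelf masked language model for the two example sentences. The specific model (`bert-base-uncased`) and the Hugging Face `transformers` fill-mask pipeline are our illustrative choices, not tools named in the column.

```python
# Minimal sketch: masked-word prediction on the column's two example sentences.
# Assumes `pip install transformers torch`; model choice is illustrative.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

sentences = [
    "As a leading firm in the [MASK] sector, we hire highly skilled software engineers.",
    "As a leading firm in the [MASK] sector, we hire highly skilled petroleum engineers.",
]

for sentence in sentences:
    print(sentence)
    # Each candidate is a dict with the predicted token and its probability.
    for candidate in fill_mask(sentence, top_k=3):
        print(f"  {candidate['token_str']:>12}  (score: {candidate['score']:.3f})")
```

In a sketch like this, one would expect the top candidates to shift from technology-related words in the first sentence to energy-related words in the second, driven by the distant cue words ‘software’ and ‘petroleum’.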
More here.