No, I really meant structured. Extracting data from structured documents is surprisingly hard when you need very high accuracy.
By structured I mean invoices, documents containing tables, etc.
Extracting useful data from fully unstructured content is very hard IMO, and potentially beyond the capabilities of LLMs (depending on your definition of "useful" and "unstructured").
That's partly because the standards, such as X12, have a high startup cost, aren't very opinionated about the actual content, and require getting the counterparty on board.
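To give a feel for what X12 looks like on the wire, here is a minimal tokenizer sketch. It assumes the common `*` element separator and `~` segment terminator (a real interchange declares its own delimiters in the fixed-width ISA header), and the sample data and the `parse_x12` name are hypothetical.

```python
from typing import List


def parse_x12(raw: str, element_sep: str = "*", segment_term: str = "~") -> List[List[str]]:
    """Split raw X12 text into segments, each one a list of positional elements.

    Assumes '*' and '~' as delimiters by default; real interchanges declare
    their delimiters in the ISA header, so pass them in explicitly if needed.
    """
    segments: List[List[str]] = []
    for chunk in raw.split(segment_term):
        chunk = chunk.strip()  # some senders put newlines between segments
        if chunk:  # skip the empty piece after the final terminator
            segments.append(chunk.split(element_sep))
    return segments


if __name__ == "__main__":
    # Hypothetical fragment loosely modeled on an 810 (invoice) transaction set.
    sample = "BIG*20240101*INV001~\nIT1*1*10*EA*2.50~\nTDS*2500~"
    for segment in parse_x12(sample):
        print(segment[0], segment[1:])
```

The syntax is the easy part: the output is just positional elements. What each position means comes from the transaction-set spec and, in practice, from each trading partner's implementation guide, which is a big part of that startup cost.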
Structured document extraction has made a real leap forward with LLMs.
One positive impact on society is automated extraction in healthcare pipelines.