Building Trustworthy Big Data Algorithms
New algorithm can separate unstructured text into topics with high accuracy and reproducibility
Much of the world's data sits in large databases of unstructured text. Finding insights in emails, text documents, and websites is extremely difficult unless we can search, characterise, and classify that text in a meaningful way. One of the leading big data algorithms for finding related topics within unstructured text (an area called topic modelling) is latent Dirichlet allocation (LDA). But when Northwestern University professor Luis Amaral set out to test LDA, he found that it was neither as accurate nor as reproducible as a leading topic modelling algorithm should be.
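Part of the reproducibility problem is that LDA is usually fitted with a randomised procedure, so two runs on the same text can group words and documents differently. The sketch below is a minimal collapsed Gibbs sampler, one common way of fitting LDA. It is not the code used in Amaral's tests. The function names, the hyperparameters `alpha` and `beta`, the toy corpus, and the pair-agreement score (a Rand index over each document's dominant topic) are all illustrative assumptions.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (illustrative sketch).

    docs: list of token lists. Returns a topic index for every token.
    """
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})  # vocabulary size
    doc_topic = [[0] * num_topics for _ in docs]  # tokens per (doc, topic)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics  # tokens per topic

    # Random initial assignment: the source of run-to-run variation.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Take this token out of the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return assignments


def dominant_topics(assignments, num_topics):
    """Most frequent topic among each document's tokens."""
    return [max(range(num_topics), key=z.count) for z in assignments]


def pair_agreement(labels_a, labels_b):
    """Fraction of document pairs on which two runs agree about
    'same topic' vs 'different topic'; 1.0 means identical groupings
    up to relabelling of the topics."""
    n = len(labels_a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical toy corpus: three "biology" and three "finance" documents.
docs = [
    "gene dna protein cell".split(),
    "dna gene sequence genome".split(),
    "market stock price trade".split(),
    "stock trade investor market".split(),
    "cell protein enzyme gene".split(),
    "price investor market bond".split(),
]
run_a = dominant_topics(lda_gibbs(docs, 2, seed=1), 2)
run_b = dominant_topics(lda_gibbs(docs, 2, seed=2), 2)
print(pair_agreement(run_a, run_b))
```

Fixing the seed makes a single run repeatable, but it does not make the result independent of that arbitrary choice. Comparing runs started from different seeds, as in the last lines, is one simple way to probe how stable the topics are.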