The Few Datasets Powering Research

Machine learning research increasingly relies on a small number of popular datasets, mostly from elite institutions.

Data, Ethics, Academia

Key Takeaways

To test and compare new machine learning methods, researchers often evaluate them on the same shared datasets. A study of which datasets are being used found that the field is concentrating on a very small number of popular ones. Most of these influential datasets were created at just a handful of elite institutions. This concentration shapes how scientific progress is measured and raises questions about fairness and equal opportunity for researchers in the field.