I’ve often joked that data science needs a new emphasis. We tend to over-emphasize the “data” half and forget the “science” half.
The scale and scope of data is new, but science is not. Are we overlooking the latter?
Jaron Lanier seems to think so. I am reading his book, Who Owns the Future? It’s a great read, sure to challenge many of your assumptions about the information economy.
Lanier argues that the middle class is trading its livelihood for small conveniences at the hands of vast data networks, the owners of which are becoming extremely wealthy.
The book does not offer a clear course of action to prevent this. I understand his arguments, but I’m not convinced that, on net, the internet has been a middle-class destroyer.
In the end, I’m hoping that, as with most of our problems, the market will come through with solutions we can’t yet imagine. But I digress.
Lanier offers a checklist of questions on applying the scientific method to data science. Questions include:
- What standard would have to be met for a replication to count as a publishable result? To what degree must replication require gathering different but similar big data, rather than just reusing the same data with different algorithms?
- Must there be new practices established, analogous to double-blind tests or placebos, that help prevent big data scientists from fooling themselves? Should there be multiple groups developing code to analyze big data that remain completely insulated from each other in order to arrive at independent results?
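The second question can be made concrete with a toy sketch. Everything below is hypothetical: `collect_sample` stands in for gathering fresh but similar data, and the two analysis functions stand in for code written by insulated teams. This is my illustration of the idea, not anything from Lanier’s book.

```python
import random
import statistics

def collect_sample(seed, n=1000):
    # Stand-in for gathering a fresh, similar dataset. In a real study
    # this would be new data collection, not a simulation.
    rng = random.Random(seed)
    return [rng.gauss(0.5, 1.0) for _ in range(n)]

def analysis_a(data):
    # Team A estimates the effect with the mean.
    return statistics.mean(data)

def analysis_b(data):
    # Team B, working independently, estimates it with the median.
    return statistics.median(data)

# Each "team" runs its own code on its own freshly gathered data.
effect_a = analysis_a(collect_sample(seed=1))
effect_b = analysis_b(collect_sample(seed=2))

# A replication check: do independent data and independent methods
# agree within a tolerance chosen before either analysis was run?
replicated = abs(effect_a - effect_b) < 0.2
print(replicated)
```

The point of the sketch is the structure, not the statistics: replication here means different data and different code converging on the same answer, which is a stronger claim than rerunning one dataset through a second algorithm.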
These are great questions, and the novelty of big data too easily distracts us from this old-school methodology.