We're now accumulating data at an incredible rate. I mentioned electron microscopy to study the ribosome—each experiment generates several terabytes of data, which is then massaged, analyzed, and reduced, and finally you get a structure. At least in this data analysis, we believe we know what's happening. We know what the programs are doing, we know what the algorithms are, we know how they come up with the result, and so we feel that intellectually we understand the result. What is now happening in a lot of fields is that you have machine learning, where computers are essentially taught to recognize patterns with deep neural networks. They're formulating rules based on patterns. There are statistical algorithms that allow them to give weights to various things, and eventually they come up with conclusions. When they come up with these conclusions, we have no idea how; we just know the general process. If there's a relationship, we don't understand that relationship in the same way that we would if we came up with it ourselves or came up with it based on an intellectual algorithm. So we're in a situation where we're asking, how do we understand results that come from this analysis? This is going to happen more and more as datasets get bigger, as we have genome-wide studies, population studies, and all sorts of things.
There are so many large-scale problems dependent on large datasets that we're getting more divorced from the data. There's this intermediary doing the analysis for us. To me, that is a change in our way of understanding it. When someone asks how we know, we say that the system analyzed it and came up with these relationships—maybe it means this or maybe it means that. That is philosophically slightly different from the way we've been doing science. The other reason to worry is a cultural one. The Internet and the World Wide Web have been a tremendous boon to scientists. They've made communication among scientists far easier and have in many ways leveled the playing field.