I Am Not A Data Luddite

I feel like I need to say this on occasion when talking with folks about how much data I need to power recommendation services.  More data isn't always better; sometimes it's just more.


It comes back to the notion of trying to boil the ocean: thinking that the more broadly you know a person's habits, the better you'll perform when predicting their behavior.

It simply isn't so.

The signal-to-noise ratio gets bad when you try to tie too many user events together.  I have found that pulling event data from discrete, well-defined areas leads to the most accurate results.

That said, it is helpful when beginning a new project to gather data as broadly as possible.  Just know that in time your giant data set will be winnowed down to only the tasty bits that are statistically significant.
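
To make the winnowing concrete, here is a minimal sketch of one way it could look, not a description of any particular production system. It assumes each row is a user session with boolean event flags, and it keeps only the events whose association with the outcome clears a chi-squared test at the 5% level. The event names (`viewed_similar_item`, `changed_avatar`), the `purchased` outcome, and the helper functions are all made up for illustration.

```python
# Chi-squared critical value for 1 degree of freedom at p = 0.05.
CHI2_CRITICAL_95 = 3.841


def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so there is nothing to test.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def significant_events(rows, event_names, outcome):
    """Return the event names whose link to the outcome clears the 5% bar."""
    kept = []
    for name in event_names:
        a = sum(1 for r in rows if r[name] and r[outcome])
        b = sum(1 for r in rows if r[name] and not r[outcome])
        c = sum(1 for r in rows if not r[name] and r[outcome])
        d = sum(1 for r in rows if not r[name] and not r[outcome])
        if chi_squared_2x2(a, b, c, d) > CHI2_CRITICAL_95:
            kept.append(name)
    return kept


# Hypothetical sessions: one event tracks purchases closely, the other is noise.
rows = [
    {
        "viewed_similar_item": i < 20,
        "changed_avatar": i % 2 == 0,
        "purchased": i < 18 or i >= 38,
    }
    for i in range(40)
]

print(significant_events(rows, ["viewed_similar_item", "changed_avatar"], "purchased"))
# prints ['viewed_similar_item']
```

One caveat on the design: the more events you test this way, the more of them will clear the bar by chance alone, so a broad data set calls for a stricter threshold or a multiple-comparison correction before you trust what survives.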