Get Smarter about Big Data
- Data is often imperfect – and that’s usually a good thing! You don’t need perfect information to find interesting relationships in the data – in fact, counter-intuitively, “dirty” data is sometimes better for finding relationships, because cleansing may remove the very attribute that enables matching. On the other hand, some information is a lie, as “bad guys” will intentionally try to fool you, or to separate their interactions with your firm into different channels (web, mobile, store) to avoid detection. You should assign a trust level to “known” information, and it rarely approaches 100%.
- Your data can make you smarter as time passes. As new observations continue to accumulate, they enable you to refine your understanding, and even to reverse earlier assertions that were based only on what you knew at the time. Therefore, be sure to rerun earlier analyses over the full dataset, and don’t assume the conclusions of your previous analysis were correct.
- Partial information is often enough. It’s surprising how soon you can start to see a picture emerge – with puzzles, the picture can often be identified with only 50% of the pieces, and this aspect of human cognition often applies to machine learning, too. Once the picture starts to emerge, you can more quickly understand each new puzzle piece (observation) by seeing it in the context it occupies among those around it.
This emerging picture should inform your collection efforts – you might need to obtain a new information source to follow up a lead from an earlier analysis, or to discard an information source (and the cost of collecting and analyzing it) once you realize it’s not helping.
- More data is always good. The case for accumulating more data – Big Data – is strong: not only does it bring deeper insights, it can also reduce your compute workload – Jeff’s experience shows that, beyond a certain threshold, the time it takes to link a new observation into a large information network actually goes down as the total number of observations goes up.
One of the most interesting new sources of Big Data insights is data about the interactions of people with systems – even their mistakes! That’s how Google knows to ask “did you mean this?”
- Can you count? Good! Accurate counting of entities (people, cases, transactions), a.k.a. Entity Resolution, is critical to deeper analysis – if you can’t count, you can’t determine a vector or velocity, and without those, you can’t make predictions. Many interesting analyses in fraud detection involve detecting identities – accurately counting people, knowing when two identities are the same person, or when one identity is actually being used by more than one person, or even when an identity is not a real person at all… Identity matching is also the source of analyses that identify dead people voting and other such fraud.
- Privacy matters, but it’s not an obstacle. Once identity comes into play, then privacy concerns (and regulations) must of course be taken into consideration. There are advanced techniques such as one-way hashes that can be used to anonymize a data set without reducing its usefulness for analytical purposes.
- Bad guys can be smart, too. Skilled adversaries present unique problems, but they can be overcome: to catch them, you must collect observations the adversary doesn’t know you have (e.g. from a camera on their route), or compute over your observations in a way the adversary can’t imagine (e.g. recognizing faces or license plates, and correlating that with other location information).
As adversaries keep getting smarter and more capable of avoiding detection, savvy analysts must continually push the envelope, applying new techniques and technology to the game.
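To make the points about one-way hashes and accurate counting a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the method described in the talk: the secret key, the sample e-mail addresses, and the helper names `normalize` and `pseudonymize` are all hypothetical. It uses a keyed hash (HMAC-SHA256) rather than a bare hash, on the assumption that identifiers such as e-mail addresses are easy to guess and a plain hash of them could be reversed by a dictionary attack.

```python
import hashlib
import hmac

# Hypothetical secret key; assumed to be stored separately from the data set.
SECRET_KEY = b"example-key-not-for-production"


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings hash alike."""
    return " ".join(value.lower().split())


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed one-way hash (HMAC-SHA256) of a normalized identifier."""
    digest = hmac.new(key, normalize(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical observations of customers arriving through different channels.
observations = [
    {"channel": "web", "email": "Pat.Lee@example.com"},
    {"channel": "mobile", "email": " pat.lee@example.com "},
    {"channel": "store", "email": "sam.roe@example.com"},
]

# Replace the raw identifier with its pseudonym before analysis.
anonymized = [
    {"channel": o["channel"], "person": pseudonymize(o["email"])}
    for o in observations
]

# Counting distinct pseudonyms still counts distinct people.
distinct_people = len({row["person"] for row in anonymized})
print(distinct_people)
```

In this sketch the web and mobile records collapse into one pseudonym, so the count of distinct people survives anonymization. Note the trade-off: a hash only matches identifiers that are identical after normalization, so how aggressively you normalize determines which records can still be linked.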