Infopresse veracity


Chapter 2

The 4th V: Data Veracity



Every minute 8-10 months ago:

• 48 hours of video are uploaded to YouTube

• 320 new accounts and 98,000 tweets appear on Twitter

• 168,000,000 emails are sent

• 20,000 new posts on Tumblr

• 6,600 photos appear on Flickr

• Over 20% of all websites run on a CMS (WordPress, etc.)

Every minute today:

• 60 hours of video are uploaded to YouTube

• ??? new accounts and 236,000 tweets appear on Twitter

• 204,000,000 emails are sent

• 28,000 new posts on Tumblr

• 1,600 photos appear on Flickr (down from 6,600!)
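To make the two snapshots easier to compare, here is a minimal sketch that computes the percentage change for each figure quoted on the slides. It assumes the email figures are meant as 168 and 204 million, and it skips the Twitter new-account count because today's value is not given. The dictionary and function names are illustrative only.

```python
# Per-minute figures copied from the two slides: (earlier, today).
# Assumption: the email counts are 168 million and 204 million.
per_minute = {
    "YouTube hours of video": (48, 60),
    "Twitter tweets": (98_000, 236_000),
    "Emails sent": (168_000_000, 204_000_000),
    "Tumblr posts": (20_000, 28_000),
    "Flickr photos": (6_600, 1_600),
}


def percent_change(before, after):
    """Relative change from the earlier figure to today's, in percent."""
    return (after - before) / before * 100


changes = {name: percent_change(b, a) for name, (b, a) in per_minute.items()}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

Flickr is the only figure that shrinks between the two snapshots; every other figure grows.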







• Facebook has lost 1.5 million users in Canada and 6 million in the United States

• Yahoo study: 50% of the content that is read and shared by humans is produced by only 20,000 accounts (0.05%)
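As a quick sanity check on the Yahoo figure, the sketch below works out how many accounts the study population would have to contain for 20,000 accounts to be 0.05% of it. This assumes the 0.05% is a share of all accounts in the study, which the slide does not state explicitly. The variable names are illustrative.

```python
from fractions import Fraction

elite_accounts = 20_000
# Assumption: 0.05% is the elite accounts' share of all accounts studied.
elite_share = Fraction(5, 10_000)  # 0.05% as an exact fraction

# total = elite / share, kept exact by using Fraction instead of float.
implied_total_accounts = elite_accounts / elite_share

print(f"{int(implied_total_accounts):,} accounts")
```

Under that assumption, half of what gets read and shared comes from a group several orders of magnitude smaller than the population it reaches.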




Gartner is predicting an explosion in Social Media Analytics IT spending



In a lot of ways “Big Data” is like Oil…

• Difficult and expensive to extract


• Difficult and expensive to store and distribute

• Cheapest (and least useful) when it’s unrefined




In a lot of ways “Big Data” is like Oil…

• Can’t be used by consumers unless refined

• More expensive at every step of refinement


The market is producing a plethora of derived, higher-value data products



In a lot of ways “Big Data” is like Oil…

• Difficult and expensive to extract

• Difficult and expensive to store and distribute

• Cheapest in its unrefined form

• More expensive at every step of refinement

• Produces a plethora of derived products

• And it’s actually quite “dirty”!


Social Data Analytics = Oil Refineries


Social Data is one of the reasons why IBM added a 4th V to the Big Data Definition



6 factors affect Data Veracity …

1. Accuracy: Is it true?

2. Precision: If true, error margin?

3. Reliability: Is it there all the time?

4. Provenance: Can you trace the source?

5. Fidelity: Did it change from the source?

6. Permission: Can you use it for the context?
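The six questions above can be treated as a checklist attached to any data source. The following is a minimal sketch, not a standard schema: the class name `VeracityChecklist`, the helper `unresolved`, and the sample answers are all hypothetical, and each factor is reduced to a yes/no/unknown answer for simplicity.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class VeracityChecklist:
    """One answer per factor: True, False, or None if not yet assessed."""
    accuracy: Optional[bool] = None     # Is it true?
    precision: Optional[bool] = None    # Is the error margin acceptable?
    reliability: Optional[bool] = None  # Is it there all the time?
    provenance: Optional[bool] = None   # Can you trace the source?
    fidelity: Optional[bool] = None     # Is it unchanged from the source?
    permission: Optional[bool] = None   # Can you use it for the context?


def unresolved(check: VeracityChecklist) -> List[str]:
    """Return the factors that are not yet answered with True."""
    return [f.name for f in fields(check) if getattr(check, f.name) is not True]


# Hypothetical assessment of a sample of social posts.
tweet_sample = VeracityChecklist(accuracy=True, provenance=False, permission=True)
print(unresolved(tweet_sample))
```

Keeping "unknown" (`None`) separate from "no" (`False`) matters here: a source whose provenance has not been checked is a different problem from one whose provenance cannot be traced.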


Black Hat SEO : Blogs

Black Hat Social Marketing : Twitter

Twitter: 50% of brand followers are bots

Or in some cases over 90%…

Disappearing Romney: Facebook as well…

Trying to solve the Veracity problem …

The Big Guys are now doing Veracity …

