Predicting the Future With Social Media

Sitaram Asur and Bernardo A. Huberman at the Social Computing Lab at HP Labs in Palo Alto, California, have demonstrated how social media content can be used to predict real-world outcomes. They used content from Twitter to forecast box-office revenues for movies. With a simple model built from the rate at which tweets are created about particular topics, they outperformed market-based predictors. They extracted 2.89 million tweets referring to 24 different movies released over a period of three months. According to the researchers’ prediction, the movie “The Crazies” was going to generate 16.8 million dollars in ticket sales during its first weekend. The actual figure turned out to be very close: 16.06 million dollars. The drama “Dear John” generated 30.46 million dollars in ticket sales, compared to a prediction of 30.71 million dollars.
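To make the idea of a tweet-rate model concrete, here is a minimal sketch in Python. It assumes a plain straight-line fit between the average tweet rate before release and the opening-weekend revenue; the researchers’ actual model may well differ. The function names and all numbers are made up for illustration and are not data from the study.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(slope, intercept, x):
    """Revenue forecast for a given tweet rate."""
    return slope * x + intercept


# Made-up illustrative numbers, NOT data from the study:
# average tweets per hour before release, and
# opening-weekend revenue in million dollars.
tweet_rates = [100.0, 250.0, 400.0, 800.0]
revenues = [5.0, 11.0, 17.0, 33.0]

slope, intercept = fit_line(tweet_rates, revenues)

# Forecast for a hypothetical new movie with 500 tweets per hour.
forecast = predict(slope, intercept, 500.0)
print(f"forecast: {forecast:.2f} million dollars")
```

The more past releases the line is fitted on, the more a forecast like this can be trusted; with four invented data points it only shows the mechanics.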

Reported by the BBC: http://news.bbc.co.uk/2/hi/8612292.stm

Reported by SiliconValleyWatcher: http://www.siliconvalleywatcher.com/mt/archives/2010/04/twitter_study_i.php

The research report: http://www.hpl.hp.com/research/scl/papers/socialmedia/socialmedia.pdf

Previous related iOSINT posts:

https://iosint.wordpress.com/2010/03/29/ted-com-sean-gourley-on-the-mathematics-of-war/

https://iosint.wordpress.com/2010/03/17/social-media-intelligence-output/

Social Media Risks

Below, I have listed online articles that are relevant to issues of privacy, identity theft and fraud in relation to social media.

Siciliano, Robert
April 7, 2010
Using Facebook to Steal Company Data
https://www.infosecisland.com/blogview/3579–Using-Facebook-to-Steal-Company-Data.html
Robert Siciliano is CEO of IDTheftSecurity.com, a professional speaker and author.

Siciliano, Robert
March 30, 2010
Social Media and Identity Theft Risks PT II
https://www.infosecisland.com/blogview/3456-Social-Media-and-Identity-Theft-Risks-PT-II.html
Robert Siciliano is CEO of IDTheftSecurity.com, a professional speaker and author.

Siciliano, Robert
March 24, 2010
Social Media and Identity Theft Risks PT I
https://www.infosecisland.com/blogview/3417-Social-Media-and-Identity-Theft-Risks-PT-I.html
Robert Siciliano is CEO of IDTheftSecurity.com, a professional speaker and author.

Himley, Mike
March 19, 2010
The limits of social network privacy

Siciliano, Robert
March 15, 2010
Social Media Sticky Situations
https://www.infosecisland.com/blogview/3283-Social-Media-Sticky-Situations.html
Robert Siciliano is CEO of IDTheftSecurity.com, a professional speaker and author.

Social media: Marketing Input, Intelligence Output

Even the slowest followers in the print media mainstream have by now picked up and echoed the imperative to use social media to reach out to customers: Get a corporate Twitter ID and tweet about everything new in your offerings. Get a corporate Facebook group and start one-on-one dialogues with the buyers of your products. All of that is a new way of doing marketing.

However, very few are talking about what comes out at the other end of these social-media-based outbound marketing and PR efforts. While companies have learned to do a lot of Marketing Input, they can also take the next step and pick up the Intelligence Output. By monitoring and listening to what is going on in various social media channels, companies will be able to collect information about their own brand reputation, competitors’ brand reputation, customer satisfaction levels, competitors’ activity, competitors’ customer satisfaction levels, competitors’ product problems and so on.

In the report “Top 10 trends in Business Intelligence for 2010” from HP (Hewlett-Packard), Social Computing (the use of online social media) is named as one of the top 10 trends for 2010 and described as an increasingly important source of decision support data.

“An important influence in the continuing BI evolution is the impact of social computing on decision-making processes, methods of collaboration and interaction, and enhanced customer experience. BI can expand the insight it provides organizations if it encompasses the information from interactions that occur in social computing environments. The dynamic conversation channels available through blogs, online communities, Twitter, Facebook, LinkedIn, and a host of social computing venues engage customers, prospects, partners, influencers, and employees—touching virtually every key constituent in an organization’s value chain. Very importantly, these channels are reshaping how customers evaluate and choose products, how brands are perceived, how business processes evolve, and how people work together.

Today most organizations are only beginning to analyze the learnings from online conversation. Technologies such as Social Mining and Social Intelligence use sophisticated data mining and text analytics to understand the implicit meaning of this unstructured data, which is completely reliant on the context in which it occurs. These include social behaviors, attitudes, relationships, and knowledge, all of which carry subjective qualities not easily categorized.  We will see the expanded use of these disciplines to harvest both implicit and explicit information. They may predict future behavior that can impact plans, for example, when strong online chatter suggests product interest that drives a decision to increase production. Or they may help organizations respond to explicit feedback, for example, when user experiences reported in communities lead to a product adjustment. This wealth of intelligence can and should align with, and augment, the intelligence delivered through the organization’s traditional BI initiatives.  For now, the integration of BI with social computing will be managed through the attention of a vigilant few within an organization. Emerging technologies, such as MapReduce, are evolving to help bridge the gap between this new frontier and traditional BI. Look for BI to expand its footprint beyond its traditional realm as it embraces the additional insight available through social computing.”

There is at least one commercial provider specializing in tracking social media: Whitevector’s Chat Reports is a web-based service that provides consumer brand teams with a comprehensive picture of what is being said in online social media discussions such as forums, blogs, and networks like Facebook and Twitter.

Tamara Barber at Forrester recently posted an article on her blog, “Three Key Considerations On Social Media For Market Research”, where she lists three of the challenges that systems for mining social media have to meet. She quotes people from Conversition, Attensity and Alterian. The headlines are:

  • Processes and methods need to be developed to make social media data another source for market research.
  • To “connect the dots” on text mining data, you need to extract noun-verb relationships, sentiment, suggestions and intent.
  • “[In social media research] 80% of your time is spent on identifying the right content, getting it into the right shape, and getting the gems out of it. Social media research is not magic.”
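As a rough illustration of the simplest of these extraction tasks, sentiment, here is a minimal word-list sketch in Python. It is an assumption-laden toy and not how the vendors mentioned above work: the word lists and function names are hypothetical, and it ignores negation, sarcasm, noun-verb relationships, suggestions and intent.

```python
import re

# Hypothetical, tiny word lists; a real system would need far larger
# lexicons or a trained model.
POSITIVE = {"love", "great", "good", "excellent", "happy"}
NEGATIVE = {"hate", "bad", "broken", "terrible", "slow"}


def sentiment_score(text):
    """Count positive words minus negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos - neg


def label(text):
    """Map the score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented example posts, for illustration only.
posts = [
    "I love this phone, great battery",
    "The update is terrible and slow",
    "It arrived on Tuesday",
]
labels = [label(p) for p in posts]
print(labels)
```

Even this toy shows why most of the time goes into finding and shaping the right content: the scoring itself is a few lines, while deciding which posts and which words matter is the hard part.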

Social media vs Personal integrity and Security

I always advise people to be very careful about what they publish about themselves on the internet. It is a door you can open, but not close. Google’s effective content indexing and caching further means that your content may remain searchable and viewable online for some time even after you have removed it. In brief terms: publishing something on the web is like letting the genie out of the bottle. A picture you put on the internet is out of your control. When you remove it from your blog, it may already be on my hard drive, or on the hard drive of a lonely guy in the next town who still hates you for something you said to him back in junior high and blames you for his social misery. Or the picture of your kids on the beach that you uploaded for grandma to see: you will have to accept the risk that it will show up as part of a photo mash-up promoting a child pornography website.

While most Facebook users keep their information hidden from people who are not on their friends list, blogs are typically completely unrestricted: they are open for anyone to see and read, and the blog owner does not know who reads them, or why, or when. On YouTube, you have the option of restricting access to your videos, but that is an active and explicit choice you have to make.

Think of it this way: would you ever place copies of your home videos, family photos and stories from your private life in brown boxes around town for the purpose of letting friends and family enjoy them? No, you wouldn’t. But that is exactly what many people do each day on Facebook, YouTube, blogs, Flickr, Picasaweb, Twitter, Bambuser, MySpace and dozens of other social community websites. Oh, there is a difference of course: these services not only make your content available downtown, they make it available from anywhere in the world. And anyone can make a copy and take it home without your knowing.

The anonymity illusion

Many community users live under the illusion that they are anonymous and that their identity is protected because they do not reveal their true name. However, over time, they will often publish little details, one by one, which before long allow a determined and malicious person to deduce their true identity. Online friends in the contacts list, or people who post comments on the person’s blog entries, may in turn have blogs or be community users, and what they publish about themselves also provides clues to the identity of the primary person. Such clues can be seemingly insignificant things such as sex, age, birthday, pet, hobby, type of home, proximity to some town, proximity to the ocean, name of school, pictures showing the neighborhood and so on. With the help of services like StayFriends.de, Pipl.com, and public tax records, you can cross-reference pieces of information and through deduction find out who the online alias is in reality, or IRL (in real life), as the jargon goes.

A serious mistake is to have one online identity where you reveal your real-life identity and publish content about your real life, while at the same time having a second “anonymous” identity which you use for expressing controversial, politically incorrect, socially embarrassing, or illegal content. The day somebody succeeds in revealing the real-life identity hiding behind that second, “anonymous” identity, the damage is multiplied by everything you have published about yourself under your true identity. On this particular subject, stay tuned for the true case story of the Unethical Policeman, coming soon.

In addition to all this, there are sites that aggregate information from social media and make things easier for any intelligence collector. For example, PleaseRobMe.com lists Twitter users who post tweets using a geographic positioning service, which reveals that they are not at home and therefore offers an opportunity for burglars.