Make your sources talk: elicitation, motivation, provocation… investigative journalists do it too

If you understand Swedish, you must listen to this presentation, titled “The ABC of investigative journalism”, by Nils Hanson of Swedish national television (SVT). It was recorded at the 2012 seminar on investigative journalism held in Malmö, Sweden, over the weekend of March 23-25. This was the 16th time Nils Hanson gave this presentation.

The interesting thing here is that Nils Hanson represents the community of investigative journalists and reporters, who think of themselves as “the good guys”: revealing the truth to the public, uncovering what corrupt politicians hide, and sometimes even shedding light on the dodgy activities of government intelligence and security organizations.

However, when listening to Nils Hanson, you will hear him describe to his audience of journalists how to make an unwilling human source talk, how to make a reluctant private person agree to become the subject of a news story, and so on.

If you have government or military intelligence training in the field of HUMINT, you will immediately notice that the methods Nils Hanson recommends are strikingly similar to those used by government and military intelligence operators. The key words are elicitation, motivation, provocation, flattery, favors and favors in return, and so on:

– Build trust and rapport by starting out talking about something irrelevant, non-sensitive and/or slightly humorous
– Reduce tension when a source refuses to talk by asking for something trivial, like a cigarette, and then a match, and so on
– Motivate the source to talk by providing gifts without asking for anything in return and by making considerable, noticeable efforts. This builds confidence, and also a sense of indebtedness.
– When a source refuses to be the subject of a news story or to be interviewed on television, tell the source that he/she is in full control, and proceed in small steps while repeating that he/she can back out at any time. Having committed to a recorded interview on which several people have spent a lot of time, the source will seldom back out and tell them all that their work has been for nothing.

All of these methods push well-known, simple psychological buttons and leverage mechanisms of human nature, such as our reluctance to jump off the bandwagon once we have been on it for a while. Normal people have a strong inner voice that speaks of commitment, promises, responsibility, duty, gratitude, debt, payback, fairness and so on.

I am sure not many of the journalists at the Gräv 2012 seminar would feel comfortable thinking of themselves as working with the same toolbox as an intelligence officer managing his human assets.

http://bambuser.com/v/2494983

Flickr picture uploads from 24 hours in print – HUGE amounts of photos

British Creative Review writes about an installation by Erik Kessels on display at Foam in Amsterdam. Kessels has printed out the number of photos uploaded to Flickr during 24 hours – allegedly 1 million photos.

Creative Review writes: “‘We’re exposed to an overload of images nowadays,’ says Kessels. ‘This glut is in large part the result of image-sharing sites like Flickr, networking sites like Facebook, and picture-based search engines. Their content mingles public and private, with the very personal being openly and un-selfconsciously displayed […]’”

What is most interesting about Kessels’s installation is that it turns this abstract number into something very concrete that you can relate to physically: several rooms with piles of photos covering floor and walls. That sends a rather different message than the old million-billion-trillion rant.

Speaking of which, here are some more interesting figures about photos on the internet:

5 billion – Photos hosted by Flickr (September 2010).
3000+ – Photos uploaded per minute to Flickr.
130 million – At the above rate, the number of photos uploaded per month to Flickr.
3+ billion – Photos uploaded per month to Facebook.
36 billion – At the current rate, the number of photos uploaded to Facebook per year.

(Source: http://royal.pingdom.com/2011/01/12/internet-2010-in-numbers/ )
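The derived figures above follow from the stated rates by simple arithmetic, as a throwaway Python sketch shows (assuming a 30-day month, which is how the rounded numbers work out):

```python
per_minute = 3000                      # photos uploaded to Flickr per minute
per_month = per_minute * 60 * 24 * 30  # 129,600,000 -- rounds to the 130 million above

facebook_per_month = 3_000_000_000
facebook_per_year = facebook_per_month * 12  # 36 billion, matching the figure above

print(per_month, facebook_per_year)
```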

The article in Creative Review, with photos (!) showing the massive amounts of photo print-outs in the installation:

http://www.creativereview.co.uk/cr-blog/2011/november/24-hours-in-photos

This installation by Erik Kessels is on show as part of an exhibition at Foam in Amsterdam that looks at the future of photography. It features print-outs of all the images uploaded to Flickr in a 24-hour period…

Videos of presentations at DerbyCon 2011 – a must-see for anyone in information security or intelligence

During the weekend of September 30 – October 2, DerbyCon took place in Louisville, Kentucky, at the Hyatt Regency hotel. Over those three days, a number of extremely skilled and knowledgeable speakers presented on different topics in three parallel tracks. All the presentations were recorded on video and are now available online.

There is a very high likelihood that you will learn valuable things from watching these videos, either from an information security standpoint, or from an open source intelligence standpoint.

http://www.irongeek.com/i.php?page=videos/derbycon1/mainlist
http://www.derbycon.com/

Use Google to search. No, really.


Over 60 percent of searches use three words or fewer.

In over 80% of all search queries made, fewer than five words are used.

According to the Internet monitoring company Hitwise, that was the distribution of query lengths as of January 2009. The statistics count searches made, not people searching, which is an important distinction. Still, in over 80% of searches, fewer than five words are used in the query. The most common query length is two words.

Now, we all make a lot of searches, and in some cases we have learnt that typing one or two specific words will give us the site we are looking for on the first page of search hits. A lot of people have also learnt that putting a single word XYZ directly in the address field of the browser will take them to http://www.XYZ.com. Doing the same thing in Google Chrome will deliver a search on that word. People are a combination of lazy, practical and smart – they quickly learn what works and repeat it. When they search for something for real, the more word-rich queries come into use.

For anyone looking for something specific on the web, the chances of finding it increase if you have the knowledge to utilize the full power of the search engines. Think of it as shifting from first gear of search engine usage to second, third, fourth and fifth.

There are two dimensions to this:

1) Make sure you use the most appropriate and adequate search engine for the information you need to find. This is covered further in a separate article.
2) Make sure you know how to tell the search engine exactly what you are looking for, instead of throwing in a bunch of keywords in random order. This is all about making use of search operators and special characters which allow you to specify far more complex conditions than “I want to see pages that contain these words”.

Searching more effectively with Google

There is a reason why Google is the dominant search engine: it indexes more, it indexes faster, and it is good at understanding what results people are interested in. In fact, Google’s ability to index web content, in combination with its powerful search operators, has given birth to Google hacking. Google hacking has nothing to do with breaching Google security. It is about using advanced Google searches in the research and reconnaissance phase of a network penetration attempt, for the purpose of a) spotting targets or b) finding possible points of attack against a target.

Apart from leveraging the advanced search operators of Google in the hunt for exploit opportunities, you will of course benefit greatly in your search for information from being skilled at pushing the right buttons of the Google search engine.

Below is a list covering what you need to know in order to make Google do a better job for you when searching. Roughly, these search operators fall into three groups:

1) those that say what to search for, i.e. what words and numbers to match,

2) those that say where to search, i.e. operators that limit the scope of the search or specify where the match should be, and

3) specialized information lookup operators, which make Google return only results of a certain kind.

Operators that say WHAT to search for

1) “What” operators – Result / Effect / Meaning

secret information – Will find content that contains each of the words anywhere in the text, but not necessarily side by side
“secret information” – Will match the exact phrase and word order
~secret – Includes synonyms, alternative spellings and words with adjacent meaning
secret information OR intelligence – Will find content that contains the word secret plus either one of the words information and intelligence
intelligence -information – Will find content that contains the word intelligence while not containing the word information
intelligence +secret – Will search for content that contains intelligence and secret, with secret as required content
intelligence-community – Will find content where the two words appear separately, written as one word, or hyphenated
“central * agency” – The * character serves as a wild card for one or more words
Note! When the * character is used between two numbers in a search with no letters, it functions as a multiplication operator, returning the product of the two numbers.
“US” “gov” – Google automatically includes synonyms and full-word versions of abbreviations. Putting each term in quotes ensures that the search is made for exactly those terms.
“coup d’etat” 1945..1969 – Will find pages that contain any number in the range 1945-1969 together with the phrase “coup d’etat”

Operators that say WHERE to find a match between search term and content

2) “Where” operators – Result / Effect / Meaning

define: – Will look for the search term in word list, dictionary and glossary type pages, e.g. define:secret
define – Alternative syntax for define:. Will look for the search term in word list, dictionary and glossary type pages, e.g. define secret
intelligence ~glossary – Will find the word intelligence on pages of a glossary, dictionary or encyclopedia type
site: – Will limit the search to the internet domain specified, which can be a top domain, a main domain, a sub domain and so on. Examples: site:mil (combine several with the OR operator between them: site:mil OR site:gov), site:groups.google.com
inurl: – Will only look for the search terms in the page URL. This example will show results where either one or both of wiki and sigint are part of the URL: inurl:wiki sigint
allinurl: – Very similar to inurl:, but all of the words specified must be found in the URL.
intitle: – Will only look for the search terms in the title of pages. Title in this context means the HTML document title, which is what you see in the browser tab or window top frame.
allintitle: – Very similar to intitle:, but all of the words specified must be found in the page title.
inanchor: – Will only look for the search terms in the anchor text of hyperlinks on pages. The anchor text is the text the page creator turned into a link, and it may reveal something about what the page creator thinks of the page linked to, for example “Useful information on security”.
allinanchor: – Very similar to inanchor:, but all of the words specified must be found in the anchor text.
intext: – Will limit the results to pages where the search term is found in the text of the page.
allintext: – Very similar to intext:, but all of the words specified must be found in the text of the page.
filetype: – Will limit the results to files with the file extension specified, e.g. filetype:pdf to get only PDF documents
ext: – Short-hand version of filetype: that gives the exact same result
cache: – Will show the Google cache version of a web page if available, e.g. cache:cia.gov. Note! This cannot be combined with additional search terms or operators.
related: – Will show pages that have something in common with or are related to the site you specify, e.g. related:cia.gov. Note! This cannot be combined with additional search terms or operators.
link: – Will show pages that contain a link pointing to the URL you specify, e.g. link:www.cia.gov/library

Special search operators – valid only on specific Google sites

3) Special operators – Result / Effect / Meaning

location: (news.google.com) – presents news search results related to the location specified, e.g. location:kabul
source: (news.google.com) – presents news search results from the source specified, e.g. source:times
author: (groups.google.com) – presents posts written by the author specified, e.g. author:einstein
group: (groups.google.com) – presents posts made in the group specified, e.g. group:publicintel

When you only have a vague idea of what to search for – parts of a name, say, or an approximate date range – advanced queries combining several such bits and pieces, using the OR operator, phrase quotes and the * wild card, let you cover all bases and perform one single search that returns all possible matches.

Here are a few interesting examples that apply several of the operators listed above.

 

  • PDF files published by the FBI that talk about interrogation, methods, and deception:

site:fbi.gov ext:pdf +interrogation +methods +deception

  • Web pages under the .mil top domain where the page title contains the word “staff” and the page contains a link with the word “login”, excluding PDF files as well as Word documents:

site:mil intitle:staff inanchor:login -ext:pdf -ext:doc

  • Excel files published with the word “internal” as part of the URL, with the phrase “internal use only” in the file:

inurl:internal ext:xls OR ext:xlsx “internal use only”
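If you want to script queries like these, the only extra step is URL-encoding the query string. A minimal Python sketch using only the standard library (the helper name google_query_url is mine, not a Google API):

```python
from urllib.parse import urlencode


def google_query_url(*parts: str) -> str:
    """Join query parts (operators, phrases, keywords) with spaces and
    build a URL-encoded Google search URL."""
    query = " ".join(parts)
    return "https://www.google.com/search?" + urlencode({"q": query})


# The third example above: Excel files with "internal" in the URL,
# containing the phrase "internal use only"
url = google_query_url("inurl:internal", "ext:xls OR ext:xlsx", '"internal use only"')
print(url)
```

Opening the resulting URL in a browser runs the same search you would have typed by hand; the encoding step matters as soon as a query contains quotes, colons or spaces.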

 

Learn more about how to search with Google:

http://www.googleguide.com

http://www.google.com/support/websearch/bin/answer.py?hl=en&answer=136861&rd=1

Free tools for turning search hit pages into RSS feeds

When working with environmental scanning, competitive intelligence scanning, industry monitoring, corporate reputation monitoring or any similar activity, many people use a feed reader and organize feeds by their topics and keywords of interest. A wide choice of feed readers exists, and if I were to mention just one, it would probably be NetVibes.com. NetVibes.com is one step ahead, since this free service allows you to organize feeds and many other types of content in a collection of tabs of your own design – all of it kept online for access and use from anywhere.

While this approach to scanning, monitoring and collecting works fine for many people, a problem shows up when you want to monitor search results from a search engine or directory which does not provide the results as a feed of any kind. For example, this is the case when you do a regular web search with Google: the results cannot be obtained as an RSS or Atom feed. So, if you are monitoring PDF documents issued by the US government or US military about piracy in the Gulf of Aden, using the following query: "Gulf of Aden" piracy ext:pdf (site:mil OR site:gov), then you cannot get those search results as a feed from Google.

The solution is to use one of a number of free services that format any web page into a feed, making it readable and presentable by any feed reader. Raju, the owner and editor of TechPP, made a list in April 2009 of what he considers to be the top 10 services of this kind: Top 10 Free Tools to Create RSS for Any Website.

The services listed by TechPP are:

Feedity.com (not a free service)
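If none of these services fit, a bare-bones page-to-feed converter can be rolled by hand with standard-library tools. A minimal Python sketch – the class and function names are mine – that extracts the hyperlinks from a fetched results page and wraps them in a minimal RSS 2.0 feed:

```python
from html.parser import HTMLParser
from xml.etree import ElementTree as ET


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def page_links_to_rss(html: str, feed_title: str) -> str:
    """Turn the hyperlinks of a result page into a minimal RSS 2.0 feed."""
    parser = LinkExtractor()
    parser.feed(html)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    for href, text in parser.links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text or href
        ET.SubElement(item, "link").text = href
    return ET.tostring(rss, encoding="unicode")
```

In practice you would fetch the result page on a schedule, run it through page_links_to_rss, and serve the output to your feed reader; the hosted services above do essentially this, with nicer filtering on top.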

Face recognition software is pervasive and free

http://face.com/

2010-05-03:
7 Billion Scanned Photos Later, Face.com Opens Up To Developers
http://techcrunch.com/2010/05/03/7-billion-scanned-photos-later-face-com-opens-up-to-developers/

2010-05-03:
Face.com opens its face recognition tech to devs
http://news.cnet.com/8301-27076_3-20003936-248.html

2010-06-11:
The Future of Privacy: Facial Recognition, Public Facts, and 300 Million Little Brothers
http://volokh.com/2010/06/11/the-future-of-privacy-facial-recognition-public-facts-and-300-million-little-brothers/

2010-06-16:
Police facial recognition comes to the iPhone
http://www.itworldcanada.com/news/police-facial-recognition-comes-to-the-iphone/140909

Predicting the Future With Social Media

Sitaram Asur and Bernardo A. Huberman at the Social Computing Lab at HP Labs in Palo Alto, California, have demonstrated how social media content can be used to predict real-world outcomes. They used content from Twitter.com to forecast box-office revenues for movies. With a simple model built from the rate at which tweets are created about particular topics, they outperformed market-based predictors. They extracted 2.89 million tweets referring to 24 different movies released over a period of three months. According to the researchers’ prediction, the movie ”The Crazies” was going to generate 16.8 million dollars in ticket sales during its first weekend. The true number turned out to be very close: 16.06 million dollars. The drama ”Dear John” generated 30.46 million dollars worth of tickets sold, compared to a prediction of 30.71 million dollars.
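The core idea – regress an outcome on the rate at which tweets mention it – can be sketched as an ordinary least-squares fit. The numbers below are invented for illustration, and the actual HP Labs model uses more features (including sentiment), so treat this as a toy version of the rate-based regression only:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    b = mean_y - a * mean_x
    return a, b


# Invented data: tweets per hour about a movie vs. opening-weekend revenue (M$)
tweet_rates = [100, 250, 400, 800]
revenues = [5.0, 12.0, 19.0, 41.0]

a, b = fit_line(tweet_rates, revenues)
# Rough estimate for a new movie seen at 600 tweets per hour:
estimate = a * 600 + b
```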

Reported by British BBC: http://news.bbc.co.uk/2/hi/8612292.stm

Reported by SiliconValleyWatcher: http://www.siliconvalleywatcher.com/mt/archives/2010/04/twitter_study_i.php

The research report: http://www.hpl.hp.com/research/scl/papers/socialmedia/socialmedia.pdf

Previous related iOSINT posts:

https://iosint.wordpress.com/2010/03/29/ted-com-sean-gourley-on-the-mathematics-of-war/

https://iosint.wordpress.com/2010/03/17/social-media-intelligence-output/

Take a look at what they want to hide

Web site owners can block search engine spiders and indexing bots from including parts of the content under their domain in the search engine index. This is done by placing a text file named robots.txt in the root directory of the web site. The file contains instructions such as “Disallow:” followed by a subdirectory or a web page, telling the bots that this part of the site should not be included in the search engine index. As a result, that page or the pages in that subdirectory will not appear among the results of any search.

Of course, none of these pages are truly protected or hidden – they are just not included in the search engines’ lists of “known web pages”. So, anyone knowing the exact web address will be able to browse to the page in question and view it.

In most cases, the web site owner is not really trying to prevent access to anything on his or her site. More likely, the purpose is to omit certain content from the search results, in order to give the more relevant content better visibility. Then, once a visitor is on the web site, everything published on the web site is available through the navigation menus and internal links.

However: there are cases where the site owner has published something to the web server, which is not made part of the public web site, and he or she is trying to hide this content by blocking search engine spiders from indexing that content. The problem is that robots.txt is always a file that is openly available to anyone – otherwise web spiders would not be able to read it. So, whatever anyone is trying to hide from search engines is listed in plain text, right there in robots.txt.

So if you are interested in finding out what some web site owner is hiding from search engines, and in turn pondering why that might be, just look for the robots.txt file and read it. The file can also contain interesting comments, providing clues as to why certain content has been disallowed. If a robots.txt file is in use, it will be found in the root of the site, for example: http://www.google.com/robots.txt
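Python’s standard library can read robots.txt files directly, which makes this kind of inspection easy to script. A small sketch, where the robots.txt body and the example.com URLs are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt body; normally you would fetch it
# from http://<site>/robots.txt
robots_txt = """\
User-agent: *
Disallow: /internal/
Disallow: /drafts/secret-page.html
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The disallowed paths are exactly what the owner wants kept
# out of search indexes -- and, per the article, worth a look.
print(parser.can_fetch("*", "http://example.com/internal/report.pdf"))  # False
print(parser.can_fetch("*", "http://example.com/public/index.html"))    # True
```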

If you want to google for robots.txt files in general, use this query in Google:

ext:txt inurl:robots

If you want to google for a robots.txt file on a particular domain, use this query in Google:

ext:txt inurl:robots site:yourselecteddomain.com

Here, for example, is the robots.txt file for Microsoft.com:
http://www.google.com/search?q=ext:txt+inurl:robots+site:www.microsoft.com

Apparently, Microsoft doesn’t want people searching to find the help pages for Macintosh owners using Microsoft products…

Read more about robots.txt on Wikipedia: http://en.wikipedia.org/wiki/Robots.txt
