Make your sources talk: elicitation, motivation, provocation… investigative journalists do it too

If you understand Swedish, you must listen to this presentation, titled “The ABC of investigative journalism”, by Nils Hanson of Swedish national television (SVT). It was recorded at the 2012 seminar on investigative journalism held in Malmö, Sweden, over the weekend of March 23-25. This was the 16th time Nils Hanson gave this presentation.

The interesting thing here is that Nils Hanson represents the community of investigative journalists and reporters, who think of themselves as “the good guys”: revealing the truth to the public, uncovering what corrupt politicians hide, and sometimes even shedding light on the dodgy activities of government intelligence and security organizations.

However, when listening to Nils Hanson, you will hear him describe to his audience of journalists how to make an unwilling human source talk, how to make a reluctant private person agree to become the subject of a news story, and so on.

If you have government or military intelligence training in the field of HUMINT, you will immediately notice that the methods Nils Hanson recommends are strikingly similar to those used by government and military intelligence operators. The key words are elicitation, motivation, provocation, flattery, favors and favors in return, and so on:

– Build trust and rapport by starting out talking about something irrelevant, non-sensitive and/or slightly humorous
– Reduce tension when a source refuses to talk by asking for something trivial, like a cigarette, and then a match, and so on
– Motivate the source to talk by providing gifts without asking for anything in return and by making considerable, noticeable efforts. This builds confidence, and also a sense of indebtedness.
– When a source refuses to be the subject of a news story or to be interviewed on television, tell the source that he/she is in full control, and proceed in small steps while repeating that he/she can back out at any time. Having committed to a recorded interview on which several people have spent a lot of time, the source will seldom back out and tell them all that their work has been for nothing.

All of these methods push well-known, simple psychological buttons and leverage mechanisms of human nature, such as our reluctance to jump off the bandwagon once we have been on it for a while. Normal people have a strong inner voice that speaks of commitment, promises, responsibility, duty, gratitude, debt, payback, fairness and so on.

I am sure not many of the journalists at the Gräv 2012 seminar would feel comfortable thinking of themselves as working with the same toolbox as an intelligence officer managing his human assets.

http://bambuser.com/v/2494983

Flickr picture uploads from 24 hours in print – HUGE amounts of photos

British Creative Review writes about an installation by Erik Kessels on display at Foam in Amsterdam. Kessels has printed out the number of photos uploaded to Flickr during 24 hours – allegedly 1 million photos.

Creative Review writes: “‘We’re exposed to an overload of images nowadays,’ says Kessels. ‘This glut is in large part the result of image-sharing sites like Flickr, networking sites like Facebook, and picture-based search engines. Their content mingles public and private, with the very personal being openly and un-selfconsciously displayed […]’”

What is most interesting about Kessels’s installation is that it turns this abstract number into something very concrete that you can relate to physically: several rooms with piles of photos covering floor and walls. That sends a rather different message than the old million-billion-trillion rant.

Speaking of which, here are some more interesting figures about photos on the internet:

5 billion – Photos hosted by Flickr (September 2010).
3000+ – Photos uploaded per minute to Flickr.
130 million – At the above rate, the number of photos uploaded per month to Flickr.
3+ billion – Photos uploaded per month to Facebook.
36 billion – At the current rate, the number of photos uploaded to Facebook per year.

(Source: http://royal.pingdom.com/2011/01/12/internet-2010-in-numbers/ )
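The derived figures above follow from the stated rates by simple arithmetic, as a throwaway Python sketch shows (assuming a 30-day month, which is how the rounded numbers work out):

```python
per_minute = 3000                      # photos uploaded to Flickr per minute
per_month = per_minute * 60 * 24 * 30  # 129,600,000 -- rounds to the 130 million above

facebook_per_month = 3_000_000_000
facebook_per_year = facebook_per_month * 12  # 36 billion, matching the figure above

print(per_month, facebook_per_year)
```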

The article in Creative Review, with photos (!) showing the massive amounts of photo print-outs in the installation:

http://www.creativereview.co.uk/cr-blog/2011/november/24-hours-in-photos

This installation by Erik Kessels is on show as part of an exhibition at Foam in Amsterdam that looks at the future of photography. It features print-outs of all the images uploaded to Flickr in a 24-hour period…

Videos of presentations at DerbyCon 2011 – a must-see for anyone in information security or intelligence

During the weekend of September 30 – October 2, DerbyCon took place in Louisville, Kentucky, at the Hyatt Regency hotel. Over those three days, a number of extremely skilled and knowledgeable speakers presented on different topics in three parallel tracks. All the presentations were recorded on video and are now available online.

There is a very high likelihood that you will learn valuable things from watching these videos, either from an information security standpoint, or from an open source intelligence standpoint.

http://www.irongeek.com/i.php?page=videos/derbycon1/mainlist
http://www.derbycon.com/

Use Google to search. No, really.


Over 60 percent of searches use three words or fewer.

In over 80% of all search queries made, fewer than five words are used.

According to the Internet monitoring company Hitwise, that was the distribution of query lengths as of January 2009. The statistics count searches made, not people searching, which is an important distinction. Still, in over 80% of searches, fewer than five words are used in the query. The most common query length is two words.

Now, we all make a lot of searches, and in some cases we have learnt that typing one or two specific words will give us the site we are looking for on the first page of search hits. A lot of people have also learnt that putting a single word XYZ directly in the address field of the browser will take them to http://www.XYZ.com. Doing the same thing in Google Chrome will deliver a search on that word. People are a combination of lazy, practical and smart – they quickly learn what works and repeat it. When they search for something for real, the more word-rich queries come into use.

For anyone looking for something specific on the web, the chances of finding it increase if you have the knowledge to utilize the full power of the search engines. Think of it as shifting from first gear of search engine usage to second, third, fourth and fifth.

There are two dimensions to this:

1) Make sure you use the most appropriate and adequate search engine for the information you need to find. This is covered further in a separate article.
2) Make sure you know how to tell the search engine exactly what you are looking for, instead of throwing in a bunch of keywords in random order. This is all about making use of search operators and special characters which allow you to specify far more complex conditions than “I want to see pages that contain these words”.

Searching more effectively with Google

There is a reason why Google is the dominant search engine: it indexes more, it indexes faster, and it is good at understanding what results people are interested in. In fact, Google’s ability to index web content, in combination with its powerful search operators, has given birth to Google hacking. Google hacking has nothing to do with breaching Google security. It is about using advanced Google searches in the research and reconnaissance phase of a network penetration attempt, for the purpose of a) spotting targets or b) finding possible points of attack against a target.

Apart from leveraging the advanced search operators of Google in the hunt for exploit opportunities, you will of course benefit greatly in your search for information from being skilled at pushing the right buttons of the Google search engine.

Below is a list covering what you need to know in order to make Google do a better job for you when searching. Roughly, these search operators fall into three groups:

1) those that say what to search for, i.e. what words and numbers to match,

2) those that say where to search, i.e. operators that limit the scope of the search or specify where the match should be, and

3) specialized information lookup operators, which make Google return only results of a certain kind.

Operators that say WHAT to search for

1) “What” operators – Result / Effect / Meaning

secret information – Will find content that contains each of the words anywhere in the text, but not necessarily side by side
“secret information” – Will match the exact phrase and word order
~secret – Includes synonyms, alternative spellings and words with adjacent meaning
secret information OR intelligence – Will find content that contains the word secret plus either one of the words information and intelligence
intelligence -information – Will find content that contains the word intelligence while not containing the word information
intelligence +secret – Will search for content that contains intelligence and secret, with secret as required content
intelligence-community – Will find content where the two words appear separately, written as one word, or hyphenated
“central * agency” – The * character serves as a wild card for one or more words
Note! When the * character is used between two numbers in a search with no letters, it functions as a multiplication operator, returning the product of the two numbers.
“US” “gov” – Google automatically includes synonyms and full-word versions of abbreviations. Putting each term in quotes ensures that the search is made for exactly those terms.
“coup d’etat” 1945..1969 – Will find pages that contain any number in the range 1945-1969 together with the phrase “coup d’etat”

Operators that say WHERE to find a match between search term and content

2) “Where” operators – Result / Effect / Meaning

define: – Will look for the search term in word list, dictionary and glossary type pages, e.g. define:secret
define – Alternative syntax for define:. Will look for the search term in word list, dictionary and glossary type pages, e.g. define secret
intelligence ~glossary – Will find the word intelligence on pages of a glossary, dictionary or encyclopedia type
site: – Will limit the search to the internet domain specified, which can be a top domain, a main domain, a sub domain and so on. Examples: site:mil (combine several with the OR operator between them: site:mil OR site:gov), site:groups.google.com
inurl: – Will only look for the search terms in the page URL. This example will show results where either one or both of wiki and sigint are part of the URL: inurl:wiki sigint
allinurl: – Very similar to inurl:, but all of the words specified must be found in the URL.
intitle: – Will only look for the search terms in the title of pages. Title in this context means the HTML document title, which is what you see in the browser tab or window top frame.
allintitle: – Very similar to intitle:, but all of the words specified must be found in the page title.
inanchor: – Will only look for the search terms in the anchor text of hyperlinks on pages. The anchor text is the text the page creator turned into a link, and it may reveal something about what the page creator thinks of the page linked to, for example “Useful information on security”.
allinanchor: – Very similar to inanchor:, but all of the words specified must be found in the anchor text.
intext: – Will limit the results to pages where the search term is found in the text of the page.
allintext: – Very similar to intext:, but all of the words specified must be found in the text of the page.
filetype: – Will limit the results to files with the file extension specified, e.g. filetype:pdf to get only PDF documents
ext: – Short-hand version of filetype: that gives the exact same result
cache: – Will show the Google cache version of a web page if available, e.g. cache:cia.gov. Note! This cannot be combined with additional search terms or operators.
related: – Will show pages that have something in common with or are related to the site you specify, e.g. related:cia.gov. Note! This cannot be combined with additional search terms or operators.
link: – Will show pages that contain a link pointing to the URL you specify, e.g. link:www.cia.gov/library

Special search operators – valid only on specific Google sites

3) Special operators – Result / Effect / Meaning

location: (news.google.com) – presents news search results related to the location specified, e.g. location:kabul
source: (news.google.com) – presents news search results from the source specified, e.g. source:times
author: (groups.google.com) – presents posts written by the author specified, e.g. author:einstein
group: (groups.google.com) – presents posts made in the group specified, e.g. group:publicintel

When you only have a vague idea of what to search for – parts of a name, say, or an approximate date range – advanced queries combining several such bits and pieces, using the OR operator, phrase quotes and the * wild card, let you cover all bases and perform one single search that returns all possible matches.

Here are a few interesting examples that apply several of the operators listed above.

 

  • PDF files published by the FBI that talk about interrogation, methods, and deception:

site:fbi.gov ext:pdf +interrogation +methods +deception

  • Web pages under the .mil top domain where the page title contains the word “staff” and the page contains a link with the word “login”, excluding PDF files as well as Word documents:

site:mil intitle:staff inanchor:login -ext:pdf -ext:doc

  • Excel files published with the word “internal” as part of the URL, with the phrase “internal use only” in the file:

inurl:internal ext:xls OR ext:xlsx “internal use only”
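If you want to script queries like these, the only extra step is URL-encoding the query string. A minimal Python sketch using only the standard library (the helper name google_query_url is mine, not a Google API):

```python
from urllib.parse import urlencode


def google_query_url(*parts: str) -> str:
    """Join query parts (operators, phrases, keywords) with spaces and
    build a URL-encoded Google search URL."""
    query = " ".join(parts)
    return "https://www.google.com/search?" + urlencode({"q": query})


# The third example above: Excel files with "internal" in the URL,
# containing the phrase "internal use only"
url = google_query_url("inurl:internal", "ext:xls OR ext:xlsx", '"internal use only"')
print(url)
```

Opening the resulting URL in a browser runs the same search you would have typed by hand; the encoding step matters as soon as a query contains quotes, colons or spaces.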

 

Learn more about how to search with Google:

http://www.googleguide.com

http://www.google.com/support/websearch/bin/answer.py?hl=en&answer=136861&rd=1

Free tools for turning search hit pages into RSS feeds

When working with environmental scanning, competitive intelligence scanning, industry monitoring, corporate reputation monitoring or any similar activity, many people use a feed reader and organize feeds by their topics and keywords of interest. A wide choice of feed readers exists, and if I were to mention just one, it would probably be NetVibes.com. NetVibes.com is one step ahead, since this free service allows you to organize feeds and many other types of content in a collection of tabs of your own design – all of it kept online for access and use from anywhere.

While this approach to scanning, monitoring and collecting works fine for many people, a problem shows up when you want to monitor search results from a search engine or directory which does not provide the results as a feed of any kind. For example, this is the case when you do a regular web search with Google: the results cannot be obtained as an RSS or Atom feed. So, if you are monitoring PDF documents issued by the US government or US military about piracy in the Gulf of Aden, using the following query: "Gulf of Aden" piracy ext:pdf (site:mil OR site:gov), then you cannot get those search results as a feed from Google.

The solution is to use one of a number of free services that format any web page into a feed, making it readable and presentable by any feed reader. Raju, the owner and editor of TechPP, made a list in April 2009 of what he considers to be the top 10 services of this kind: Top 10 Free Tools to Create RSS for Any Website.

The services listed by TechPP are:

Feedity.com (not a free service)
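If none of these services fit, a bare-bones page-to-feed converter can be rolled by hand with standard-library tools. A minimal Python sketch – the class and function names are mine – that extracts the hyperlinks from a fetched results page and wraps them in a minimal RSS 2.0 feed:

```python
from html.parser import HTMLParser
from xml.etree import ElementTree as ET


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def page_links_to_rss(html: str, feed_title: str) -> str:
    """Turn the hyperlinks of a result page into a minimal RSS 2.0 feed."""
    parser = LinkExtractor()
    parser.feed(html)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    for href, text in parser.links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text or href
        ET.SubElement(item, "link").text = href
    return ET.tostring(rss, encoding="unicode")
```

In practice you would fetch the result page on a schedule, run it through page_links_to_rss, and serve the output to your feed reader; the hosted services above do essentially this, with nicer filtering on top.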

Face recognition software is pervasive and free

http://face.com/

2010-05-03:
7 Billion Scanned Photos Later, Face.com Opens Up To Developers
http://techcrunch.com/2010/05/03/7-billion-scanned-photos-later-face-com-opens-up-to-developers/

2010-05-03:
Face.com opens its face recognition tech to devs
http://news.cnet.com/8301-27076_3-20003936-248.html

2010-06-11:
The Future of Privacy: Facial Recognition, Public Facts, and 300 Million Little Brothers
http://volokh.com/2010/06/11/the-future-of-privacy-facial-recognition-public-facts-and-300-million-little-brothers/

2010-06-16:
Police facial recognition comes to the iPhone
http://www.itworldcanada.com/news/police-facial-recognition-comes-to-the-iphone/140909

Predicting the Future With Social Media

Sitaram Asur and Bernardo A. Huberman at the Social Computing Lab at HP Labs in Palo Alto, California, have demonstrated how social media content can be used to predict real-world outcomes. They used content from Twitter.com to forecast box-office revenues for movies. With a simple model built from the rate at which tweets are created about particular topics, they outperformed market-based predictors. They extracted 2.89 million tweets referring to 24 different movies released over a period of three months. According to the researchers’ prediction, the movie ”The Crazies” was going to generate 16.8 million dollars in ticket sales during its first weekend. The true number turned out to be very close: 16.06 million dollars. The drama ”Dear John” generated 30.46 million dollars worth of tickets sold, compared to a prediction of 30.71 million dollars.
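The core idea – regress an outcome on the rate at which tweets mention it – can be sketched as an ordinary least-squares fit. The numbers below are invented for illustration, and the actual HP Labs model uses more features (including sentiment), so treat this as a toy version of the rate-based regression only:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    b = mean_y - a * mean_x
    return a, b


# Invented data: tweets per hour about a movie vs. opening-weekend revenue (M$)
tweet_rates = [100, 250, 400, 800]
revenues = [5.0, 12.0, 19.0, 41.0]

a, b = fit_line(tweet_rates, revenues)
# Rough estimate for a new movie seen at 600 tweets per hour:
estimate = a * 600 + b
```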

Reported by British BBC: http://news.bbc.co.uk/2/hi/8612292.stm

Reported by SiliconValleyWatcher: http://www.siliconvalleywatcher.com/mt/archives/2010/04/twitter_study_i.php

The research report: http://www.hpl.hp.com/research/scl/papers/socialmedia/socialmedia.pdf

Previous related iOSINT posts:

https://iosint.wordpress.com/2010/03/29/ted-com-sean-gourley-on-the-mathematics-of-war/

https://iosint.wordpress.com/2010/03/17/social-media-intelligence-output/

Take a look at what they want to hide

Web site owners can block search engine spiders and indexing bots from including parts of the content under their domain in the search engine index. This is done by placing a text file named robots.txt in the root directory of the web site. The file contains instructions such as “Disallow:” followed by a subdirectory or a web page, telling the bots that this part of the site should not be included in the search engine index. As a result, that page or the pages in that subdirectory will not appear among the results of any search.

Of course, none of these pages are truly protected or hidden – they are just not included in the search engines’ lists of “known web pages”. So, anyone knowing the exact web address will be able to browse to the page in question and view it.

In most cases, the web site owner is not really trying to prevent access to anything on his or her site. More likely, the purpose is to omit certain content from the search results, in order to give the more relevant content better visibility. Then, once a visitor is on the web site, everything published on the web site is available through the navigation menus and internal links.

However: there are cases where the site owner has published something to the web server, which is not made part of the public web site, and he or she is trying to hide this content by blocking search engine spiders from indexing that content. The problem is that robots.txt is always a file that is openly available to anyone – otherwise web spiders would not be able to read it. So, whatever anyone is trying to hide from search engines is listed in plain text, right there in robots.txt.

So if you are interested in finding out what some web site owner is hiding from search engines, and in turn pondering why that might be, just look for the robots.txt file and read it. The file can also contain interesting comments, providing clues as to why certain content has been disallowed. If a robots.txt file is in use, it will be found in the root of the site, for example: http://www.google.com/robots.txt
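Python’s standard library can read robots.txt files directly, which makes this kind of inspection easy to script. A small sketch, where the robots.txt body and the example.com URLs are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt body; normally you would fetch it
# from http://<site>/robots.txt
robots_txt = """\
User-agent: *
Disallow: /internal/
Disallow: /drafts/secret-page.html
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The disallowed paths are exactly what the owner wants kept
# out of search indexes -- and, per the article, worth a look.
print(parser.can_fetch("*", "http://example.com/internal/report.pdf"))  # False
print(parser.can_fetch("*", "http://example.com/public/index.html"))    # True
```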

If you want to google for robots.txt files in general, use this query in Google:

ext:txt inurl:robots

If you want to google for a robots.txt file on a particular domain, use this query in Google:

ext:txt inurl:robots site:yourselecteddomain.com

Here, for example, is the robots.txt file for Microsoft.com:
http://www.google.com/search?q=ext:txt+inurl:robots+site:www.microsoft.com

Apparently, Microsoft doesn’t want people searching to find the help pages for Macintosh owners using Microsoft products…

Read more about robots.txt on Wikipedia: http://en.wikipedia.org/wiki/Robots.txt
