Monday, November 11, 2013

Information Analysis #1

Category: Cell phones

Information analysis is the study of information and what it represents.  You might also know it as data mining, information gathering, intelligence gathering, or intelligence assessment, to name a few.

This blog entry focuses on a method for de-anonymizing cellular usage based on an actor's tendency to use separate devices for legitimate and illegitimate activities.

Scenario: The primary actor (PA) runs a legitimate business and uses cell phone 1 (CP1) for day-to-day business communication.  This phone is registered to the PA, and he uses it in multiple locations throughout the city where he lives.  The PA also runs an illegal money laundering operation and uses his legitimate business as cover for it.  To keep the two separate, he uses a prepaid cell phone (CP2) for all money laundering business.  His assumption is that a prepaid cell phone cannot be linked back to him.

Task: Given that we suspect the PA is conducting illegal activities, we want to identify CP2 and tie it, along with the activity conducted on it, back to the PA.

Solution: The solution lies in the cellular towers and the information they log about phone calls.  Assuming that the cell phone activity for a given tower or set of towers can be obtained, we can devise a cross-reference algorithm that links the PA's activity across cell towers and phones.

For example, on a particular day the PA makes calls on CP1 that are handled by cell sites 449, 2132, and 474.  Since we know that CP1 is registered to the PA, we can track this activity:

Date/Time            Cell Site  Originating Number  Destination Number  Duration (minutes)
2013-11-10 08:02:33  449        802-310-1234        817-467-3311        5
2013-11-10 08:32:18  2132       802-310-1234        802-846-2111        5
2013-11-10 10:12:12  2132       802-310-1234        202-233-3232        5
2013-11-10 13:11:22  474        802-310-1234        802-355-2314        5
2013-11-10 17:30:01  449        802-310-1234        603-453-1234        5

Now, to identify CP2 usage, we focus on the cell sites that the PA used during this day.  If we plot cell site usage, time, and location, we can create a probability map of the PA's activity: the probability that he is within range of a given cell site at any given time.  This map is based solely on a linear distribution of probability between any two consecutive sites as time progresses, along the straight line connecting those cell sites.
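The linear distribution described above can be sketched in a few lines of Python.  This is only a rough illustration, not a definitive implementation: the function name site_probabilities, the tuple layout, and the choice to return no estimate outside the observed time window are all assumptions made for the example.  The sightings are the CP1 calls from the log above.

```python
from datetime import datetime

# CP1 sightings taken from the call log above: (timestamp, cell site).
CP1_SIGHTINGS = [
    (datetime(2013, 11, 10, 8, 2, 33), 449),
    (datetime(2013, 11, 10, 8, 32, 18), 2132),
    (datetime(2013, 11, 10, 10, 12, 12), 2132),
    (datetime(2013, 11, 10, 13, 11, 22), 474),
    (datetime(2013, 11, 10, 17, 30, 1), 449),
]


def site_probabilities(sightings, when):
    """Return {cell_site: probability} that the PA is near each site at `when`.

    Hypothetical helper: between two consecutive sightings the probability
    shifts linearly from the earlier site to the later one.
    """
    ordered = sorted(sightings)
    for (t0, site0), (t1, site1) in zip(ordered, ordered[1:]):
        if t0 <= when <= t1:
            # Same site at both ends (or a zero-length interval): no
            # interpolation is needed.
            if site0 == site1 or t0 == t1:
                return {site0: 1.0}
            # Fraction of the interval that has elapsed (timedelta / timedelta
            # yields a float).
            frac = (when - t0) / (t1 - t0)
            return {site0: 1.0 - frac, site1: frac}
    # Assumption: outside the observed window we make no estimate at all.
    return {}
```

Halfway between the 10:12:12 call on site 2132 and the 13:11:22 call on site 474, for instance, this sketch splits the probability evenly between the two sites.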

Once the probability map is defined, we can look at all of the other cell phone calls placed through those cell sites during the time period and assign each phone a probability of being CP2.

The algorithm weights each call by the probability that the PA was within range of the cell site at the time of the call: every call from another number through that site at that time is assigned that probability.  If we then sum the probabilities for each number across all the sites, we can establish which other cell phones have a high probability of being CP2.
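As a hedged sketch of that summation: each call from another number contributes the probability that the PA was near the call's cell site at that moment, and the contributions are added up per number.  The names score_candidates and OTHER_CALLS are hypothetical, the 802-555-01xx numbers and their calls are invented purely for illustration, and the interpolation helper is the same assumed linear scheme described earlier, repeated here so the block runs on its own.

```python
from collections import defaultdict
from datetime import datetime

# CP1 sightings from the call log: (timestamp, cell site).
CP1_SIGHTINGS = [
    (datetime(2013, 11, 10, 8, 2, 33), 449),
    (datetime(2013, 11, 10, 8, 32, 18), 2132),
    (datetime(2013, 11, 10, 10, 12, 12), 2132),
    (datetime(2013, 11, 10, 13, 11, 22), 474),
    (datetime(2013, 11, 10, 17, 30, 1), 449),
]

# Invented calls from other phones: (timestamp, cell site, originating number).
OTHER_CALLS = [
    (datetime(2013, 11, 10, 8, 5, 0), 449, "802-555-0101"),
    (datetime(2013, 11, 10, 9, 0, 0), 2132, "802-555-0101"),
    (datetime(2013, 11, 10, 17, 0, 0), 449, "802-555-0101"),
    (datetime(2013, 11, 10, 9, 0, 0), 2132, "802-555-0102"),
    (datetime(2013, 11, 10, 12, 0, 0), 449, "802-555-0103"),
]


def site_probabilities(sightings, when):
    """Linear interpolation of the PA's location between consecutive sightings."""
    ordered = sorted(sightings)
    for (t0, site0), (t1, site1) in zip(ordered, ordered[1:]):
        if t0 <= when <= t1:
            if site0 == site1 or t0 == t1:
                return {site0: 1.0}
            frac = (when - t0) / (t1 - t0)
            return {site0: 1.0 - frac, site1: frac}
    return {}  # assumption: no estimate outside the observed window


def score_candidates(cp1_sightings, other_calls):
    """Sum, per number, the probability that the PA was at each call's site."""
    scores = defaultdict(float)
    for when, site, number in other_calls:
        probs = site_probabilities(cp1_sightings, when)
        # A call through a site the PA was probably not near adds nothing.
        scores[number] += probs.get(site, 0.0)
    # Highest summed score first: the strongest CP2 candidates.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = score_candidates(CP1_SIGHTINGS, OTHER_CALLS)
```

In this made-up data the number that follows the PA across sites 449 and 2132 ranks first.  Note that the summed value is a ranking score rather than a calibrated probability, since it grows with the number of calls a phone makes.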

Once CP2 has been identified, a wiretap order can be executed to obtain the information required for prosecution.

Conclusion: This algorithm works best under the following scenarios:
  • The PA travels around enough to use multiple cell sites for both CP1 and CP2
  • The PA uses both cell phones multiple times during the day
  • The PA does not have both cell phones on all the time during the day
Of course, if both cell phones are on all the time during the day, the alternative is simply to cross-correlate the registrations of cell phones with the local towers.  Any two phones that register at or near the same time have a high probability of being with the PA.
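The registration cross-correlation could look roughly like the following.  This is a sketch under stated assumptions: the event layout (timestamp, tower, device), the device labels, the sample registrations, and the 60-second window are all invented for illustration, and co_registrations is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timedelta

# Invented tower registration events: (timestamp, tower, device).
REGISTRATIONS = [
    (datetime(2013, 11, 10, 8, 0, 0), 449, "CP1"),
    (datetime(2013, 11, 10, 8, 0, 20), 449, "X"),
    (datetime(2013, 11, 10, 8, 0, 30), 449, "Y"),
    (datetime(2013, 11, 10, 9, 0, 0), 2132, "CP1"),
    (datetime(2013, 11, 10, 9, 0, 10), 2132, "X"),
    (datetime(2013, 11, 10, 10, 0, 0), 474, "CP1"),
    (datetime(2013, 11, 10, 10, 0, 40), 474, "X"),
    (datetime(2013, 11, 10, 10, 5, 0), 474, "Y"),
]


def co_registrations(events, target, window=timedelta(seconds=60)):
    """Count how many of the target's registrations each other device shadows.

    A device "shadows" a registration if it registers on the same tower
    within `window` of it (the window size is an assumed tuning parameter).
    """
    target_events = [(t, tower) for t, tower, dev in events if dev == target]
    counts = Counter()
    for target_time, target_tower in target_events:
        # Use a set so a device counts at most once per target registration.
        nearby = {
            dev
            for t, tower, dev in events
            if dev != target
            and tower == target_tower
            and abs(t - target_time) <= window
        }
        counts.update(nearby)
    return counts.most_common()


matches = co_registrations(REGISTRATIONS, "CP1")
```

A device that registers alongside CP1 on every tower rises to the top of the list, while one that only coincides once falls behind.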

While not a 100% solution, this algorithm provides a high probability of locating multiple linked cell phones for specific scenarios.