Well, I would like to start with a disclaimer: I did NOT discover a new type of malware or virus. This is a blog post about a certain type of user on our social networks.

So, what are these zombie fans/followers/friends (FFFs)? (Oh, please bear with me, this will take a while):

They are users on a social network who stay inactive, post heavily commercialized content, or reply to topics with something totally irrelevant. They have some telltale characteristics:

  1. They follow way more people than follow them back.
  2. Most of their posts are commercialized, i.e. promoting certain products or events.
  3. They are constantly “active” on their social networks; by “active” I mean they post stuff at least daily.

These are the main characteristics for identifying them. It’s not hard to spot them in your list of followers, especially on Twitter and Sina Weibo.
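
The three characteristics above can be turned into a rough scoring rule. This is only a sketch: the `Account` fields and the cutoff values (10x follow ratio, 50% promotional posts, one post a day) are my own assumptions for illustration, not anything Twitter or Sina Weibo actually exposes or uses.

```python
from dataclasses import dataclass


@dataclass
class Account:
    following: int        # number of accounts this user follows
    followers: int        # number of accounts following this user
    posts_per_day: float  # average number of posts per day
    promo_ratio: float    # fraction of posts promoting a product or event (0.0 to 1.0)


def zombie_score(acct, ratio_cutoff=10.0, promo_cutoff=0.5, activity_cutoff=1.0):
    """Count how many of the three characteristics the account matches (0 to 3)."""
    score = 0
    # 1. Follows far more people than follow it back.
    if acct.following >= ratio_cutoff * max(acct.followers, 1):
        score += 1
    # 2. Most posts are promotional.
    if acct.promo_ratio > promo_cutoff:
        score += 1
    # 3. Posts at least daily.
    if acct.posts_per_day >= activity_cutoff:
        score += 1
    return score


# Two made-up accounts for illustration.
suspect = Account(following=2000, followers=15, posts_per_day=6.0, promo_ratio=0.9)
regular = Account(following=180, followers=210, posts_per_day=0.4, promo_ratio=0.05)
print(zombie_score(suspect))  # 3
print(zombie_score(regular))  # 0
```

A score of 3 doesn’t prove anything by itself, but it is a cheap first pass before looking at an account by hand.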

Now that we know who they are, let’s find out what they want.

Just like good old malware that spies on your computer and pops up random webpages, where the creator gets paid once you click on those pages, these new-age zombies work on the same idea. They follow you and hope that you will follow them back, so that their status updates show up on your timeline or home feed. Most people won’t follow back. However, given the large number of people they follow, a good number aren’t aware of what these zombies really want and are kind enough to follow back. That serves the zombies’ purpose.

Even if you don’t follow the zombies back, they can still post replies with embedded spam to your topics, and I won’t tell you how easily that can be done with scripting (thanks to all the excellent API developers out there).

We have all seen that spam before, and we just choose to ignore it; it’s human nature. However, at the moment we think we have ignored it, we actually haven’t: it sort of stays in our heads, and the next time we see it, we remember that we have seen it before. And the cycle goes on and on. And that, my friends, is exactly what the spammers want.

Another purpose of these zombies, I figured out, is to make someone’s reputation look good. Imagine this: two complete strangers add you as a friend; one already has 500k followers, and the other has only 50. Which one of them looks like a spammer to you? Of course, you could say you don’t have to befriend either of them, but the truth is, you can’t deny that you would pick the one with 500k if you had to pick.

And you would probably ask now, why don’t we develop a program to filter out that spam? Well, I believe the networks did. But these zombie spammers get smarter. For example, not everything they post is spam, or they post something that’s sort of related to the topic but include some unrelated information in the same reply, just to confuse the spam filters.

Filtering is always a hard topic, whether in research, academia or industry. It involves AI, natural language parsing and many other fields.

Back in 2007, when the “dumb” phones were still dominating the market, the smartphone OSes started rising up as competitors, including iOS and Windows Mobile. Articles, blogs and news were all taking sides. Each platform started gathering its supporters and followers, and people were starting to become rivals too. Some talked about how great their iPhones/iPods were, and others talked about their Windows phones or some other phones. The companies appeared to be competitors at that time. Well, they still are.

However, looking back a few years later, I really don’t think they were real competitors. I think they were more allies than competitors. Why? Because they were pushing a new set of technologies to the market, and they were both trying to change consumer behaviour, hence the “dumb” phones vs the “smart” phones. And then Google, having bought the company that developed Android, joined the game. With the amount of cash Google had at that time, it wasn’t hard for it to catch up, and it helped the other “competitors” kick the “dumb” phones out of the market. Also, something I have to mention: I know a lot of telecommunication companies in North America boosted the bonus on smartphone sales commissions, so give those salespeople and agents some credit for this whole smartphone era!

Things started to change then. The traditional companies started to fall, except Samsung, which adopted the technologies at a very early stage and hooked up with Google. Now it’s a real competitor of Apple and Microsoft, although they all have business dealings with each other.

My point is, today’s competitors weren’t really competitors back when they started in the market. They were more like allies, trying to promote this “smart” phone concept to people worldwide. We all know that getting people to adopt new things is difficult, especially when it comes to the little handheld devices they use every day. For more detail on this, read Chapter 19.4 of David Easley and Jon Kleinberg’s book “Networks, Crowds, and Markets”.


The whole idea is to get enough people to switch to “smart” phones; once enough of them have switched to reach the threshold, people start switching more quickly. It worked. A perfect example is how people switched to BlackBerry because all their friends were on BBM… and then switched away because no friends were on it anymore. The same story goes for MySpace and some other SNS sites.
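
The threshold idea can be sketched in a few lines. This is a toy model under my own assumptions: a person switches once the share of their friends who have already switched reaches `threshold`, and the friendship graph and names are made up for illustration.

```python
def run_cascade(friends, early_adopters, threshold):
    """Return everyone who ends up switching, given who switched first."""
    adopted = set(early_adopters)
    changed = True
    while changed:
        changed = False
        for person, neighbours in friends.items():
            if person in adopted or not neighbours:
                continue
            # Share of this person's friends who have already switched.
            share = sum(1 for n in neighbours if n in adopted) / len(neighbours)
            if share >= threshold:
                adopted.add(person)
                changed = True
    return adopted


# A made-up friendship graph.
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "cat", "dan"],
    "cat": ["ann", "bob", "dan"],
    "dan": ["bob", "cat", "eve"],
    "eve": ["dan"],
}

print(sorted(run_cascade(friends, {"ann", "bob"}, 0.5)))  # ['ann', 'bob', 'cat', 'dan', 'eve']
print(sorted(run_cascade(friends, {"ann"}, 0.5)))         # ['ann']
```

With two early adopters everyone ends up switching, while a single one goes nowhere, which is the point about needing enough people to reach the threshold.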

Also, here is the market share trend for smartphones from 2007 up to now. (source: Wikipedia, Mobile Operating Systems)

[Figure: worldwide smartphone sales share]

When the world is so connected, and data is everywhere and getting bigger every second, we need a way to retrieve and manipulate it efficiently.

The very first question is, where can we get this data? Well, that relies on the application developers on the other end. For example, if one wants data about all of Canada’s international trade in 2012, then it’s up to Canada’s statistics agency to provide some sort of portal for the data, in other words, APIs.

I am lucky: I live in a city that cares enough about sharing data. Toronto has public APIs (Open Data) for programmers to play with. New York City has a similar portal for sharing data as well.

Recently, I started to be active on the Chinese microblog called Weibo, where the Zhu Ling case is super popular. It’s so popular that Chinese citizens started a petition on “We The People” requesting the deportation of Jasmine Sun, who was the only suspect at the time. I wasn’t very interested in the case itself, but in the petition website.

Later I found out that WTP provides APIs. So I wrote a small program to fetch the signatures from the Zhu Ling petition, in order to create a graph showing the trend.

The Python code is pretty self-explanatory. Basically, it first initializes the sqlite3 database connection, then tries to open the status file (a file that keeps track of the work done so far), then calls the API with the current offset, requesting up to 1000 signatures at a time.

It then takes the results, does a little bit of parsing, and pushes the records into the database, then waits a bit, then requests again, and so on.

 


#!/usr/bin/env python3
# Ported to Python 3: urllib2 became urllib.request and print is a function.

import datetime
import json
import pickle
import sqlite3
import time
import urllib.request

# Assumes zl.db already contains a `signitures` table with a unique `id` column.
db = sqlite3.connect("zl.db")
c = db.cursor()
waiting = 0

# Resume from the offset saved by a previous run, or start from zero.
try:
    with open("status", "rb") as status_file:
        status = pickle.load(status_file)
except (OSError, EOFError, pickle.UnpicklingError):
    status = {"current_offset": 0}
    with open("status", "wb") as status_file:
        pickle.dump(status, status_file)

current_offset = int(status['current_offset'])

def get_json(offset):
    # Fetch up to 1000 signatures starting at the given offset.
    url = "https://api.whitehouse.gov/v1/petitions/5183ce9eeab72a2c0c000002/signatures.json?limit=1000&offset=" + str(offset)
    with urllib.request.urlopen(url) as f:
        return json.loads(f.read().decode("utf-8"))

while True:
    result = get_json(current_offset)
    meta = result['metadata']
    success = meta['responseInfo']['status'] == 200
    current_count = int(meta['resultset']['count'])
    current_offset = int(meta['resultset']['offset'])
    entry = result['results']
    # Stop on an error, an empty page, or once every signature has been read.
    if not (success and len(entry) > 0 and current_offset < current_count):
        break

    for signature in entry:
        created_time = datetime.datetime.fromtimestamp(int(signature['created'])).strftime('%Y-%m-%d %H:%M:%S')
        row = (signature['id'], signature['name'], signature['zip'], created_time)
        try:
            # Placeholders let sqlite3 do the quoting, so names with apostrophes are safe.
            c.execute("INSERT OR IGNORE INTO `signitures` (`id`, `name`, `zip`, `created_time`) VALUES (?, ?, ?, ?)", row)
        except sqlite3.Error:
            print("failed to insert", row)
    db.commit()  # without a commit the inserted rows are lost when the script exits

    current_offset = current_offset + len(entry)
    print("pushed", len(entry), "into the resultSet. variables now are:", current_offset, waiting)
    # A (nearly) full page means more data is waiting, so ask again right away;
    # a short page means we are near the end, so back off.
    quotient = 1000 // len(entry)
    waiting = 0 if quotient == 1 else quotient
    # The original tested == 60, a value that 1000 // n never produces.
    if waiting >= 60:
        break
    time.sleep(waiting)

# Save the offset so the next run picks up where this one stopped.
status['current_offset'] = current_offset
with open("status", "wb") as status_file:
    pickle.dump(status, status_file)

db.close()
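
To get from the stored rows to the per-day numbers for a trend graph, a small aggregation query is enough. This is a sketch, assuming the `signitures` table and the 'YYYY-MM-DD HH:MM:SS' `created_time` format written by the script above; the `daily_counts` helper, the column types and the three demo rows are made up so the example runs on its own.

```python
import sqlite3


def daily_counts(db):
    """Return [(day, number_of_signatures), ...] ordered by day."""
    # created_time looks like 'YYYY-MM-DD HH:MM:SS', so its first 10 characters are the date.
    rows = db.execute(
        "SELECT substr(created_time, 1, 10) AS day, COUNT(*) "
        "FROM signitures GROUP BY day ORDER BY day"
    )
    return rows.fetchall()


# Demo on an in-memory database with made-up rows.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE signitures (id TEXT PRIMARY KEY, name TEXT, zip TEXT, created_time TEXT)")
db.executemany(
    "INSERT INTO signitures VALUES (?, ?, ?, ?)",
    [
        ("a1", "A. B.", "10001", "2013-05-03 09:15:00"),
        ("a2", "C. D.", "94105", "2013-05-03 21:40:12"),
        ("a3", "E. F.", "60601", "2013-05-04 00:05:30"),
    ],
)
counts = daily_counts(db)
print(counts)  # [('2013-05-03', 2), ('2013-05-04', 1)]
db.close()
```

The days and the counts can then be pasted into a chart as the x-axis categories and the series data.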

And with the Highcharts API, I can easily produce the following graph:

[Figure: Zhu Ling petition signatures chart]


<html>
<head>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
<script type="text/javascript" src="http://www.highcharts.com/highslide/highslide-full.min.js"></script>
<script type="text/javascript" src="http://www.highcharts.com/highslide/highslide.config.js" charset="utf-8"></script>
<link rel="stylesheet" type="text/css" href="http://www.highcharts.com/highslide/highslide.css" />

<!--[if lt IE 7]>
<link rel="stylesheet" type="text/css" href="http://www.highcharts.com/highslide/highslide-ie6.css" />
<![endif]-->
<!-- End Highslide code -->

<script type="text/javascript">
var example = 'line-basic',
 theme = 'default';
</script>

<script type="text/javascript" src="http://www.highcharts.com/demo/scripts.js"></script>
<link rel="stylesheet" href="http://www.highcharts.com/templates/yoo_symphony/css/template.css" type="text/css" />
<link rel="stylesheet" href="http://www.highcharts.com/templates/yoo_symphony/css/variations/brown.css" type="text/css" />
<link href="http://www.highcharts.com/demo/demo.css" rel="stylesheet" type="text/css" />
<script type="text/javascript">
$(function () {
    $('#container').highcharts({
        chart: {
            type: 'line',
            marginRight: 130,
            marginBottom: 25
        },
        title: {
            text: 'Zhu Ling Petition Signatures Trend (May 3rd - May 8th)',
            x: -20 //center
        },
        subtitle: {
            text: 'Source: https://petitions.whitehouse.gov',
            x: -20
        },
        xAxis: {
            categories: ['2013-05-03', '2013-05-04', '2013-05-05', '2013-05-06', '2013-05-07', '2013-05-08']
        },
        yAxis: {
            title: {
                text: 'Total number of Signatures'
            },
            plotLines: [{
                value: 0,
                width: 1,
                color: '#808080'
            }]
        },
        legend: {
            layout: 'vertical',
            align: 'right',
            verticalAlign: 'top',
            x: -10,
            y: 100,
            borderWidth: 0
        },
        series: [{
            name: 'Data',
            data: [3181, 11305, 68496, 37587, 11900, 3978]
        }]
    });
});
</script>
<title>
zhu ling charts
</title>
</head>
<body>
 <script src="http://code.highcharts.com/highcharts.js"></script>
<script src="http://code.highcharts.com/modules/exporting.js"></script>
<div id="container"></div>
</body>
</html>

 

Happy coding!

  • Yii is a relatively new PHP web framework that follows the MVC paradigm.
    Yii – MVC design pattern
    source: http://www.mediainfonet.com/blog/yii-mvc-design-pattern/
  • First of all, I am not a very big fan of PHP, although I use it a lot for my projects (that’s why my friends say I am in love with it). PHP isn’t the “best” language, but what does it mean to be the “best language” anyway? Any language is just a tool that helps a developer translate and express the logic of an idea. To prove the point, try posting a question on Stack Overflow like “what programming language is the best … ?”; I guarantee you that they will close your question in no time for being not constructive. (e.g. http://stackoverflow.com/questions/4160162/what-is-the-best-programming-language-and-framework-for-cross-platform-desktop-a)
  • That said, Yii is an amazing framework for prototyping an idea in no time, especially with its integration of Bootstrap, Twitter’s UI framework.
  • Take a look at this site; it’s Yii with Bootstrap: http://www.cniska.net/yii-bootstrap/
  • Standard Yii comes with basic user/admin management and full access control for controllers and actions. What this means to developers is that, for applications that require access control, you can easily implement it with code generated by Gii, a tool that generates Models, Controllers and Views based on your database design.
    Yii user management
    - access control: http://www.yiiframework.com/wiki/341/simple-access-control/

    class PostController extends CController
    {
        ......
        public function accessRules()
        {
            return array(
                array('deny',
                    'actions'=>array('create', 'edit'),
                    'users'=>array('?'),
                ),
                array('allow',
                    'actions'=>array('delete'),
                    'roles'=>array('admin'),
                ),
                array('deny',
                    'actions'=>array('delete'),
                    'users'=>array('*'),
                ),
            );
        }
    }
  • The great power of Gii: Automatic Code Generation

    source: http://tommasodargenio.com/tutorial-8-easy-steps-to-create-a-web-application-with-yii-part-3-144.htm

See a problem, formulate the problem, find possible solutions, implement the solution, carry out the solution, push the solution to the market.

Every time we go on Amazon, eBay, Taobao and even Canadian Tire, they recommend things related to the products we’ve viewed, or things that other customers viewed after viewing the same product. It all makes sense: they try to sell more based on what you and other customers do. It’s machine learning at work.

Say I am looking at a Canon camera on Amazon right now. At the bottom of the page, it recommends some SD card products. It makes perfect sense, right? If I buy a digital camera, I might need a new storage card as well. OK, now this gets me thinking about the current recommendation system model. These systems look at users as a category; they don’t look at them as real persons. The reason is that they don’t know the users well enough to predict more. For example, they can only recommend products in the same category as the ones you are viewing right now. They can’t recommend something totally unrelated: if you are viewing a camera, it’s very rare to see them recommend some brand of chocolate, because the two are not related in any common-sense way.

What if we could somehow build a graph of products and, instead of recommending only first-degree products, recommend things in the second-degree circle, or even nodes further out?
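
Here is a minimal sketch of what “second-degree” could mean in code. The graph is a made-up “bought together” map and `second_degree` is a hypothetical helper of mine; a real system would build the graph from purchase data and rank the candidates somehow.

```python
def second_degree(graph, product):
    """Return products exactly two hops away from the given product."""
    first = set(graph.get(product, ()))
    second = set()
    for neighbour in first:
        second.update(graph.get(neighbour, ()))
    # Drop the product itself and anything already recommended at the first degree.
    return second - first - {product}


# Made-up co-purchase graph: product -> products often bought with it.
bought_together = {
    "camera": {"sd card", "camera bag"},
    "sd card": {"camera", "card reader"},
    "camera bag": {"camera", "hiking boots"},
    "hiking boots": {"camera bag", "chocolate"},
    "card reader": {"sd card"},
    "chocolate": {"hiking boots"},
}

print(sorted(second_degree(bought_together, "camera")))  # ['card reader', 'hiking boots']
```

In this toy graph the chocolate only shows up at the third degree, through the hiking boots, which is what “even nodes further out” would reach.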

Now you might say Facebook has already published its Graph Search, which is somewhat related to the problem I am describing. Facebook’s Graph Search is more like getting recommendations from friends, which I consider to be strong ties. But weak ties can sometimes have a more powerful effect on the model. [1][2]

Those popular online shopping sites have already done some data mining based on the products customers normally buy together. How about we perform another layer of data mining on top of that, looking for the other things that people might be interested in, not necessarily in the same category? Then we put the strength of weak ties to work and see what it brings: a system that is smart enough to look at customers not just as profiles but as real persons, without knowing their confidential information. (Note that this is not like Google ads, which are based on your searching/viewing patterns; I might come back to that.)

[1] “The Strength of Weak Ties” by Mark S. Granovetter.

[2] “Chapter 3, Strong Ties and Weak Ties” in “Networks, Crowds, and Markets: Reasoning about a Highly Connected World” by David Easley and Jon Kleinberg.