Tracking performed by Social Networks

In this blog post I analyze the user-tracking methods used by popular social network websites such as Facebook, Twitter, Xing, and, more recently, Google+.

Each of these social networks offers buttons (called Like, Tweet, Visitors, and +1 buttons) that are installed on numerous websites. I try to shed some light on what those buttons do and how they track users around the web, even when the users never click them.

All these buttons have one thing in common: they are embedded in websites all around the web and load resources (scripts, images, etc.) from the social network's servers or from its content delivery partners. The operator of a website embedding these buttons therefore does not have complete control over what content is loaded in the context of the browser of a user viewing the site.

In the following paragraphs I show some details of the code behind these buttons and what happens when a user views the webpage located at http://www.example.com/shop.jsp?product=4711. Let's assume that this is a popular shopping site and that the URL points to the page of a certain product (identified by the parameter in the URL).

For each social network, I distinguish between the following three cases when analyzing its ability to track users surfing the web:

  1. The user is logged in at the social network site.
  2. The user is not logged in at the social network site.
  3. The user does not participate in the social network and therefore has no account.

Facebook's Like button

[Screenshot of Facebook's Like button]

Facebook has chosen to ship the Like button as a code snippet which mainly consists of either an IFrame or some JavaScript code (called the XFBML style).

The IFrame code for embedding a Like button on the product page of the shopping site mentioned above looks like this:

<iframe src="http://www.facebook.com/plugins/like.php\
  ?href=http%3A%2F%2Fwww.example.com%2Fshop.jsp%3Fproduct%3D4711&\
  send=false&layout=standard&width=450&show_faces=false&action=like&\
  colorscheme=light&font&height=35"
scrolling="no" frameborder="0"
style="border:none; overflow:hidden; width:450px; height:35px;"
allowTransparency="true"></iframe>

Similarly, the XFBML version of the same Like button looks like the following code. The website operator can either let the script determine the page to be liked dynamically or set it directly in the button's code.

<div id="fb-root"></div><script src="http://connect.facebook.net/en_US/all.js#xfbml=1">
</script><fb:like href="http://www.example.com/shop.jsp?product=4711" 
send="false" width="450" show_faces="false" action="like" font=""></fb:like>

From the above code samples you can see that the complete URL of the page on which the website operator placed a Like button is passed as a parameter to Facebook. As both variants (IFrame and XFBML) are loaded as soon as our example.com product page is viewed, Facebook learns that someone is viewing that page, even without the user clicking the button. But can Facebook also see who is viewing the page? To answer this question we have to dig a bit deeper into the HTTP traffic exchanged while the page is being viewed.
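To illustrate how directly the page URL is exposed, here is a minimal JavaScript sketch (runnable in Node.js or a browser console, using only the standard URL API). It takes the IFrame src from the snippet above, joined onto a single line, and reads back the href parameter that Facebook receives. The helper name likedPageUrl is made up for this example.

```javascript
// Recover the page URL that the Like button IFrame sends to Facebook.
// likedPageUrl is an illustrative helper name, not part of any Facebook API.
function likedPageUrl(iframeSrc) {
  // URLSearchParams percent-decodes the value of the href parameter.
  return new URL(iframeSrc).searchParams.get("href");
}

// The IFrame src from the embed code, joined onto one line.
const src =
  "http://www.facebook.com/plugins/like.php" +
  "?href=http%3A%2F%2Fwww.example.com%2Fshop.jsp%3Fproduct%3D4711" +
  "&send=false&layout=standard&width=450&show_faces=false&action=like" +
  "&colorscheme=light&font&height=35";

console.log(likedPageUrl(src));
// prints: http://www.example.com/shop.jsp?product=4711
```

The product parameter survives intact, so Facebook learns not just the shop but the exact product being viewed.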

For both styles (IFrame and XFBML) the user's browser issues HTTP GET requests to load the IFrame source or the JavaScript source, which is then executed. When capturing the HTTP traffic originating from your browser while accessing our example.com product page (assuming you're logged in at Facebook at that time), you will notice an HTTP GET request similar to the following, sent to facebook.com (I've masked some sensitive values with 'X'):

GET /plugins/like.php?\
  href=http%3A%2F%2Fwww.example.com%2Fshop.jsp%3Fproduct%3D4711\
  &send=false&layout=standard&width=450&show_faces=true\
  &action=like&colorscheme=light&font&height=80 HTTP/1.1
Host: www.facebook.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.7;\
  de; rv:1.9.2.10) Gecko/20100914 
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: http://www.example.com/shop.jsp?product=4711
Cookie: datr=3efxQeXXXXXXXXXXXho7EX;\
  act=1353XXXXX29%2F3;\
  c_user=2020XXXXX7006;\
  locale=de_DE;\
  lu=RgKp7uXXXXXiep_oISg;\
  sct=19XXX36;\
  xs=60%3A40bc92XXXXXXXXXX8dc88c8;\
  x-referer=http%3A%2F%2Fwww.facebook.com%2F%23%2F;\
  wd=1280x647

To answer our question about whether Facebook can see who the user is while they surf other sites, take a look at the cookies submitted as part of the HTTP GET request shown above. These are the cookies sent to facebook.com when viewing the example.com product page while the user is logged in at Facebook in the background. Most users of social networks stay logged in for a long time and tick the "keep me logged in" checkbox to get some sort of persistent login cookie. Some of the cookies sent back (c_user in particular) can be treated as profile and/or session identifiers. So the answer is: yes, Facebook seems to be able to identify the user who is viewing the product page in our example. And all this happens simply by surfing to a page that has the Like button embedded; the user does not have to actually click the Like button.
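The following minimal sketch parses a Cookie header the way the receiving server sees it, using a shortened copy of the captured header above (masked values kept as-is). The helper name parseCookieHeader is hypothetical, and the choice of which cookies count as identifiers (c_user, plus datr, discussed next) is my assumption based on this analysis, not something Facebook documents.

```javascript
// Split a Cookie request header into a Map of name -> raw value.
function parseCookieHeader(header) {
  const cookies = new Map();
  for (const part of header.split(";")) {
    const pair = part.trim();
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    // Everything after the first "=" belongs to the value.
    const name = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? "" : pair.slice(eq + 1);
    cookies.set(name, value);
  }
  return cookies;
}

// Shortened copy of the captured header (masked values kept as-is).
const captured =
  "datr=3efxQeXXXXXXXXXXXho7EX; act=1353XXXXX29%2F3; c_user=2020XXXXX7006; " +
  "locale=de_DE; xs=60%3A40bc92XXXXXXXXXX8dc88c8; wd=1280x647";

// Assumption: the cookies this post treats as identifying.
const SUSPECTED_IDS = ["c_user", "datr"];

const cookies = parseCookieHeader(captured);
for (const name of SUSPECTED_IDS) {
  if (cookies.has(name)) console.log(`${name} = ${cookies.get(name)}`);
}
// prints:
// c_user = 2020XXXXX7006
// datr = 3efxQeXXXXXXXXXXXho7EX
```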

You might ask what happens when the user explicitly logs out of Facebook and then visits our example.com page. In that case the user's browser sends fewer cookies to Facebook; for example, the c_user cookie seen above is missing. Nevertheless, the datr cookie (which expires after two years) is still present even after logging out of Facebook. So Facebook seems to be able to track users and the pages they visit while surfing around the web even after they've logged out of Facebook.
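The effect of logging out can be expressed as a simple comparison of cookie names. In this sketch the logged-in list is taken from the captured request above, while the logged-out list is hypothetical: it only reflects what is stated here, namely that c_user disappears while datr remains. The function name compareCookieNames is made up.

```javascript
// Compare the cookie names sent before and after logging out.
function compareCookieNames(loggedIn, loggedOut) {
  const after = new Set(loggedOut);
  return {
    dropped: loggedIn.filter((name) => !after.has(name)),
    kept: loggedIn.filter((name) => after.has(name)),
  };
}

// Names from the captured (logged-in) request.
const loggedIn = ["datr", "act", "c_user", "locale", "lu", "sct", "xs", "x-referer", "wd"];
// Hypothetical logged-out set: only datr's survival is established above.
const loggedOut = ["datr", "locale"];

const { dropped, kept } = compareCookieNames(loggedIn, loggedOut);
console.log("dropped:", dropped.join(", "));
// prints: dropped: act, c_user, lu, sct, xs, x-referer, wd
console.log("kept:", kept.join(", "));
// prints: kept: datr, locale
```

A single kept cookie with a per-browser value, like datr, is already enough to recognize the same browser again.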

The last remaining question is what happens when the user has never accessed the Facebook website (and is therefore not a Facebook user). In that case, of course, no cookie is sent to Facebook. But as soon as the user visits, for example, www.facebook.com, a datr cookie valid for two years is set (valid for the whole facebook.com domain without any further path or subdomain restriction). This suggests that Facebook has the ability to track users who have visited a Facebook website at least once in the past. So Facebook can keep track of users' surfing habits on Like button enabled websites even before they register an account at Facebook. This information from the past could then theoretically be linked to the individual once an account is registered using the same browser.
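Why does a cookie set while visiting www.facebook.com reach the Like button endpoint? Browsers decide which cookies to attach to a request using the domain-match and path-match rules of RFC 6265. The sketch below implements simplified versions of those two rules, assuming the datr cookie was set with Domain=facebook.com and Path=/ (the scope described above; the exact Set-Cookie header is not shown in this post). Expiry, the Secure flag, and IP-address hosts are ignored.

```javascript
// Domain match (RFC 6265, section 5.1.3). A leading dot in the Domain
// attribute is dropped, as section 5.2.3 prescribes.
function domainMatches(host, cookieDomain) {
  const h = host.toLowerCase();
  const d = cookieDomain.toLowerCase().replace(/^\./, "");
  return h === d || h.endsWith("." + d);
}

// Path match (RFC 6265, section 5.1.4).
function pathMatches(requestPath, cookiePath) {
  if (requestPath === cookiePath) return true;
  if (!requestPath.startsWith(cookiePath)) return false;
  return cookiePath.endsWith("/") || requestPath[cookiePath.length] === "/";
}

// Would a cookie with this Domain and Path be attached to a request for url?
function cookieIsSent(url, cookieDomain, cookiePath) {
  const { hostname, pathname } = new URL(url);
  return domainMatches(hostname, cookieDomain) && pathMatches(pathname, cookiePath);
}

console.log(cookieIsSent("http://www.facebook.com/plugins/like.php", "facebook.com", "/")); // prints: true
console.log(cookieIsSent("http://www.example.com/shop.jsp", "facebook.com", "/"));         // prints: false
console.log(cookieIsSent("http://notfacebook.com/", "facebook.com", "/"));                  // prints: false
```

Because the cookie's scope is the whole domain with Path=/, every request to any facebook.com host, including the embedded Like button, carries it.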

For those of you who want to explore this in more detail, I recommend the article by Arnold Roosendaal, "Facebook tracks and traces everyone: Like this!", Tilburg Law School Research Paper No. 03/2011. It comes to a similar conclusion and is certainly worth reading.

Google's +1 button

[Screenshot of Google's +1 button]

Google ships its +1 button as a code snippet that mainly loads JavaScript, which handles the details of fetching and rendering the button. To speed up page loading, the JavaScript can be loaded asynchronously. The website operator can either let the script determine the page to be +1'd dynamically or set it directly in the button's code:


<g:plusone count="false" href="http://www.example.com/shop.jsp?product=4711"></g:plusone>
<script type="text/javascript">
  (function() {
    var po = document.createElement('script'); 
    po.type = 'text/javascript'; po.async = true;
    po.src = 'https://apis.google.com/js/plusone.js';
    var s = document.getElementsByTagName('script')[0]; 
    s.parentNode.insertBefore(po, s);
  })();
</script>

Let's assume that the user visits our example.com product page, which has such a +1 button embedded. When the user is logged in at Google+ in the background while surfing to the product page, the following HTTP GET request is sent to plusone.google.com:

GET /u/0/_/+1/fastbutton?url=http%3A%2F%2Fwww.example.com%2F\
  shop.jsp%3Fproduct%3D4711&size=standard&count=false&\
  annotation=&hl=en-US&jsh=r%3Bgc%2F23217085-590ae8cc HTTP/1.1
Host: plusone.google.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.7;\
  de; rv:1.9.2.10) Gecko/20100914   
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection: keep-alive
Referer: http://www.example.com/shop.jsp?product=4711
Cookie: PREF=ID=5db2XXXXXX921d:TM=13XXXX775:LM=13XXXX325:S=rMwJXXXXX4TR;\
  SID=DQAAAXXXXXXXXXX3CT_d2uhBD12d2mXXXXXXXXXX3Ew2fhjw1erhXXXXXXXXXX3\
    Sl18KWUXXXXXXXXXXXXX3C4Ewfwhj-bFoXXXXXXXXXXX8U86FXXXXXXXXXm5e\
    cTsdXXXXXXXXXwfkO-5wdlkwnd2uSBXXXXXXXXXXXXXXX5vIaT_XXXXXd1t;\
  HSID=A2rkXXXXXX_1nA;\
  SSID=AZu12fEXXXXDAB

Similar to the request sent to Facebook for pages that have Like buttons embedded, the +1 button encodes the URL of the page the user visits into the request sent to plusone.google.com. Several cookies are also sent to Google upon visiting pages that have the +1 button embedded. The SID cookie in particular looks like a session ID and is always sent while the user is logged in at Google+ in the background.

When the user has logged out of Google+ before visiting the example.com shop's product page, fewer cookies are sent to Google upon the visit. But the PREF cookie, which is valid for two years, is still sent to Google, and the value of its ID field does not change. This enables Google to track users visiting webpages that carry the +1 button even after they have logged out of Google+. To achieve this, Google would have to map the ID from the PREF cookie to the user's profile, which is theoretically possible since the session-ID-carrying SID cookie and the ID-carrying PREF cookie were both visible to Google together while the user was logged in.
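The mapping step described above can be illustrated with a purely hypothetical sketch. It does not describe Google's systems: the sessions table, the SID value "hypothetical-sid", the profile name "profile-A", and the function names prefId and observe are all invented. Only the PREF layout (ID=...:TM=...:LM=...:S=...) is taken from the captured request.

```javascript
// Purely hypothetical sketch of linking a logged-out visit to a profile
// via the PREF cookie. Nothing here describes Google's real systems.

// PREF values in the captured request look like "ID=...:TM=...:LM=...:S=...".
function prefId(prefValue) {
  for (const field of prefValue.split(":")) {
    const [key, value] = field.split("=");
    if (key === "ID") return value;
  }
  return null;
}

// Hypothetical server-side state.
const sessions = new Map([["hypothetical-sid", "profile-A"]]); // SID -> profile
const prefToProfile = new Map();                               // PREF ID -> profile

// Record one request, given its cookies (name -> value) and the page URL.
function observe(cookies, pageUrl) {
  const id = prefId(cookies.PREF || "");
  // Logged in: SID and PREF arrive together, so the PREF ID can be bound
  // to the profile behind the session.
  if (id && sessions.has(cookies.SID)) {
    prefToProfile.set(id, sessions.get(cookies.SID));
  }
  // Logged out: only PREF arrives, but an ID bound earlier still resolves.
  return { page: pageUrl, profile: prefToProfile.get(id) || "unknown" };
}

const pref = "ID=5db2XXXXXX921d:TM=13XXXX775:LM=13XXXX325:S=rMwJXXXXX4TR";

// 1. Logged in at Google+: both cookies are present.
observe({ SID: "hypothetical-sid", PREF: pref }, "http://www.example.com/shop.jsp?product=42");
// 2. Later, logged out: only PREF is present.
const later = observe({ PREF: pref }, "http://www.example.com/shop.jsp?product=4711");
console.log(later.profile); // prints: profile-A
```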

Now imagine the user has never registered an account at Google+. In such a situation, of course, no cookie will be sent to Google. But as soon as the user has visited the google.com site, a PREF cookie valid for two years is set. This cookie is then sent back on every request for a page that has a Google +1 button on it. So it looks like Google is able to track users' surfing habits on +1 button enabled websites even before they register with the Google+ service. Once a Google+ profile is created using the same browser, this data from the past could theoretically be linked to the individual. I find it quite interesting that even google.com searches and Google Maps requests (and possibly more services on the google.com domain) can theoretically be tracked using the PREF cookie, since it is valid for the whole google.com domain and has no further path or subdomain restriction.

Twitter's Tweet button

[Screenshot of Twitter's Tweet button]

The code for a Tweet button mainly consists of JavaScript, which is responsible for fetching the button and placing it on the page. Like Facebook's Like button code and Google's +1 button code, the Tweet button code can either carry the URL of the page to be tweeted about or dynamically determine the URL of the page it is rendered on. The following code embeds the Tweet button in our example.com shop's product page:

<a href="http://twitter.com/share" class="twitter-share-button" 
data-url="http://www.example.com/shop.jsp?product=4711" data-count="none" 
data-via="johnXXXXXXXdoe">Tweet</a><script type="text/javascript" 
src="http://platform.twitter.com/widgets.js"></script>

When a user surfs to the example.com page the following HTTP GET request will be sent to platform.twitter.com:

GET /widgets/images/t.gif?_=1314043621231&count=none&\
  id=twitter_tweet_button_0&lang=en&original_referer=\
  http%3A%2F%2Fwww.example.com%2Fshop.jsp%3Fproduct%3D4711\
  &text=&url=http%3A%2F%2Fwww.example.com%2Fshop.jsp%3Fproduct%3D4711\
  &via=johnXXXXXXXdoe&twttr_referrer=http%3A%2F%2Fwww.example.com%2F\
  shop.jsp%3Fproduct%3D4711&twttr_li=1&twttr_widget=1 HTTP/1.1
Host: platform.twitter.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.7;\
  de; rv:1.9.2.10) Gecko/20100914 
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection: keep-alive
Referer: http://platform.twitter.com/widgets/tweet_button.html
Cookie: k=79.193.XXX.XXX.131404XXXXX12158;\
  guest_id=v1%3A131XXXXXXXXX0080;\
  __utma=438XXX68.16XXXX4.131XXXXX42.1314XXXX42.1314XXXXX2.1;\
  __utmb=438XXX68.2.10.13XXXXX342;\
  __utmc=438XXX68;\
  __utmz=438XXX68.131XXXXX22.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);\
  secure_session=true;\
  twid=u%3DXXXXXX43%7CLqsAlXXXXXXXXXXXmoSDk%3D;\
  twll=l%3D131XXX127

Aside from the multiple inclusions of the visited page's URL in the request sent to Twitter, a handful of cookies are sent along when the user surfing to example.com is logged in at Twitter in the background. The cookies beginning with __utm belong to the Urchin tracking code used by Google Analytics; it's quite interesting to see that Twitter seems to use Google Analytics. The other cookies (especially twid and guest_id) look a lot like identifiers of the Twitter account (i.e., of the user who is surfing).
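The following minimal sketch makes two of these observations concrete: it lists which query parameters of the captured request carry the visited page's URL, and it percent-decodes the twid cookie. The helper names are hypothetical, and reading the decoded twid as "u=" followed by an account ID is my interpretation of the masked sample, not documented behaviour.

```javascript
// Which query parameters of a request carry the visited page's URL?
// The request line only holds a path, so the Host header supplies the origin.
function paramsCarrying(requestTarget, host, pageUrl) {
  const params = new URL(requestTarget, `http://${host}`).searchParams;
  const hits = [];
  for (const [name, value] of params) {
    if (value === pageUrl) hits.push(name);
  }
  return hits;
}

// Percent-decode the twid cookie and split it at "|".
function decodeTwid(raw) {
  const [account, token] = decodeURIComponent(raw).split("|");
  return { account, token };
}

// Request target from the capture above, joined onto one line.
const target =
  "/widgets/images/t.gif?_=1314043621231&count=none&id=twitter_tweet_button_0" +
  "&lang=en&original_referer=http%3A%2F%2Fwww.example.com%2Fshop.jsp%3Fproduct%3D4711" +
  "&text=&url=http%3A%2F%2Fwww.example.com%2Fshop.jsp%3Fproduct%3D4711" +
  "&via=johnXXXXXXXdoe" +
  "&twttr_referrer=http%3A%2F%2Fwww.example.com%2Fshop.jsp%3Fproduct%3D4711" +
  "&twttr_li=1&twttr_widget=1";

const page = "http://www.example.com/shop.jsp?product=4711";
console.log(paramsCarrying(target, "platform.twitter.com", page).join(", "));
// prints: original_referer, url, twttr_referrer
console.log(decodeTwid("u%3DXXXXXX43%7CLqsAlXXXXXXXXXXXmoSDk%3D").account);
// prints: u=XXXXXX43
```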

When the user surfing to the Tweet button enabled example.com site has logged out of Twitter, fewer cookies are sent. Nevertheless, at least the guest_id cookie, which is valid for two years, is still sent to Twitter. This looks like the same behaviour that Facebook and Google exhibit.

Finally, I tested what happens when the user has no account at Twitter and therefore no cookies to send; here Twitter showed the most interesting results. Even on the very first visit to a webpage that includes the Tweet button (with no prior visit to twitter.com), a fresh guest_id cookie valid for two years is set and then sent back to Twitter. So it looks like the button-rendering JavaScript mentioned above causes the two-year guest_id cookie to be set while the user visits the example.com webpage. This means that Twitter has the ability to track the surfing habits (on Tweet button enabled websites) even of users who have no Twitter account and have never visited a Twitter website before. If the same browser is later used to create a Twitter account, this previously collected data can theoretically be linked to the freshly created profile. Like Facebook's and Google's cookies, Twitter's guest_id cookie is valid for the whole twitter.com domain and has no further path or subdomain restriction.
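A cookie with this lifetime and scope could be expressed as a Set-Cookie header like the one built below. This is a sketch under assumptions: the cookie value is a placeholder, the helper name longLivedDomainCookie is made up, and Twitter's real header may well use Expires rather than Max-Age. It only shows the arithmetic (two years of 365 days is 63,072,000 seconds) and the Domain and Path attributes that make a cookie reach every twitter.com host. Whichever component sets it, the cookie has to come from a twitter.com host, since script running in example.com's own context cannot set cookies for another domain.

```javascript
// Two years of 365 days, in seconds (leap days ignored).
const TWO_YEARS_IN_SECONDS = 2 * 365 * 24 * 60 * 60; // 63072000

// Build a Set-Cookie header value for a long-lived, domain-wide cookie.
function longLivedDomainCookie(name, value, domain) {
  return [
    `${name}=${encodeURIComponent(value)}`,
    `Domain=${domain}`, // sent to the domain and all its subdomains
    "Path=/",           // no path restriction
    `Max-Age=${TWO_YEARS_IN_SECONDS}`,
  ].join("; ");
}

// Placeholder value, not a real guest_id.
const header = longLivedDomainCookie("guest_id", "v1:hypothetical", "twitter.com");
console.log(header);
// prints: guest_id=v1%3Ahypothetical; Domain=twitter.com; Path=/; Max-Age=63072000
```

The percent-encoded colon in the output mirrors the v1%3A prefix of the guest_id value in the captured request.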

Xing's Visitors widget

[Screenshot of Xing's Visitors widget]

When installed on our example.com page, this widget shows how many members of Xing's user base have visited that page. Compared to Facebook, Google+, and Twitter, the code to embed this widget is rather thin: it only consists of an image (which renders the widget including the visitor counter) wrapped in a link. That's all:

<a href="http://www.xing.com/de/directories/people/">
<img src="https://www.xing.com/widgets/visitor_counters/102XXX77_e1XX17?\
  label=People%20Directory" alt="People Directory" /></a>

Despite its simplicity, Xing's Visitors widget needs a way to track visitors of the page/site in order to increment its counter. Let's examine how this is done. When a user surfs to the example.com page, the following HTTP GET request is sent to www.xing.com:

GET /widgets/visitor_counters/102XXX77_e1XX17?label=People%20Directory HTTP/1.1
Host: www.xing.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:6.0)\
  Gecko/20100101 Firefox/6.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
DNT: 1
Connection: keep-alive
Referer: http://www.example.com/shop.jsp?product=4711
Cookie: language=de;\
    _session_id=f540XXXXXXXXXXXXXXXXXXXXXXXXf159;\
    xing=|U2XXXXXkX1-uXXXXXXXXXXXXXXXXRUXm-13XXXXXXXXXXXXXXXXXXXXXXXXcUpc\
      XnXXXXg5-QXXXXXXXXXp_o9lXXXXXXXXXXXXXXXXa5ayf_jCXXXX7m_lXXS-qXX|;\
    s_cc=true;\
    s_nr=13144XXXXXX12;\
    s_lastvisit=1314469726135;\
    s_sq=%5B%5AB%5B%5C;\
    s_vi=[CS]v1|27XXXXXXXXXA314B-4001XXXXXXXX60A2[CE];\
    xing_ssl=1
If-None-Match: "7bf8ac528ddb29be45c0e2d08033c7ee"

As you can see, only the Referer header tells Xing where (on which URL carrying the widget) the user is surfing. But it's interesting to inspect the cookies sent along to xing.com when the surfing user is logged in at the social network: the non-persistent cookies _session_id and xing look a lot like session identifiers. This would (in theory) allow Xing to see who is visiting the page that includes the Visitors widget. And that's exactly what the widget is designed for: it counts (in the sense of tracking) Xing users viewing the page. So this finding came as no surprise. To see how much of the Referer header is actually used, I tested how the counter behaves when the referrer is spoofed: it looks like Xing assigns the counter values to the domain part of the referrer and not to the path and/or page parts. But who knows whether the full referrer Xing receives is saved or logged somewhere at Xing? This leaves a possible way of tracking users' surfing habits by linking the referrer URLs of the image requests to the cookie values identifying the user profile.
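The behaviour observed with spoofed referrers can be modelled as a counter keyed by the referrer's host name. This is a hypothetical sketch: the Map, the function name countVisit, and the example.org page are illustrative, and it says nothing about whether Xing stores the full Referer elsewhere. Note that the full URL still arrives with every request even though only the host drives the count.

```javascript
const counters = new Map(); // referrer host -> visit count

// Count a visit under the host part of the Referer header only; path and
// query (which product was viewed) do not affect the count.
function countVisit(refererHeader) {
  const host = new URL(refererHeader).hostname;
  counters.set(host, (counters.get(host) || 0) + 1);
  return host;
}

countVisit("http://www.example.com/shop.jsp?product=4711");
countVisit("http://www.example.com/shop.jsp?product=42");
countVisit("http://www.example.org/");
console.log(counters.get("www.example.com")); // prints: 2
console.log(counters.get("www.example.org")); // prints: 1
```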

When the user has logged out of Xing, fewer cookies are sent on visiting the example.com page. But like the other social networks, Xing still receives a persistent cookie: the s_vi cookie expires after five years, is valid for the whole xing.com domain, and has no further path or subdomain restriction. The content of this cookie seems to be an identifier that is unique to each cookie and can therefore be used (in theory) to track logged-out users visiting pages that carry the Visitors widget. This is possible because the s_vi cookie was also sent to Xing while the user was logged in, before logging out and visiting example.com.
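A minimal sketch, assuming the s_vi value always has the shape seen in the captured request, "[CS]v1|<id>[CE]" (inferred from this single sample only). It extracts the identifier and shows that a logged-in and a logged-out request can be matched on it. The logged-out value is hypothetical: it is simply the same cookie value again, reflecting the observation above that the cookie survives logout. The function name sviVisitorId is made up.

```javascript
// Extract the identifier from an s_vi value shaped like "[CS]v1|<id>[CE]".
// Returns null if the value does not have that (assumed) shape.
function sviVisitorId(value) {
  const match = /^\[CS\]v1\|(.+)\[CE\]$/.exec(value);
  return match ? match[1] : null;
}

// From the captured (logged-in) request.
const loggedInValue = "[CS]v1|27XXXXXXXXXA314B-4001XXXXXXXX60A2[CE]";
// Hypothetical: the same persistent cookie, sent after logging out.
const loggedOutValue = "[CS]v1|27XXXXXXXXXA314B-4001XXXXXXXX60A2[CE]";

console.log(sviVisitorId(loggedInValue));
// prints: 27XXXXXXXXXA314B-4001XXXXXXXX60A2
console.log(sviVisitorId(loggedInValue) === sviVisitorId(loggedOutValue));
// prints: true
```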

Finally, I tested what happens when the user has no account at Xing and therefore no cookies to send: in this scenario, of course, no relevant cookies are sent to Xing. But as soon as the user visits xing.com, a persistent s_vi cookie valid for five years is assigned to the user's browser. So all further visits to pages carrying Xing's Visitors widget can in theory be tracked using that cookie. If the user registers an account at Xing much later, this information from the past (if it was saved at Xing) could be linked to the user's profile.

Further inspection of the s_vi cookie soon leads to a product called SiteCatalyst, which is capable of visitor tracking. From my personal perspective, I believe that Xing uses that product only to track visitors on its own site and does not exploit the tracking potential that lies in linking that product's cookie to its own user profiles. But this would be much clearer if Xing served the Visitors widget image from a distinct subdomain and restricted the s_vi cookie to another subdomain, since only then would the tracking potential for logged-out users on other websites be eliminated.

Xing's Share button

[Screenshot of Xing's Share button]

Independently of the Visitors widget, Xing also offers a Share button to embed in webpages. This button is designed very fairly, since it tracks nothing upon viewing a page with such a button embedded. The very first request for the image file sends the referrer and the cookies to Xing (as the Visitors widget does on every request), but all subsequent requests are served from the cache, and no data (referrer and/or cookies) is sent to Xing. That's because the image response for this button carries long-lasting cache headers.
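Whether a browser reuses a cached response without contacting the server is decided by the HTTP freshness rules (RFC 9111, section 4.2, formerly RFC 7234): a response is fresh while its age is below its freshness lifetime, which comes from Cache-Control max-age or, failing that, from Expires minus Date. The simplified sketch below ignores revalidation, heuristic freshness, and the other Cache-Control directives; the one-year max-age is hypothetical, since this post doesn't show Xing's actual headers.

```javascript
// Freshness lifetime in seconds: max-age wins, else Expires minus Date.
function freshnessLifetimeSeconds(headers) {
  const maxAge = /(?:^|,)\s*max-age=(\d+)/i.exec(headers["cache-control"] || "");
  if (maxAge) return Number(maxAge[1]);
  if (headers.expires && headers.date) {
    return (Date.parse(headers.expires) - Date.parse(headers.date)) / 1000;
  }
  return 0; // no explicit freshness information: treated as stale here
}

// A fresh cached response is reused without sending any request,
// so neither the Referer header nor cookies reach the server.
function isFresh(headers, ageSeconds) {
  return freshnessLifetimeSeconds(headers) > ageSeconds;
}

const shareImage = { "cache-control": "public, max-age=31536000" }; // hypothetical, one year
const oneWeek = 7 * 24 * 60 * 60;
console.log(isFresh(shareImage, oneWeek)); // prints: true
console.log(isFresh({}, 0));               // prints: false
```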

I think this is a very fair approach compared to the other social networks. But since the Visitors widget is still capable of tracking users (that's what it was designed for: counting visitors from the Xing user base), Xing's tracking capabilities can be compared to those of Google+ and Facebook, as long as the Visitors widget is used on other webpages and served from the www.xing.com domain. From my personal perspective, the Visitors widget is deployed much less often than Xing's fair and user-friendly Share button.