The extensibility of social networks has contributed significantly to their popularity. However, it comes at the price of exposing user information to third-party extensions. Permission-based access control mechanisms can control access to user information, but they cannot prevent the inference of private information from public information.
Introduction
Modern social computing platforms (e.g., Facebook) are extensible. Third-party developers deploy extensions (e.g., Facebook applications) that augment the functionality of the underlying platform. Social networks provide a platform API so that third-party extensions can connect to the social graph and access user information. Although this has resulted in a drastic growth in the popularity of social networks, it has also raised serious concerns about potential misuse of the user information made accessible through the API. Every information system has privacy concerns associated with it. However, when an information system provides tools for third parties to systematically access and harvest its content, those concerns are significantly heightened. This motivated me to focus my PhD thesis on addressing the privacy threats that third-party extensions pose in social computing platforms.
Social network providers put considerable effort into protecting the privacy of their users. Permission-based authorization mechanisms allow users to determine the level of access that other users in the social graph, as well as third-party extensions, have to their information. However, these protection mechanisms fail to prevent the inference of users’ private information from their public information. This type of privacy breach is generally called an inference attack. We coined the name SNS API inference attacks for inferences that are based on the information accessible through the platform API of Social Network Systems (SNS). I conducted an empirical study to demonstrate the inadequacy of the existing mechanisms in protecting user privacy. In this study, I developed a third-party application for the Facebook platform API and asked 424 Facebook users to subscribe to it. The application then executed several sample inference algorithms against the participants’ user profiles. The success rates of the sample algorithms turned out to be alarmingly high. For instance, one of the algorithms successfully inferred the youngest sibling for 69% of the participants. The complete results of this experiment were reported in [1].
Significance of the Problem
A naïve interlocutor may argue that the above issue has already been addressed by the permission-based access control mechanism, in that third-party extensions cannot access user information without seeking the required permissions. If a user does not trust a third-party application, then she should not authorize it or use it. This argument presumes that ordinary users have the necessary information and expertise to judge whether the applications they subscribe to are benign. In reality, most third-party applications are developed by developers who are not widely known to the user community. Moreover, the application runs on an untrusted server, meaning that there is no mechanism to monitor whether the application is malicious. It is therefore not always possible for a user to assess whether she can trust an application. It is our position that security-by-disclaimer is not a meaningful protection strategy. An interlocutor may also claim that SNS API inference attacks are but another minor privacy violation that does not warrant our attention. I disagree for two reasons. First, the analysis of any security or privacy threat must take into account the number of potential victims. If one develops a website with around 100 registered users, revealing their registration information means violating the privacy of only 100 users. However, when the number of potential victims reaches 50 million, we are facing a problem with costly consequences. Popular Facebook applications may command a monthly active user count of 50 million. This implies that an inference attack with a meagre success rate of 10% leads to privacy violations for 5 million victims. Second, SNS API inference attacks can be employed as a building block for conducting more dangerous security attacks.
For instance, a well-known alternative authentication mechanism is to ask users a security question such as “What is the name of your youngest sibling?” or “Who is your favorite author?”. Due to the nature of the information that people upload to their SNS user profiles, answers to these security questions can usually be harvested systematically by launching inference attacks. The ability to answer a victim’s security questions is the first step of identity theft. Inference attacks can therefore serve as the initial step in launching more dangerous attacks. Now, who is best positioned to launch covert inference attacks? The answer is third-party extension developers.
View-based Protection
The discussion above shows that controlling access is insufficient for preventing SNS API inference attacks. The reason is that there exists a statistical correlation between sensitive information (which the user attempts to hide) and accessible information (which the user allows access to). A malicious third-party application can exploit this correlation to infer sensitive information from the information that is legitimately accessible under the access control model. The key to protection is thus to break the correlation rather than simply to deny access to sensitive information. In my PhD thesis, I advocate a view-based protection model. Under this model, when a third-party application A queries the profile P of user u through the API, the query Q is not evaluated against P itself. Instead, P first undergoes a sanitizing transformation T, and Q is then evaluated against the sanitized profile T(P). The transformation T is called a view, and it is specified by the user and/or the platform. T is thus an enforcement-layer privacy policy. A view may eliminate certain attributes (access control), or probabilistically transform the profile with the aim of perturbing the statistical correlation between sensitive and accessible information. In other words, view-based protection subsumes access control. The mathematical formulation of the privacy and utility goals, and the proof method for establishing that a given view satisfies the two goals, are the topics of a recent paper [2] that I published with the help of my supervisors, Philip Fong and Reyhaneh Safavi-Naini.
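As a rough illustration of the view-based model described above, the following Python sketch evaluates a query Q against T(P) rather than against P. It is not the formulation from [2]: the dictionary representation of a profile and the names drop_attributes, perturb_attribute and answer_query are all hypothetical, chosen only to show one attribute-eliminating view and one probabilistic view.

```python
import random
from typing import Any, Callable, Dict, List

# Assumption: a profile P is modelled as a flat attribute dictionary.
Profile = Dict[str, Any]
View = Callable[[Profile], Profile]   # a sanitizing transformation T
Query = Callable[[Profile], Any]      # a query Q issued by an application


def drop_attributes(*names: str) -> View:
    """Access-control-style view: eliminate the named attributes."""
    def view(profile: Profile) -> Profile:
        return {k: v for k, v in profile.items() if k not in names}
    return view


def perturb_attribute(name: str, alternatives: List[Any],
                      flip_prob: float, rng: random.Random) -> View:
    """Probabilistic view: with probability flip_prob, replace the
    attribute value by one drawn uniformly from `alternatives`."""
    def view(profile: Profile) -> Profile:
        sanitized = dict(profile)  # never mutate the stored profile
        if name in sanitized and rng.random() < flip_prob:
            sanitized[name] = rng.choice(alternatives)
        return sanitized
    return view


def answer_query(profile: Profile, view: View, query: Query) -> Any:
    """Evaluate Q against the sanitized profile T(P), never against P."""
    return query(view(profile))


# Hypothetical example data.
profile = {"name": "Alice", "birthday": "1990-05-01", "hometown": "Calgary"}
cities = ["Calgary", "Edmonton", "Toronto"]

hidden_birthday = answer_query(
    profile, drop_attributes("birthday"), lambda p: p.get("birthday"))
noisy_hometown = answer_query(
    profile, perturb_attribute("hometown", cities, 1.0, random.Random(0)),
    lambda p: p["hometown"])
```

In this sketch, drop_attributes plays the role of ordinary access control, which is one way to see why view-based protection subsumes it. How to choose a perturbation that actually meets the privacy and utility goals is the subject of [2] and is not addressed here.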
Challenge of View Materialization
How can one implement view-based protection efficiently? A naive approach is to compute T(P) every time P is queried. The problem is that P can be large (imagine everything in one’s timeline, photo albums, etc.), so even the most innocent query Q incurs a performance penalty. A second approach is to have the SNS store both P and T(P). The problem is that T(P) has to be recomputed every time P is updated (which happens frequently). In addition, T is specific to the user u and the application A, meaning that the SNS needs to store a T(P) for every application A that user u subscribes to, which is a space-inefficient option. In another paper, which is still under peer review for publication, I propose a middle way. The computation of T(P) is called the materialization of view T. I argue that materialization should be performed lazily, at the time of query. To see why, note that the query Q may not access all components of profile P. Instead of eagerly applying T to the entire profile P, we apply T only to the parts of P that are visible to query Q. A simple query that involves only a small fragment of profile P will therefore incur only a small amount of materialization, which avoids the performance penalty of the first approach. As T is computed at the time of query, there is no need to maintain multiple materialized views, which avoids the view maintenance problem of the second approach. In [3], I propose a language-independent enforcement mechanism that materializes a view lazily. Moreover, I present a new type of state machine for modelling sanitizing transformations so that they fit into the proposed enforcement framework. The state machines are composable, which means that complex transformations can be built by composing simpler ones. A further experiment demonstrates the performance and effectiveness of the view-based protection model. Please refer to my thesis for further details [3].
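The contrast between eager and lazy materialization can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the state-machine-based mechanism of [3]: the view is assumed to decompose into independent per-attribute transformations, and the names LazyView, eager_view, transforms and hidden are hypothetical.

```python
from typing import Any, Callable, Dict, FrozenSet, List

Profile = Dict[str, Any]
# Assumption: the view T decomposes into one transformation per attribute.
Transforms = Dict[str, Callable[[Any], Any]]


def eager_view(profile: Profile, transforms: Transforms,
               hidden: FrozenSet[str] = frozenset()) -> Profile:
    """Eager materialization: sanitize every visible attribute up front."""
    return {name: transforms.get(name, lambda v: v)(value)
            for name, value in profile.items() if name not in hidden}


class LazyView:
    """Lazy materialization: sanitize an attribute only when a query reads it."""

    def __init__(self, profile: Profile, transforms: Transforms,
                 hidden: FrozenSet[str] = frozenset()) -> None:
        self._profile = profile
        self._transforms = transforms
        self._hidden = hidden
        self.materialized: List[str] = []  # attributes sanitized so far

    def get(self, name: str) -> Any:
        if name in self._hidden or name not in self._profile:
            return None  # eliminated or absent attributes are invisible
        self.materialized.append(name)
        transform = self._transforms.get(name, lambda v: v)
        return transform(self._profile[name])


# Hypothetical profile with one large component (the timeline).
profile = {
    "name": "Alice",
    "birth_year": 1987,
    "photos": ["p1.jpg", "p2.jpg"],
    "timeline": ["post"] * 1000,
}
transforms: Transforms = {
    "birth_year": lambda y: y - y % 10,   # coarsen to the decade
    "photos": lambda ps: ps[:1],          # expose only the first photo
}

view = LazyView(profile, transforms, hidden=frozenset({"timeline"}))
decade = view.get("birth_year")   # only this attribute is sanitized
timeline = view.get("timeline")   # hidden, so nothing is materialized
```

A query that reads only birth_year touches a single attribute here, whereas eager_view would process the photos as well. Note that the sketch re-applies the transformation on every read; a probabilistic transformation would need extra care to answer repeated reads consistently.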
I am very grateful to my supervisors, Dr. Philip Fong and Dr. Reyhaneh Safavi-Naini, for their wonderful support and contributions.