
Publications

These are our research publications.

Like a Pack of Wolves: Community Structure of Web Trackers

Web trackers are services that monitor user behavior on the web. The information they collect is ostensibly used for customization and targeted advertising. Due to rising privacy concerns, users have started to install browser plugins that prevent tracking of their web usage. Such plugins tend to address tracking activity by means of crowdsourced filters. While these tools have been relatively effective in protecting users from privacy violations, their crowdsourced nature requires significant human effort and provides no fundamental understanding of how trackers operate. In this paper, we leverage the insight that fundamental requirements for trackers’ success can be used as discriminating features for tracker detection. We begin by using traces from a mobile web proxy to model user browsing behavior as a graph. We then perform a transformation on the extracted graph that reveals very well-connected communities of trackers. Next, after discovering that trackers’ position in the transformed graph significantly differentiates them from “normal” vertices, we design an automated tracker detection mechanism using two simple algorithms. We find that both techniques for automated tracker detection are quite accurate (over 97%) and robust (less than 2% false positives). In conjunction with previous research, our findings can be used to build robust, fully automated online privacy preservation systems.

PDF

Is The Web HTTP/2 Yet?

Version 2 of the Hypertext Transfer Protocol (HTTP/2) was finalized in May 2015 as RFC 7540. It addresses well-known problems with HTTP/1.1 (e.g., head of line blocking and redundant headers) and introduces new features (e.g., server push and content priority). Though HTTP/2 is designed to be the future of the web, it remains unclear whether the web will—or should—hop on board. To shed light on this question, we built a measurement platform that monitors HTTP/2 adoption and performance across the Alexa top 1 million websites on a daily basis. Our system is live and up-to-date results can be viewed at [1]. In this paper, we report findings from an 11 month measurement campaign (November 2014 – October 2015). As of October 2015, we find 68,000 websites reporting HTTP/2 support, of which about 10,000 actually serve content with it. Unsurprisingly, popular sites are quicker to adopt HTTP/2 and 31% of the Alexa top 100 already support it. For the most part, websites do not change as they move from HTTP/1.1 to HTTP/2; current web development practices like inlining and domain sharding are still present. Contrary to previous results, we find that these practices make HTTP/2 more resilient to losses and jitter. In all, we find that 80% of websites supporting HTTP/2 experience a decrease in page load time compared with HTTP/1.1 and the decrease grows in mobile networks.

PDF

An Empirical Study of Android Alarm Usage for Application Scheduling

Android applications often rely on alarms to schedule background tasks. Since Android KitKat, applications can opt in to deferrable alarms, which allows the OS to perform alarm batching to reduce device awake time and increase the chances of network traffic being generated simultaneously by different applications. This mechanism can result in significant battery savings if appropriately adopted.

In this paper we perform a large scale study of the 22,695 most popular free applications in the Google Play Market to quantify whether expectations of more energy efficient background app execution are indeed warranted. We identify a significant chasm between the way application developers build their apps and Android’s attempt to address energy inefficiencies of background app execution. We find that close to half of the applications using alarms do not benefit from alarm batching capabilities. The reasons behind this are that (i) they tend to target Android SDKs lagging behind by more than 18 months, and (ii) they tend to feature third party libraries that are using non-deferrable alarms.
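The batching mechanism described above can be illustrated with a small sketch. The function below is a hypothetical simplification, not Android's actual AlarmManager implementation: each alarm declares a delivery window, and alarms whose windows overlap are coalesced into a single device wakeup, while non-deferrable alarms (window 0) force their own wakeup.

```python
def batch_alarms(alarms):
    """Coalesce alarm delivery windows into shared wakeups (toy model).

    Each alarm is a (trigger_at, window) pair: it may fire anywhere in
    [trigger_at, trigger_at + window]. A window of 0 models a
    non-deferrable alarm. Returns the wakeup times actually scheduled.
    """
    # Sort by earliest allowed firing time.
    intervals = sorted((t, t + w) for t, w in alarms)
    wakeups = []
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end:              # windows overlap: reuse the wakeup
            cur_end = min(cur_end, end)   # must still meet the tighter deadline
            cur_start = max(cur_start, start)
        else:                             # gap: commit a wakeup, start a new one
            wakeups.append(cur_start)
            cur_start, cur_end = start, end
    wakeups.append(cur_start)
    return wakeups
```

For example, two deferrable alarms around t=0 plus a non-deferrable one at t=30 need only two wakeups instead of three, which is the energy saving the paper measures apps failing to exploit.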

PDF

Identifying the Root Cause of Video Streaming Issues on Mobile Devices

Giorgos Dimopoulos, Ilias Leontiadis, Pere Barlet-Ros, Konstantina Papagiannaki, Peter Steenkiste

Proceedings of the 11th ACM Conference on emerging Networking Experiments and Technologies (CoNEXT), Heidelberg, Germany

Abstract:

Video streaming on mobile devices is prone to a multitude of faults, and although well established video Quality of Experience (QoE) metrics such as stall frequency are a good indicator of the problems perceived by the user, they do not provide any insights about the nature of the problem nor where it has occurred. Quantifying the correlation between the aforementioned faults and the users' experience is a challenging task due to the large number of variables and the numerous points-of-failure. To address this problem, we developed a framework for diagnosing the root cause of mobile video QoE issues with the aid of machine learning. Our solution can take advantage of information collected at multiple vantage points between the video server and the mobile device to pinpoint the source of the problem. Moreover, our design works for different video types (e.g., bitrate, duration) and contexts (e.g., wireless technology, encryption). After training the system with a series of simulated faults in the lab, we analyzed the performance of each vantage point separately and when combined, in controlled and real world deployments. In both cases we find that the involved entities can independently detect QoE issues and that only a few vantage points are required to identify a problem's location and nature.

 

Download here

Web Identity Translator: Behavioral Advertising and Identity Privacy with WIT

Fotios Papaodyssefs, Costas Iordanou, Jeremy Blackburn, Nikolaos Laoutaris, Dina Papagiannaki. 
HotNets-XIV: Proceedings of the 14th ACM Workshop on Hot Topics in Networks, November 2015

Abstract:

Online Behavioral Advertising (OBA) is an important revenue source for online publishers and content providers. However, the extensive user tracking required to enable OBA raises valid privacy concerns. Existing and proposed solutions either block all tracking, therefore breaking OBA entirely, or require significant changes on the current advertising infrastructure, making adoption hard. We propose Web Identity Translator (WIT), a new privacy service running as a proxy or middlebox. WIT stops the original tracking cookies from being set on users' browsers and instead substitutes private cookies that it controls. By manipulating the mapping between tracking and private cookies that it maintains, WIT permits transparent OBA to continue while simultaneously protecting the identity of users from attacks based on behavioral analysis of browsing patterns.

PDF

I’ll be there for you: Quantifying Attentiveness towards Mobile Messaging

Social norm has it that people are expected to respond to mobile phone messages quickly. We investigate how attentive people really are and how timely they actually check and triage new messages throughout the day. By collecting more than 55,000 messages from 42 mobile phone users over the course of two weeks, we were able to predict people’s attentiveness from their mobile phone usage with close to 80% accuracy. We found that people were attentive to messages 12.1 hours a day, i.e. 84.8 hours per week, and provide statistical evidence of how short-lived people’s inattentiveness is: in 75% of the cases mobile phone users return to their attentive state within 5 minutes. In this paper, we present a comprehensive analysis of attentiveness throughout each hour of the day and show that intelligent notification delivery services, such as bounded deferral, can assume that inattentiveness will be rare and subside quickly.
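Bounded deferral, mentioned above, holds a notification until the user becomes attentive again, but never longer than a fixed bound. A minimal sketch of that policy (the function and its parameters are illustrative, not taken from the paper):

```python
def deliver_time(notify_at, attentive_periods, bound):
    """Return when a notification fires under bounded deferral.

    attentive_periods: sorted list of (start, end) intervals during
    which the user is attentive. The notification is delivered at the
    first attentive instant at or after notify_at, or at
    notify_at + bound, whichever comes first.
    """
    deadline = notify_at + bound
    for start, end in attentive_periods:
        if end < notify_at:
            continue                        # attentive period already over
        candidate = max(start, notify_at)   # first attentive instant
        return min(candidate, deadline)     # never defer past the bound
    return deadline                         # never attentive: fire at the bound
```

The paper's finding that inattentiveness usually subsides within 5 minutes suggests such a policy rarely hits its deadline in practice.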

Boredom-Triggered Proactive Recommendations

We propose the concept of boredom-triggered proactive recommendations for mobile phones. Given that more and more services attempt to attract mobile phone users' attention via push notifications, attention in this context can be considered an increasingly scarce resource. However, when people are bored, by definition, they seek stimuli and hence often turn to their mobile phones. We cite evidence from our most recent work that boredom can be inferred from patterns of mobile phone usage and that during inferred phases of boredom, people are more likely to engage with suggested content. Thus, using boredom as a content-independent trigger might help to make proactive recommendations a more pleasant experience and, in consequence, more successful.

Multi-Context TLS (mcTLS): Enabling Secure In-Network Functionality in TLS

A significant fraction of Internet traffic is now encrypted and HTTPS will likely be the default in HTTP/2. However, Transport Layer Security (TLS), the standard protocol for encryption in the Internet, assumes that all functionality resides at the endpoints, making it impossible to use in-network services that optimize network resource usage, improve user experience, and protect clients and servers from security threats. Re-introducing in-network functionality into TLS sessions today is done through hacks, often weakening overall security.

In this paper we introduce multi-context TLS (mcTLS), which extends TLS to support middleboxes. mcTLS breaks the current "all-or-nothing" security model by allowing endpoints and content providers to explicitly introduce middleboxes in secure end-to-end sessions while controlling which parts of the data they can read or write.

We evaluate a prototype mcTLS implementation in both controlled and "live" experiments, showing that its benefits come at the cost of minimal overhead. More importantly, we show that mcTLS can be incrementally deployed and requires only small changes to client, server, and middlebox software.
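The per-context permission model can be sketched in miniature: each context gets separate read and write keys, and a middlebox receives only the keys matching its granted permissions (read key to decrypt, write key to re-authenticate modified data). This toy derivation stands in for mcTLS's real key schedule, which is defined over the TLS handshake secrets:

```python
import hashlib
import hmac

def derive_context_keys(session_secret, context_id):
    """Toy per-context key derivation in the spirit of mcTLS.

    Endpoints hold both keys for every context; a middlebox is handed
    only the subset matching its permissions. This HMAC construction is
    illustrative, not the protocol's actual key schedule.
    """
    def derive(label):
        return hmac.new(session_secret, label + context_id, hashlib.sha256).digest()
    return {"read": derive(b"read:"), "write": derive(b"write:")}
```

Because keys are bound to a context identifier, granting a cache read access to, say, an "HTTP headers" context reveals nothing about other contexts in the same session.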

The Power of Indirect Ties

While direct social ties have been intensely studied in the context of computer-mediated social networks, indirect ties (e.g., friends of friends) have seen little attention. Yet in real life, we often rely on friends of our friends for recommendations (of good doctors, good schools, or good babysitters), for introduction to a new job opportunity, and for many other occasional needs. In this work we attempt to 1) quantify the strength of indirect social ties, 2) validate the quantification, and 3) empirically demonstrate its usefulness for applications on two examples. We quantify the social strength of indirect ties using a measure of the strength of the direct ties that connect two people and the intuition provided by the sociology literature. We evaluate the proposed metric by framing it as a link prediction problem and experimentally demonstrate that our metric accurately (up to 87.2%) predicts link formation. We show via data-driven experiments that the proposed metric for social strength can be used successfully for social applications. Specifically, we show that it can be used for predicting the effects of information diffusion with an accuracy of up to 0.753. We also show that it alleviates known problems in friend-to-friend storage systems by addressing two previously documented shortcomings: reduced set of storage candidates and data availability correlations.
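As an illustration of the idea (not the paper's exact metric), a 2-hop indirect tie can be scored by combining the direct tie weights through each common friend and keeping the strongest path:

```python
def indirect_strength(graph, a, b):
    """Illustrative 2-hop social strength between a and b.

    graph: dict mapping node -> dict of neighbor -> tie weight in [0, 1],
    with weights stored symmetrically. A weak link anywhere on a path
    weakens the whole path, so each path scores the product of its two
    direct weights; the strongest path through any common friend wins.
    """
    common = set(graph.get(a, {})) & set(graph.get(b, {}))
    if not common:
        return 0.0
    return max(graph[a][c] * graph[b][c] for c in common)
```

Feeding such scores into a link predictor is one way to frame the evaluation the abstract describes.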

"Mobile Data for Public Health: Opportunities and Challenges"

The ubiquity of mobile phones worldwide is generating an unprecedented amount of human behavioral data at both individual and aggregated levels. The study of this data as a rich source of information about human behavior emerged almost a decade ago. Since then, it has grown into a fertile area of research named computational social sciences with a wide variety of applications in different fields such as social networks, urban and transport planning, economic development, emergency relief, and, recently, public health. In this paper, we briefly describe the state of the art on using mobile phone data for public health, and present the opportunities and challenges that this kind of data presents for public health.

Impact of Carrier-Grade NAT on Web Browsing

Enrico Bocchi, Ali Safari Khatouni, Stefano Traverso, Alessandro Finamore, Valeria Di Gennaro, Marco Mellia, Maurizio Munafò, Dario Rossi


6th International Workshop on TRaffic Analysis and Characterization (TRAC 2015), Dubrovnik, Croatia, 10 May 2015

 

Abstract:

 

Public IPv4 addresses are a scarce resource. While IPv6 adoption is lagging, Network Address Translation (NAT) technologies have been deployed over the last years to alleviate IPv4 address scarcity and its high rental cost. In particular, Carrier-Grade NAT (CGN) is a well known solution to mask a whole ISP network behind a limited amount of public IP addresses, significantly reducing expenses.

Despite its economic benefits, CGN can introduce connectivity issues which have spurred a considerable effort in research, development and standardization. However, to the best of our knowledge, little effort has been dedicated to investigating the impact that CGN deployment may have on users’ traffic. This paper fills the gap. We leverage passive measurements from an ISP network deploying CGN and, by means of the Jensen-Shannon divergence, we contrast several performance metrics considering customers being offered public or private addresses. In particular, we gauge the impact of CGN presence on users’ web browsing experience.

Our results show that CGN is a mature and stable technology: if properly deployed, it does not harm users’ web browsing experience. Indeed, while our analysis reveals expected stochastic differences in certain indexes (e.g., the difference in the path hop count), the measurements related to the quality of users’ browsing are otherwise unperturbed. Interestingly, we also observe that CGN protects customers from unsolicited, often malicious, traffic.
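The Jensen-Shannon divergence used to contrast the two customer groups is symmetric and, with base-2 logarithms, bounded in [0, 1], which makes it convenient for comparing distributions of a performance metric. A minimal implementation over discrete distributions:

```python
from math import log2

def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    p and q are aligned probability vectors summing to 1. JSD is the
    average KL divergence of p and q from their mixture m = (p + q)/2;
    0 means identical distributions, 1 (base-2) means fully disjoint.
    """
    def kl(x, y):
        return sum(xi * log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Applied to, say, histograms of page load times for public- vs. private-address customers, a value near 0 supports the paper's conclusion that the two groups are indistinguishable.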

 

Download Here

 

Complexities in Internet Peering: Understanding the "Black" in the "Black Art"

Peering in the Internet interdomain network has long been considered a “black art”, understood in-depth only by a select few peering experts while the majority of the network operator community only scratches the surface, employing conventional rules-of-thumb to form peering links through ad hoc personal interactions. Why is peering considered a black art? What are the main sources of complexity in identifying potential peers, negotiating a stable peering relationship, and utility optimization through peering? How do contemporary operational practices approach these problems? In this work we address these questions for Tier-2 Network Service Providers. We identify and explore three major sources of complexity in peering: (a) inability to predict traffic flows prior to link formation, (b) inability to predict economic utility owing to a complex transit and peering pricing structure, and (c) computational infeasibility of identifying the optimal set of peers because of the network structure. We show that framing optimal peer selection as a formal optimization problem and solving it is rendered infeasible by the nature of these problems. Our results for traffic complexity show that 15% of NSPs lose some fraction of customer traffic after peering. Additionally, our results for economic complexity show that 15% of NSPs lose utility after peering, approximately 50% of NSPs end up with higher cumulative costs with peering than with transit only, and only 10% of NSPs get paid-peering customers.

Exploring Cyberbullying and Other Toxic Behavior in Team Competition Online Games

In this work we explore cyberbullying and other toxic behavior in team competition online games. Using a dataset of over 10 million player reports on 1.46 million toxic players along with corresponding crowdsourced decisions, we test several hypotheses drawn from theories explaining toxic behavior. Besides providing a large-scale, empirically based understanding of toxic behavior, our work can be used as a basis for building systems to detect, prevent, and counteract toxic behavior.

The Do Not Disturb Challenge – A Day Without Notifications

We report from the first holistic study of the effect of notifications across services, devices, and work and private life. We asked 12 people to disable notification alerts on all computing devices for 24 hours. Data was collected through open post-hoc interviews, which were analyzed by Open Coding. The participants showed very strong and polarized opinions towards the missing notification alerts. During work, some participants felt less stressed and more productive thanks to not being interrupted; outside of the work context, however, some became stressed and anxious because they were afraid of missing important information and violating the expectations of others. The only consistent finding across the participants was that none of them would keep notifications disabled altogether. Notifications may affect people negatively, but they are essential: can’t live with them, can’t live without them.

The Collision of Online and Offline Expectations in Computer-Mediated Communication

Thanks to mobile phones, computer-mediated communication allows us to get in touch with people anywhere, anytime. We are no longer limited to being strictly online or offline. Hence, users can easily get caught between the expectations of people who are co-located in the offline/physical world, and the expectations of others who try to contact them online. However, existing systems today offer little help to effectively manage and balance these potentially colliding expectations from both worlds. We argue that one fruitful strategy to tackle this challenge is to share contextual cues not only with those trying to connect with us via the online world, as proposed in previous work, but also with people who are co-located with us in the offline world.

The Cost of the "S" in HTTPS

David Naylor, Alessandro Finamore, Ilias Leontiadis, Yan Grunenberger, Marco Mellia, Maurizio M. Munafò, Konstantina Papagiannaki, Peter Steenkiste.

The Cost of the "S" in HTTPS

Proceedings of the 10th ACM Conference on emerging Networking Experiments and Technologies (CoNEXT), Sydney, Australia, ISBN: 978-1-4503-3279-8, December 2-5, 2014

 

Abstract

 

Increased user concern over security and privacy on the Internet has led to widespread adoption of HTTPS, the secure version of HTTP. HTTPS authenticates the communicating end points and provides confidentiality for the ensuing communication. However, as with any security solution, it does not come for free. HTTPS may introduce overhead in terms of infrastructure costs, communication latency, data usage, and energy consumption. Moreover, given the opaqueness of the encrypted communication, any in-network value-added services requiring visibility into application layer content, such as caches and virus scanners, become ineffective.

This paper attempts to shed some light on these costs. First, taking advantage of datasets collected from large ISPs, we examine the accelerating adoption of HTTPS over the last three years. Second, we quantify the direct and indirect costs of this evolution. Our results show that, indeed, security does not come for free. This work thus aims to stimulate discussion on technologies that can mitigate the costs of HTTPS while still protecting the user’s privacy. 

 

Download Here

Linguistic Analysis of Toxic Behavior in an Online Video Game

In this paper we explore the linguistic components of toxic behavior by using crowdsourced data from over 590 thousand cases of accused toxic players in a popular match-based competition game, League of Legends. We perform a series of linguistic analyses to gain a deeper understanding of the role communication plays in the expression of toxic behavior. We characterize linguistic behavior of toxic players and compare it with that of typical players in an online competition game. We also find empirical support describing how a player transitions from typical to toxic behavior. Our findings can be helpful to automatically detect and warn players who may become toxic and thus insulate potential victims from toxic playing in advance.

Coverage, Redundancy and Size-awareness in Genre Diversity for Recommender Systems

There is increasing awareness in the Recommender Systems field that diversity is a key property that enhances the usefulness of recommendations. Genre information can serve as a means to measure and enhance the diversity of recommendations and is readily available in domains such as movies, music or books. In this work we propose a new Binomial framework for defining genre diversity in recommender systems that takes into account three key properties: genre coverage, genre redundancy and recommendation list size-awareness.

We show that methods previously proposed for measuring and enhancing recommendation diversity – including those adapted from search result diversification – fail to address these three properties adequately. We also propose an efficient greedy optimization technique to optimize Binomial diversity. Experiments with the Netflix dataset show the properties of our framework and a comparison with state-of-the-art methods.
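The greedy optimization has the familiar structure of diversity-aware re-ranking: at each step, pick the item with the best trade-off between relevance and marginal diversity gain. The sketch below shows that structure with a generic, pluggable gain function; it is not the paper's exact Binomial objective.

```python
def greedy_rerank(candidates, relevance, diversity_gain, k):
    """Greedily build a top-k list trading off relevance and diversity.

    relevance: dict mapping item -> relevance score.
    diversity_gain(selected, item): marginal diversity contribution of
    `item` given the items already selected (e.g. genres newly covered).
    """
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        best = max(pool, key=lambda i: relevance[i] + diversity_gain(selected, i))
        selected.append(best)
        pool.remove(best)
    return selected
```

With a gain that counts newly covered genres, the loop naturally skips a second "action" movie in favor of the first "comedy", which is the coverage/redundancy behavior the framework formalizes.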

Question Recommendation for Collaborative Question Answering Systems with RankSLDA

Collaborative question answering (CQA) communities rely on user participation for their success. This paper presents a supervised Bayesian approach to model expertise in on-line CQA communities with application to question recommendation, aimed at reducing waiting times for responses and avoiding question starvation. We propose a novel algorithm called RankSLDA which extends the supervised Latent Dirichlet Allocation model by considering a learning-to-rank paradigm. This allows us to exploit the inherent collaborative effects that are present in CQA communities where users tend to answer questions in their topics of expertise. Users can thus be modeled on the basis of the topics in which they demonstrate expertise. In the supervised stage of the method we model the pairwise order of expertise of users on a given question. We compare RankSLDA
against several alternative methods on data from the Cross Validate community, part of the Stack Exchange network. RankSLDA outperforms all alternative methods by a signiffcant margin

CARS2: Learning Context-aware Representations for Context-aware Recommendations

Rich contextual information is typically available in many recommendation domains, allowing recommender systems to model the subtle effects of context on preferences. Most contextual models assume that the context shares the same latent space with the users and items. In this work we propose CARS2, a novel approach for learning context-aware representations for context-aware recommendations. We show that the context-aware representations can be learned using an appropriate model that aims to represent the type of interactions between context variables, users and items. We adapt the CARS2 algorithms to explicit feedback data by using a quadratic loss function for rating prediction, and to implicit feedback data by using pairwise and listwise ranking loss functions for top-N recommendations. By using stochastic gradient descent for parameter estimation we ensure scalability. Experimental evaluation shows that our CARS2 models achieve competitive recommendation performance compared to several state-of-the-art approaches.

Gaussian Process Factorization Machines for Context-aware Recommendations

Context-aware recommendation (CAR) can lead to significant improvements in the relevance of the recommended items by modeling the nuanced ways in which context influences preferences. The dominant approach in context-aware recommendation has been the multidimensional latent factors approach, in which users, items, and context variables are represented as latent features in a low-dimensional space.

An interaction between a user, an item, and a context variable is typically modeled as some linear combination of their latent features. However, given the many possible types of interactions between users, items and contextual variables, it may seem unrealistic to restrict the interactions among them to linearity.

To address this limitation, we develop a novel and powerful non-linear probabilistic algorithm for context-aware recommendation using Gaussian processes. The method, which we call Gaussian Process Factorization Machines (GPFM), is applicable to both the explicit feedback setting (e.g. numerical ratings as in the Netflix dataset) and the implicit feedback setting (i.e. purchases, clicks). We derive stochastic gradient descent optimization to allow scalability of the model. We test GPFM on five different benchmark contextual datasets. Experimental results demonstrate that GPFM outperforms state-of-the-art context-aware recommendation methods.

Sentiment retrieval on web reviews using spontaneous natural speech

J.C. Pereira, J. Luque, X. Anguera, "Sentiment retrieval on web reviews using spontaneous natural speech", in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’14), May 2014

This paper addresses the problem of document retrieval based on sentiment polarity criteria. A query based on natural spontaneous speech, expressing an opinion about a certain topic, is used to search a repository of documents containing favorable or unfavorable opinions. The goal is to retrieve documents whose opinions more closely resemble the one in the query. A semantic system based on the speech transcripts is augmented with information from full-length text articles. Posterior probabilities extracted from the article are used to regularize their transcription counterparts. This paper makes three important contributions. First, we introduce a framework for polarity analysis of sentiments that can accommodate combinations of different modalities, while maintaining the flexibility of unimodal systems, i.e. capable of dealing with the absence of any modality. Second, we show that it is possible to improve average precision on speech transcriptions’ sentiment retrieval by means of regularization. Third, we demonstrate the strength and generalization of our approach by training regularizers on one dataset, while performing sentiment retrieval experiments, with substantial gains, on a collection of YouTube clips. 

 

Download here

Inferring social relationships in a phone call from a single party’s speech

S.H. Yella, X. Anguera, J. Luque, “Inferring social relationships in a phone call from a single party’s speech”, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’14), May 2014

People usually speak differently depending on who they talk to. Based on this hypothesis, in this paper we propose an automatic method to detect the social relationship between two people based solely on a set of acoustic and conversational characteristics. We argue that changes in these features of an individual reflect her social relationship with the other person. To infer the relationship we only require the speech of one of the conversation partners and the interaction patterns between both speakers. We validate the proposed system using a real-life telephone database with calls made by several speakers to close family members and to their partners. We trained a classifier using a boosting algorithm on a set of conversational and acoustic features and use it to classify calls according to the social relationship between both speakers. Tests performed on models trained on a single speaker’s data show that for most people such prediction is feasible. We also show that these characteristics generalize quite well across speakers, achieving around 75% accuracy when both sets of features are combined.

Download here

STFU NOOB!: Predicting Crowdsourced Decisions on Toxic Behavior in Online Games

One problem facing players of competitive games is negative, or toxic, behavior. League of Legends, the largest eSport game, uses a crowdsourcing platform called the Tribunal to judge whether a reported toxic player should be punished or not. The Tribunal is a two stage system requiring reports from those players that directly observe toxic behavior, and human experts that review aggregated reports. While this system has successfully dealt with the vague nature of toxic behavior by majority rules based on many votes, it naturally requires tremendous cost, time, and human efforts.

In this paper, we propose a supervised learning approach for predicting crowdsourced decisions on toxic behavior with large-scale labeled data collections: over 10 million user reports involving 1.46 million toxic players and corresponding crowdsourced decisions. Our results show good performance in detecting overwhelming-majority cases and predicting crowdsourced decisions on them. We demonstrate good portability of our classifier across regions. Finally, we estimate the practical implications of our approach, including potential cost savings and victim protection.

Social-Aware Replication in Geo-Diverse Online Systems

Stefano Traverso, Kévin Huguenin, Ionut Trestian, Vijay Erramilli, Nikolaos Laoutaris, Konstantina Papagiannaki


IEEE Transactions on Parallel and Distributed Systems, March 2014

 

Abstract

Distributing long-tail content is a difficult task due to the low amortization of bandwidth transfer costs, as such content has a limited number of views. Two recent trends are making this problem harder. First, the increasing popularity of user-generated content and online social networks creates and reinforces such popularity distributions. Second, the recent trend of geo-replicating content across multiple points of presence spread around the world, done to improve quality of experience (QoE) for users. In this paper, we analyze and explore the tradeoff involving the “freshness” of the information available to the users and WAN bandwidth costs, and we propose ways to reduce the latter through smart update propagation scheduling, by leveraging knowledge of the mapping between social relationships and geographic location, and the timing regularities and time differences in end-user activity. We first assess the potential of our approach by implementing a simple social-aware scheduling algorithm that operates under bandwidth budget constraints and by quantifying its benefits through a trace-driven analysis. We show that it can reduce WAN traffic by up to 55% compared to an immediate update of all replicas, with a minimal effect on information freshness and latency. Second, we build TailGate, a practical system that implements our social-aware scheduling approach, which distributes on the fly long-tail content across PoPs at reduced bandwidth costs by flattening the traffic. We evaluate TailGate by using traces from an OSN and show that it can decrease WAN bandwidth costs by as much as 80% and improve QoE. We deploy TailGate on PlanetLab and show that even when imprecise social information is available, it can still decrease the latency for accessing long-tail YouTube videos by a factor of 2.

 

Download here

Language Independent Search in MediaEval's Spoken Web Search Task

Florian Metze, Xavier Anguera, Etienne Barnard, Marelie Davel, Guillaume Gravier

"Language Independent Search in MediaEval's Spoken Web Search Task"

Computer Speech & Language, January 2014

Abstract

In this paper, we describe several approaches to language-independent spoken term detection and compare their performance on a common task, namely “Spoken Web Search”. The goal of this part of the MediaEval initiative is to perform low-resource language-independent audio search using audio as input. The data was taken from “spoken web” material collected over mobile phone connections by IBM India as well as from the LWAZI corpus of African languages. As part of the 2011 and 2012 MediaEval benchmark campaigns, several diverse systems have been implemented by independent teams, and submitted to the “Spoken Web Search” evaluation. This paper presents the 2011 and 2012 results, and compares the relative merits and weaknesses of approaches developed by participants, providing analysis and directions for future research, in order to improve voice access to spoken information in low resource settings. 
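Several systems submitted to the Spoken Web Search task matched audio queries against utterances using variants of dynamic time warping over acoustic features. A minimal DTW distance on 1-D feature sequences illustrates the core idea (real systems operate on multi-dimensional frames such as posteriorgrams, and typically use subsequence variants to locate a term inside a longer utterance):

```python
def dtw_distance(query, utterance):
    """Dynamic time warping cost between two 1-D feature sequences.

    Fills the classic DP table where each cell extends the cheapest of
    the three predecessor alignments (insertion, deletion, match),
    allowing the query to stretch or compress in time.
    """
    n, m = len(query), len(utterance)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - utterance[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # query frame repeated
                                 cost[i][j - 1],      # utterance frame skipped
                                 cost[i - 1][j - 1])  # frames aligned
    return cost[n][m]
```

The time-warping invariance is exactly what makes the approach usable without language-specific acoustic models: the same word spoken at different speeds still aligns with low cost.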

Download here

RILAnalyzer: a Comprehensive 3G Monitor On Your Phone

Narseo Vallina-Rodriguez, Andrius Aucinas, Mario Almeida, Yan Grunenberger, Konstantina Papagiannaki, Jon Crowcroft

RILAnalyzer: a Comprehensive 3G Monitor On Your Phone

ACM Internet Measurement Conference, October 2013

 

Abstract

The popularity of smartphones, cloud computing, and the app store model has led to cellular networks being used in a completely different way than what they were designed for. As a consequence, mobile applications impose new challenges in the design and efficient configuration of constrained networks to maximize applications' performance. Such difficulties are largely caused by the lack of cross-layer understanding of interactions between different entities - applications, devices, the network and its management plane. In this paper, we describe RILAnalyzer, an open-source tool that provides mechanisms to perform network analysis from within a mobile device. RILAnalyzer is capable of recording low-level radio information and accurate cellular network control-plane data, as well as user-plane data. We demonstrate how such data can be used to identify previously overlooked issues. Through a small user study across four cellular network providers in two European countries we infer how different network configurations are in reality and explore how such configurations interact with application logic, causing network and energy overheads.

Download here

GAPfm: Optimal top-n recommendations for graded relevance domains

Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Alan Hanjalic

GAPfm: Optimal top-n recommendations for graded relevance domains

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management, October 2013

 

Abstract

 

Recommender systems are frequently used in domains in which users express their preferences in the form of graded judgments, such as ratings. If accurate top-N recommendation lists are to be produced for such graded relevance domains, it is critical to generate a ranked list of recommended items directly rather than predicting ratings. Current techniques choose one of two sub-optimal approaches: either they optimize for a binary metric such as Average Precision, which discards information on relevance grades, or they optimize for Normalized Discounted Cumulative Gain (NDCG), which ignores the dependence of an item's contribution on the relevance of more highly ranked items.

In this paper, we address the shortcomings of existing approaches by proposing the Graded Average Precision factor model (GAPfm), a latent factor model that is particularly suited to the problem of top-N recommendation in domains with graded relevance data. The model optimizes for Graded Average Precision, a metric that has been proposed recently for assessing the quality of ranked results lists for graded relevance. GAPfm learns a latent factor model by directly optimizing a smoothed approximation of GAP. GAPfm's advantages are twofold: it maintains full information about graded relevance and also addresses the limitations of models that optimize NDCG. Experimental results show that GAPfm achieves substantial improvements on the top-N recommendation task, compared to several state-of-the-art approaches. In order to ensure that GAPfm is able to scale to very large data sets, we propose a fast learning algorithm that uses an adaptive item selection strategy. A final experiment shows that GAPfm is useful not only for generating recommendation lists, but also for ranking a given list of rated items.

 

Download here

Follow the money: understanding economics of online aggregation and advertising

Phillipa Gill, Vijay Erramilli, Augustin Chaintreau, Balachander Krishnamurthy, Konstantina Papagiannaki, Pablo Rodriguez

Follow the money: understanding economics of online aggregation and advertising

Proceedings of the 2013 ACM Internet Measurement Conference, October 2013

 

Abstract

The large-scale collection and exploitation of personal information to drive targeted online advertisements has raised privacy concerns. As a step towards understanding these concerns, we study the relationship between how much information is collected and how valuable it is for advertising. We use HTTP traces consisting of millions of users to aid our study and also present the first comparative study between aggregators. We develop a simple model that captures the various parameters of today's advertising revenues, whose values are estimated via the traces. Our results show that per-aggregator revenue is skewed (5% accounting for 90% of revenues), while the contribution of users to advertising revenue is much less skewed (20% accounting for 80% of revenue). Google is dominant in terms of revenue and reach (presence on 80% of publishers). We also show that if the top 5% of users in terms of revenue were to install privacy protection, with no corresponding reaction from the publishers, then revenue could drop by 30%.

Download here

 

The spoken web search task

Xavier Anguera, Florian Metze, Andi Buzo, Igor Szoke, Luis Javier Rodriguez-Fuentes

The spoken web search task

MediaEval 2013 Workshop, October 2013, Barcelona, Spain

 

Abstract

In this paper, we describe the “Spoken Web Search” Task, which is being held as part of the 2013 MediaEval campaign. The purpose of this task is to perform audio search in multiple languages and acoustic conditions, with very few resources being available for each individual language. This year the data contains audio from nine different languages and is much bigger in size than in previous years, mimicking realistic low/zero-resource settings. 

Download here

xCLiMF: optimizing expected reciprocal rank for data with multiple levels of relevance

Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Alan Hanjalic

xCLiMF: optimizing expected reciprocal rank for data with multiple levels of relevance

Proceedings of the 7th ACM conference on Recommender systems, October 2013

 

Abstract

Extended Collaborative Less-is-More Filtering (xCLiMF) is a learning-to-rank model for collaborative filtering that is specifically designed for use with data where information on the level of relevance of the recommendations exists, e.g. through ratings. xCLiMF can be seen as a generalization of the Collaborative Less-is-More Filtering (CLiMF) method that was proposed for top-N recommendations using binary relevance (implicit feedback) data. The key contribution of the xCLiMF algorithm is that it builds a recommendation model by optimizing Expected Reciprocal Rank, an evaluation metric that generalizes reciprocal rank in order to incorporate user feedback with multiple levels of relevance. Experimental results on real-world datasets show the effectiveness of xCLiMF, and also demonstrate its advantage over CLiMF when more than two levels of relevance exist in the data.
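Expected Reciprocal Rank itself is simple to compute for a ranked list of graded labels. Below is a minimal sketch following the standard cascade formulation; the grade-to-stop-probability mapping (2^g - 1) / 2^g_max is the conventional one and is assumed here, not taken from the paper.

```python
def expected_reciprocal_rank(grades, max_grade):
    """Expected Reciprocal Rank for a ranked list of graded relevance
    labels.  grades[i] is the grade of the item at rank i+1; each grade
    is mapped to a stop probability R = (2**g - 1) / 2**max_grade, and
    ERR sums 1/rank weighted by the chance the user reaches that rank
    and stops there."""
    err, p_continue = 0.0, 1.0
    for rank, g in enumerate(grades, start=1):
        r = (2 ** g - 1) / 2 ** max_grade
        err += p_continue * r / rank
        p_continue *= 1 - r
    return err
```

Putting the highly relevant item first yields a higher ERR than burying it, which is exactly the top-heavy behavior that makes the metric suitable for top-N recommendation.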

Download here

Last call for the buffet: economics of cellular networks

Jeremy Blackburn, Rade Stanojevic, Vijay Erramilli, Adriana Iamnitchi, Konstantina Papagiannaki

Last call for the buffet: economics of cellular networks

ACM, Proceedings of the 19th annual international conference on Mobile computing & networking, September 2013

 

Abstract

Voice and data traffic growth over the last several years has become a major challenge for cellular operators with a direct impact on revenues, infrastructure investments, and end-user performance. The economics of these operators depend on various incentives used to attract users in the form of unlimited, buffet-like voice/sms/data packages. However, our understanding of the effects of user behavior under these offerings on operator revenues/costs remains poor. Using two years of detailed usage information of 1 million users across three services, voice, sms and data, combined with payment and cost information, we study how user behavior affects the economics of cellular operators. We discover that around 20% of the users consume more resources than what they pay for and hence are non-profitable. In addition to the individual user behavior, we study how the user interactions in the call graph affect the operator’s revenues and cost, drawing on tools from social network analysis. We develop a framework that incorporates both the individual and social user behavior for studying how volume caps influence the revenues and the traffic costs. Using this framework we empirically show that volume caps can increase the difference between the revenues and the traffic costs of the studied operator by a factor of 2, while affecting only 16% of the existing user base. 

Download here

Is there a case for mobile phone content pre-staging?

Alessandro Finamore, Marco Mellia, Zafar Gilani, Konstantina Papagiannaki, Vijay Erramilli, Yan Grunenberger

Is there a case for mobile phone content pre-staging?

Proceedings of the ninth ACM Conference on Emerging Networking Experiments and Technologies (CoNEXT), December 2013

 

Abstract

Content caching is a fundamental building block of the Internet. Caches are widely deployed at network edges to improve performance for end-users, and to reduce load on web servers and the backbone network. Considering mobile 3G/4G networks, however, the bottleneck is at the access link, where bandwidth is shared among all mobile terminals. As such, per-user capacity cannot grow to cope with the traffic demand. Unfortunately, caching policies would not reduce the load on the wireless link, which would have to carry multiple copies of the same object that is being downloaded by multiple mobile terminals sharing the same access link.

In this paper we investigate whether it is worth pushing the caching paradigm even farther. We hypothesize a system in which mobile terminals implement a local cache, where popular content can be pushed/pre-staged. This exploits the peculiar broadcast capability of the wireless channels to replicate content “for free” on all terminals, saving the cost of transmitting multiple copies of those popular objects. Relying on a large data set collected from a European mobile carrier, we analyse the content popularity characteristics of mobile traffic, and quantify the benefit that the push-to-mobile system would produce. We found that content pre-staging, by proactively and periodically broadcasting “bundles” of popular objects to devices, allows us both to i) greatly improve users' performance and ii) reduce the downloaded volume (number of requests) by up to 20% (40%) in optimistic scenarios with a bundle of 100 MB. However, some technical constraints and content characteristics could question the actual gain such a system would reach in practice.

Download here

Network monitoring architecture based on home gateways

Claudio Casetti, Yan Grunenberger, Frank Den Hartog, Anukool Lakhina, Henrik Lundgren, Marco Milanesio, Anna-Kaisa Pietilainen, Renata Teixeira, Shuang Zhang

Network monitoring architecture based on home gateways

Future Network and MobileSummit 2013 Conference Proceedings, September 2013

 

Abstract

The “Future Internet Gateway-based Architecture of Residential netwOrks (FIGARO)” project proposes to tackle the new challenges arising from the shift of Internet use from technology-centric to user/content-centric with a novel network architecture centered on the residential gateways. Many use cases for the FIGARO architecture, such as home automation, distributed content management, content delivery optimizations, network performance monitoring and troubleshooting, require advanced network monitoring functionality on the residential gateway. In this paper, we discuss the requirements and design of the FIGARO gateway-centric network monitoring architecture.

Download here

Peripheral vibro-tactile displays

Martin Pielot, Rodrigo de Oliveira

Peripheral vibro-tactile displays

ACM, Proceedings of the 15th international conference on Human-computer interaction with mobile devices and services, August 2013

 

Abstract

We report from a study exploring the boundaries of the peripheral perception of vibro-tactile stimuli. For three days, we exposed 15 subjects to a continual vibration pattern that was created by a mobile device worn in their trouser pocket. In order to guarantee that the stimuli would not require the subjects' focal attention, the vibration pattern was tested and refined to minimise its obtrusiveness, and during the study, the participants adjusted its intensity to just above their personal detection threshold. At random times, the vibration stopped and participants had to acknowledge these events as soon as they noticed them. Only 6.5% of the events were acknowledged fast enough to assume that the cue had been in the focus of the participants' attention. The majority of events were answered between 1 and 10 minutes, which indicates that the participants were aware of the cue without focussing on it. In addition, participants reported not to be annoyed by the signal in 94.4% of the events. These results provide evidence that vibration patterns can form non-annoying, lightweight information displays, which can be consumed at the periphery of a user's attention.

Download here

Information Retrieval-based Dynamic Time Warping

Xavier Anguera

Information Retrieval-based Dynamic Time Warping

Proc. Interspeech, Lyon, France, August 2013

 

Abstract

In this paper we introduce a novel dynamic programming algorithm called Information Retrieval-based Dynamic Time Warping (IR-DTW) used to find non-linearly matching subsequences between two time series where matching start and end points are not known a priori. Here, our algorithm is applied to audio matching within the query by example (QbE) spoken term detection (STD) task, although it is applicable to many other problems. The main advantages of the proposed algorithm in comparison to similar approaches are twofold. On the one hand, IR-DTW requires a much smaller memory footprint than standard Dynamic Time Warping (DTW) approaches. On the other hand, it allows for the application of indexing techniques to the search collection for increased matching speed, which makes IR-DTW suitable for application in large scale implementations. We show through preliminary experimentation with a QbE-STD task that the memory footprint is greatly reduced in comparison to a baseline subsequence-DTW (S-DTW) implementation and that its matching accuracy is much better than that of pure diagonal matching and just slightly worse than that of S-DTW.
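For context, the subsequence-DTW (S-DTW) baseline the abstract compares against can be written with only two rows of dynamic-programming state; IR-DTW's contribution is reducing the footprint further and enabling indexing. A minimal sketch of that baseline, not of the paper's IR-DTW itself (the function name and the scalar distance are illustrative; real QbE-STD operates on feature-vector frames):

```python
def subsequence_dtw_cost(query, ref, dist=lambda a, b: abs(a - b)):
    """Minimal subsequence-DTW: cost of the best warping path matching
    `query` anywhere inside `ref`.  The first row of zeros lets the
    match start at any reference position, and taking the minimum of
    the last row lets it end anywhere.  Only two rows are kept, so
    memory is O(len(ref)) instead of a full len(query) x len(ref)
    matrix."""
    prev = [0.0] * (len(ref) + 1)              # free start anywhere in ref
    for q in query:
        curr = [float("inf")] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            d = dist(q, ref[j - 1])
            curr[j] = d + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return min(prev[1:])                       # free end anywhere in ref
```

An exact occurrence of the query inside the reference yields cost 0; the cost grows with the amount of warping and mismatch required.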

Download here

CLiMF: collaborative less-is-more filtering

Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Nuria Oliver, Alan Hanjalic

CLiMF: collaborative less-is-more filtering

AAAI Press, Proceedings of the Twenty-Third international joint conference on Artificial Intelligence, August 2013

 

Abstract

In this paper we tackle the problem of recommendation in scenarios with binary relevance data, when only a few (k) items are recommended to individual users. Past work on Collaborative Filtering (CF) has either not addressed the ranking problem for binary relevance datasets, or not specifically focused on improving top-k recommendations. To solve the problem we propose a new CF approach, Collaborative Less-is-More Filtering (CLiMF). In CLiMF the model parameters are learned by directly maximizing the Mean Reciprocal Rank (MRR), which is a well-known information retrieval metric for capturing the performance of top-k recommendations. We achieve linear computational complexity by introducing a lower bound of the smoothed reciprocal rank metric. Experiments on two social network datasets show that CLiMF significantly outperforms a naive baseline and two state-of-the-art CF methods.
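The MRR objective that CLiMF maximizes (via a smoothed lower bound) is straightforward to compute exactly. A minimal sketch of the metric itself, not of CLiMF's smoothed surrogate or its learning algorithm:

```python
def mean_reciprocal_rank(ranked_lists, relevant):
    """Mean Reciprocal Rank over users: the average of 1/rank of the
    first relevant item in each user's recommendation list, or 0 if no
    relevant item appears.

    ranked_lists -- {user: [item, ...]} recommendations in rank order
    relevant     -- {user: set of relevant items} binary judgments
    """
    total = 0.0
    for user, items in ranked_lists.items():
        rr = 0.0
        for rank, item in enumerate(items, start=1):
            if item in relevant.get(user, set()):
                rr = 1.0 / rank
                break
        total += rr
    return total / len(ranked_lists) if ranked_lists else 0.0
```

Because only the first relevant item counts, the metric rewards placing at least one good item near the top of each list, which is the "less is more" intuition behind the method's name.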

Download here

Using Tuangou to Reduce IP Transit Costs

Ignacio Castro, Rade Stanojevic, Sergey Gorinsky

"Using Tuangou to Reduce IP Transit Costs"

IEEE/ACM TRANSACTIONS ON NETWORKING, July 2013

 

Abstract

A majority of Internet service providers (ISPs) support connectivity to the entire Internet by transiting their traffic via other providers. Although the transit prices per megabit per second (Mbps) decline steadily, the overall transit costs of these ISPs remain high or even increase due to the traffic growth. The discontent of the ISPs with the high transit costs has yielded notable innovations such as peering, content distribution networks, multicast, and peer-to-peer localization. While the above solutions tackle the problem by reducing the transit traffic, this paper explores a novel approach that reduces the transit costs without altering the traffic. In the proposed Cooperative IP Transit (CIPT), multiple ISPs cooperate to jointly purchase Internet Protocol (IP) transit in bulk. The aggregate transit costs decrease due to the economies-of-scale effect of typical subadditive pricing as well as burstable billing: Not all ISPs transit their peak traffic during the same period. To distribute the aggregate savings among the CIPT partners, we propose Shapley-value sharing of the CIPT transit costs. Using public data about IP traffic and transit prices, we quantitatively evaluate CIPT and show that significant savings can be achieved, both in relative and absolute terms. We also discuss the organizational embodiment, relationship with transit providers, traffic confidentiality, and other aspects of CIPT. 
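Shapley-value sharing of a joint transit bill can be sketched directly from its definition: each partner pays its average marginal cost over all orders in which the coalition could form. A toy implementation (the cost table and function name are illustrative; the paper's computation over real traffic and pricing data is more involved):

```python
from itertools import permutations

def shapley_shares(isps, cost):
    """Shapley-value split of a joint transit bill: each ISP's share is
    its average marginal cost over all join orders.  `cost` maps a
    frozenset of ISPs to the (subadditive) bulk price of serving them,
    with cost[frozenset()] == 0.  Exponential in len(isps), so this is
    only practical for a handful of partners."""
    shares = {i: 0.0 for i in isps}
    orders = list(permutations(isps))
    for order in orders:
        coalition = frozenset()
        for isp in order:
            shares[isp] += cost[coalition | {isp}] - cost[coalition]
            coalition = coalition | {isp}
    return {i: s / len(orders) for i, s in shares.items()}
```

With a subadditive tariff (e.g. two ISPs costing 10 each alone but 16 together), the shares sum exactly to the joint bill while splitting the economies-of-scale savings fairly between the partners.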

Download here

Games of Friends: a game-theoretical approach for link prediction in online social networks

Giovanni Zappella, Alexandros Karatzoglou, Linas Baltrunas

Games of Friends: a game-theoretical approach for link prediction in online social networks

Workshops at the Twenty-Seventh AAAI Conference on Artificial Intelligence, June 2013

 

Abstract

Online Social Networks (OSNs) have enriched the social lives of millions of users. Discovering new friends in the social network is valuable both for the user and for the health of the OSN, since users with more friends engage longer and more often with the site. The simplest way to formalize friendship recommendation is to cast the problem as a link prediction problem in the social graph. In this work we introduce a game-theoretical approach based on the Graph Transduction Game. It scales with ease beyond 13 million users and was tested on real-world data from the Tuenti OSN. We utilize the social graph and several other graphs that naturally arise in Tuenti, such as the wall-to-wall post graph. We compare our approach to standard local measures and demonstrate a significant performance benefit in terms of mean average precision and reciprocal rank.

Download here

ITMgen—A first-principles approach to generating synthetic interdomain traffic matrices

Jakub Mikians, Nikolaos Laoutaris, Amogh Dhamdhere, Pere Barlet-Ros

ITMgen—A first-principles approach to generating synthetic interdomain traffic matrices

2013 IEEE International Conference on Communications (ICC), June 2013

Download here

Exploiting foursquare and cellular data to infer user activity in urban environments

Anastasios Noulas, Cecilia Mascolo, Enrique Frias-Martinez

Exploiting foursquare and cellular data to infer user activity in urban environments

2013 IEEE 14th International Conference on Mobile Data Management (MDM), June 2013

 

Abstract

Inferring the type of activities in neighborhoods of urban centers may be helpful in a number of contexts, including urban planning, content delivery and activity recommendations for mobile web users, and may even yield a deeper understanding of the geographical evolution of social life in the city. During the past few years, the analysis of mobile phone usage patterns, or of social media with longitudinal attributes, has aided the automatic characterization of the dynamics of the urban environment.

In this work, we combine a dataset sourced from a telecommunication provider in Spain with a database of millions of geo-tagged venues from Foursquare, and we formulate the problem of urban activity inference in a supervised learning framework. In particular, we exploit user communication patterns observed at the base station level in order to predict the activity of Foursquare users who check in at nearby venues. First, we mine a set of machine learning features that allow us to encode the input telecommunication signal of a tower. Subsequently, we evaluate a diverse set of supervised learning algorithms using labels extracted from Foursquare place categories, and we consider two application scenarios. Initially, we assess how hard it is to predict the specific urban activity of an area, showing that Nightlife and Entertainment spots are the easiest to infer, whereas College and Shopping areas feature the lowest accuracy rates. Then, considering a candidate set of activity types in a geographic area, we aim to elect the most prominent one. We demonstrate how the difficulty of the problem increases with the number of classes incorporated in the prediction task, yet the classifiers achieve considerably better performance than a random guess even as the set of candidate classes increases.

 

Download here

ACORN: An auto-configuration framework for 802.11n WLANs

Mustafa Y Arslan, Konstantinos Pelechrinis, Ioannis Broustis, Shailendra Singh, Srikanth V Krishnamurthy, Sateesh Addepalli, Konstantina Papagiannaki

ACORN: An auto-configuration framework for 802.11n WLANs

IEEE/ACM Transactions on Networking, June 2013
 
Abstract

The wide channels feature combines two adjacent channels to form a new, wider channel to facilitate high-data-rate transmissions in MIMO-based 802.11n networks. Using a wider channel can exacerbate interference effects. Furthermore, contrary to what has been reported by prior studies, we find that wide channels do not always provide benefits in isolation (i.e., one link without interference) and can even degrade performance. We conduct an in-depth, experimental study to understand the implications of wide channels on throughput performance. Based on our measurements, we design an auto-configuration framework called ACORN for enterprise 802.11n WLANs. ACORN integrates the functions of user association and channel allocation, since our study reveals that they are tightly coupled when wide channels are used. We show that the channel allocation problem with the constraints of wide channels is NP-complete. Thus, ACORN uses an algorithm that provides a worst-case approximation ratio of 1/(∆+1), with ∆ being the maximum node degree in the network. We implement ACORN on our 802.11n testbed. Our evaluations show that ACORN (i) outperforms previous approaches that are agnostic to wide channel constraints, providing per-AP throughput gains ranging from 1.5x to 6x, and (ii) in practice, its channel allocation module achieves an approximation ratio much better than the theoretically predicted 1/(∆+1).

Download here

Perceptually inspired features for speaker likability classification

Sira Gonzalez, Xavier Anguera

Perceptually inspired features for speaker likability classification

2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2013

 

Abstract

We present a novel approach to speaker likability classification. Our algorithm, instead of extracting a large number of features, identifies a small set of features which represent perceptual speech characteristics. For classification, linear support vector machines are used. We train and evaluate the performance on the Interspeech speaker trait challenge database and we show that our likability classifier outperforms the baseline classifier developed for the challenge while considerably reducing the number of features needed.

Download here

Speed improvements to information retrieval-based dynamic time warping using hierarchical k-means clustering

Gautam Mantena, Xavier Anguera

Speed improvements to information retrieval-based dynamic time warping using hierarchical k-means clustering

2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2013

 

Abstract

With the increase in multimedia data over the Internet, query by example spoken term detection (QbE-STD) has become important in providing a search mechanism to find spoken queries in spoken audio. Audio search algorithms should be efficient in terms of speed and memory to handle large audio files. In general, approaches derived from the well known dynamic time warping (DTW) algorithm suffer from scalability problems.

To overcome such problems, an Information Retrieval-based DTW (IR-DTW) algorithm has been proposed recently. IR-DTW borrows techniques from the Information Retrieval community to detect regions which are more likely to contain the spoken query, and then uses a standard DTW to obtain exact start and end times. One drawback of IR-DTW is the time taken for the retrieval of similar reference points for a given query point. In this paper we propose a method to improve the search performance of the IR-DTW algorithm using a clustering-based technique. The proposed method has shown an estimated speedup of 2400X.

Download here

The spoken web search task at MediaEval 2012

Florian Metze, Xavier Anguera, Etienne Barnard, Marelie Davel, Guillaume Gravier

The spoken web search task at MediaEval 2012

2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2013

 

Abstract

In this paper, we describe the “Spoken Web Search” Task, which was held as part of the 2011 MediaEval benchmark campaign. The purpose of this task was to perform audio search with audio input in four languages, with very few resources being available in each language. The data was taken from “spoken web” material collected over mobile phone connections by IBM India. We present results from several independent systems, developed by five teams and using different approaches, compare them, and provide analysis and directions for future research.

Download here

Size matters (spacing not): 18 points for a dyslexic-friendly Wikipedia

Luz Rello, Martin Pielot, Mari-Carmen Marcos, Roberto Carlini

Size matters (spacing not): 18 points for a dyslexic-friendly Wikipedia

ACM, Proceedings of the 10th International Cross-Disciplinary Conference on Web Accessibility, May 2013

 

Abstract

In 2012, Wikipedia was the sixth-most visited website on the Internet. Being one of the main repositories of knowledge, students from all over the world consult it. But around 10% of these students have dyslexia, which impairs their access to text-based websites. How could Wikipedia be presented to be more readable for this target group? In an experiment with 28 participants with dyslexia, we compare reading speed, comprehension, and subjective readability for the font sizes 10, 12, 14, 18, 22, and 26 points, and line spacings 0.8, 1.0, 1.4, and 1.8. The results show that font size has a significant effect on the readability and the understandability of the text, while line spacing does not. On the basis of our results, we recommend using 18-point font size when designing web text for readers with dyslexia. Our results significantly differ from previous recommendations, presumably because this is the first work to cover a wide range of values and to study them in the context of an actual website.

Download here

Delay-Tolerant Bulk Data Transfers on the Internet

Nikolaos Laoutaris, Georgios Smaragdakis, Rade Stanojevic, Pablo Rodriguez, Ravi Sundaram

Delay-Tolerant Bulk Data Transfers on the Internet

IEEE/ACM Transactions on Networking, Volume 21, Issue 6, March 2013

Download here

Experimental evaluation of context-dependent collaborative filtering using item splitting

Linas Baltrunas, Francesco Ricci

Experimental evaluation of context-dependent collaborative filtering using item splitting

User Modeling and User-Adapted Interaction, February 2013

 

Abstract

Collaborative Filtering (CF) computes recommendations by leveraging a historical data set of users' ratings for items. CF assumes that the users' recorded ratings can help in predicting their future ratings. This has been validated extensively, but in some domains the user's ratings can be influenced by contextual conditions, such as the time, or the goal of the item consumption. This type of contextual information is not exploited by standard CF models. This paper introduces and analyzes a novel technique for context-aware CF called Item Splitting. In this approach items experienced in two alternative contextual conditions are “split” into two items. This means that the ratings of a split item, e.g., a place to visit, are assigned (split) to two new fictitious items representing for instance the place in summer and the same place in winter. This split is performed only if there is statistical evidence that under these two contextual conditions the item's ratings are different; for instance, a place may be rated higher in summer than in winter. These two new fictitious items are then used, together with the unaffected items, in the rating prediction algorithm. When the system must predict the rating for that “split” item in a particular contextual condition (e.g., in summer), it will consider the new fictitious item representing the original one in that particular contextual condition, and will predict its rating. We evaluated this approach on real-world and semi-synthetic data sets using matrix factorization and nearest neighbor CF algorithms. We show that Item Splitting can be beneficial and its performance depends on the method used to determine which items to split. We also show that the benefit of the method is determined by the relevance of the contextual factors that are used to split.
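The split decision described above hinges on statistical evidence that an item's ratings differ between two contextual conditions. A minimal sketch using Welch's t statistic against a fixed threshold (the threshold and function name are assumptions standing in for whatever test and criterion the paper actually applies):

```python
from statistics import mean, variance

def should_split(ratings_ctx_a, ratings_ctx_b, t_threshold=2.0):
    """Decide whether an item's ratings differ enough between two
    contextual conditions (e.g. summer vs. winter) to justify splitting
    it into two fictitious items.  Computes Welch's t statistic and
    compares it to a fixed threshold; a real system would use a proper
    significance test instead."""
    na, nb = len(ratings_ctx_a), len(ratings_ctx_b)
    se = (variance(ratings_ctx_a) / na + variance(ratings_ctx_b) / nb) ** 0.5
    if se == 0:
        # zero variance in both samples: split only if the means differ
        return mean(ratings_ctx_a) != mean(ratings_ctx_b)
    t = (mean(ratings_ctx_a) - mean(ratings_ctx_b)) / se
    return abs(t) >= t_threshold
```

A place rated around 5 in summer and around 2 in winter would be split, while an item with statistically indistinguishable ratings in both contexts is left intact and used as-is by the prediction algorithm.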

Download here

Adaptive non-parametric identification of dense areas using cell phone records for urban analysis

Alberto Rubio, Angel Sanchez, Enrique Frias-Martinez

Adaptive non-parametric identification of dense areas using cell phone records for urban analysis

Pergamon, Engineering Applications of Artificial Intelligence, January 2013

 

Abstract

Pervasive large-scale infrastructures (like GPS, WLAN networks or cell-phone networks) generate large datasets containing human behavior information. One of the applications that can benefit from this data is the study of urban environments. In this context, one of the main problems is the detection of dense areas, i.e., areas with a high density of individuals within a specific geographical region and time period. Nevertheless, the techniques used so far face an important limitation: the definition of dense area is not adaptive and as a result the areas identified are related to a threshold applied over the density of individuals, which usually implies that dense areas are mainly identified in downtowns. In this paper, we propose a novel technique, called AdaptiveDAD, to detect dense areas that adaptively define the concept of density using the infrastructure provided by a cell phone network. We evaluate and validate our approach with a real dataset containing the Call Detail Records (CDR) of fifteen million individuals. 

Download here

3GOL: Power-boosting ADSL using 3G OnLoading

Claudio Rossi, Narseo Vallina-Rodriguez, Vijay Erramilli, Yan Grunenberger, Laszlo Gyarmati, Nikolaos Laoutaris, Rade Stanojevic, Dina Papagiannaki, Pablo Rodriguez

3GOL: Power-boosting ADSL using 3G OnLoading

ACM CoNEXT 2013, Santa Barbara, CA, December 2013
 
Abstract

The co-existence of cellular and wired networks has been exploited almost exclusively in the direction of OffLoading traffic from the former onto the latter. In this paper we claim that there exist cases that call for the exact opposite, i.e., using the cellular network to assist a fixed wired network. In particular, we show that by “OnLoading” traffic from the wired broadband network onto the cellular network we can usefully speed up wired connections, on the downlink or the uplink. We consider the technological challenges pertaining to this idea and implement a prototype 3G OnLoading service, called 3GOL, that can be deployed by an operator providing both wired and cellular network services. By strategically OnLoading a fraction of the data transfers to the 3G network, one can significantly enhance the performance of particular applications. In particular we demonstrate non-trivial performance benefits of 3GOL for two widely used applications: video-on-demand and multimedia upload. We also consider the case when the wired and cellular services are provided by different operators, adding an analysis of the economic constraints and the volume caps on cellular data plans that need to be respected. Simulating 3GOL over a DSLAM trace we show that 3GOL can reduce video pre-buffering time by at least 20% for 50% of the users while respecting data caps, and we design a simple estimator to compute the daily allowance that can be used towards 3GOL while respecting caps. Our prototype is currently being piloted in 30 households in a large European city by a large network provider.

 

Forecasting socioeconomic trends with cell phone records

Vanessa Frias-Martinez, Cristina Soguero-Ruiz, Enrique Frias-Martinez, Malvina Josephidou

Forecasting socioeconomic trends with cell phone records

Proceedings of the 3rd ACM Symposium on Computing for Development, January 2013

 

Abstract

National Statistical Institutes typically hire large numbers of enumerators to carry out periodic surveys regarding the socioeconomic status of a society. Such an approach suffers from two drawbacks: (i) the survey process is expensive, especially for emerging countries that struggle with their budgets, and (ii) the socioeconomic indicators are computed ex post, i.e., after socioeconomic changes have already happened. We propose the use of human behavioral patterns computed from calling records to predict future values of socioeconomic indicators. Our objective is to help institutions forecast socioeconomic changes before they happen while reducing the number of surveys they need to carry out. For that purpose, we explore a battery of different predictive approaches for time series and show that multivariate time-series models yield R-squared values of up to 0.65 for certain socioeconomic indicators.
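As an illustration of the multivariate time-series approach, the following minimal sketch fits a lag-1 vector-autoregressive model by ordinary least squares and reports an R-squared value for one indicator. The data are synthetic and the feature set is hypothetical, so this is a sketch of the technique, not the paper's actual pipeline:

```python
import numpy as np

def fit_var1(series):
    """Fit a lag-1 vector-autoregressive model x_t = A x_{t-1} + c
    by ordinary least squares. `series` has shape (T, k)."""
    X = np.hstack([series[:-1], np.ones((len(series) - 1, 1))])  # lags + intercept
    Y = series[1:]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef  # shape (k + 1, k)

def r_squared(series, coef, target=0):
    """In-sample R^2 of one-step-ahead predictions for one indicator."""
    X = np.hstack([series[:-1], np.ones((len(series) - 1, 1))])
    y = series[1:, target]
    yhat = (X @ coef)[:, target]
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic data: one socioeconomic indicator coupled with two
# hypothetical calling-behavior features (e.g., call volume, diversity).
rng = np.random.default_rng(0)
T, k = 120, 3
A_true = np.array([[0.6, 0.2, 0.1],
                   [0.0, 0.5, 0.0],
                   [0.1, 0.0, 0.4]])
series = np.zeros((T, k))
series[0] = rng.normal(size=k)
for t in range(1, T):
    series[t] = A_true @ series[t - 1] + 0.1 * rng.normal(size=k)

coef = fit_var1(series)
print(round(r_squared(series, coef), 2))
```

In a real setting the behavioral features would come from aggregated CDR statistics and the R-squared would be evaluated out of sample.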

Download here

On weather and internet traffic demand

Juan Camilo Cardona, Rade Stanojevic, Rubén Cuevas

On weather and internet traffic demand

Springer Berlin Heidelberg, Passive and Active Measurement, January 2013

 

Abstract

The weather is known to have a major impact on the demand for utilities such as electricity or gas. Given that Internet usage is strongly tied to human activity, one might expect a similar correlation between its traffic demand and weather conditions. In this paper, we empirically quantify such effects. We find that the influence of precipitation depends on both the time of day and the time of year, and is maximal in the late afternoon during the summer months.
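A per-hour correlation analysis of the kind described can be sketched on synthetic data. The rainfall and traffic series below are fabricated so that the rain effect peaks in the late afternoon; this merely illustrates the methodology rather than reproducing the paper's measurements:

```python
import numpy as np

rng = np.random.default_rng(1)
hours = np.arange(24 * 90) % 24                  # 90 days of hourly samples
precip = rng.exponential(1.0, size=hours.size)   # synthetic rainfall (mm)

# Synthetic traffic: a daily pattern plus a rain effect that is strongest
# in the late afternoon (hours 16-20), mimicking the paper's finding.
daily = 10 + 5 * np.sin((hours - 6) * np.pi / 12)
rain_boost = np.where((hours >= 16) & (hours <= 20), 0.8, 0.1)
traffic = daily + rain_boost * precip + rng.normal(0, 0.5, hours.size)

# Pearson correlation between precipitation and traffic, per hour of day.
def hourly_corr(h):
    m = hours == h
    return np.corrcoef(precip[m], traffic[m])[0, 1]

corrs = {h: round(hourly_corr(h), 2) for h in range(24)}
print(max(corrs, key=corrs.get))  # hour with the strongest correlation
```

Binning by hour of day (and, with enough data, by season) is what lets the analysis separate a weather effect from the ordinary diurnal traffic cycle.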

Download here

 

Socially Enabled Preference Learning from Implicit Feedback Data

Julien Delporte, Alexandros Karatzoglou, Tomasz Matuszczyk, Stéphane Canu

Socially Enabled Preference Learning from Implicit Feedback Data

Springer Berlin Heidelberg, Machine Learning and Knowledge Discovery in Databases, January 2013

 

Abstract

In the age of information overload, collaborative filtering and recommender systems have become essential tools for content discovery. The advent of online social networks has added another approach to recommendation, whereby the social network itself is used as a source of recommendations, i.e., users are recommended items that are preferred by their friends.

In this paper we develop a new model-based recommendation method that merges collaborative and social approaches and utilizes implicit feedback and social graph data. Employing factor models, we represent each user profile as a mixture of his own and his friends’ profiles. This assumes and exploits “homophily” in the social network, a phenomenon that has been studied in the social sciences. We test our model on the Epinions data and on the Tuenti Places Recommendation data, a large-scale industry dataset, where it outperforms several state-of-the-art methods.
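The profile-mixing idea can be sketched as follows. This is a minimal illustration of blending a user's latent factors with their friends'; the mixing weight, toy social graph, and random factors are assumptions, and the paper's actual model is trained on implicit feedback, which this sketch omits:

```python
import numpy as np

rng = np.random.default_rng(2)
n_users, n_items, k = 5, 8, 4
U = rng.normal(size=(n_users, k))   # per-user latent factors
V = rng.normal(size=(n_items, k))   # per-item latent factors
friends = {0: [1, 2], 1: [0], 2: [0], 3: [4], 4: [3]}  # toy social graph

def social_profile(u, alpha=0.7):
    """Blend a user's own factors with the mean of their friends'
    factors -- the 'homophily' assumption of the model."""
    return alpha * U[u] + (1 - alpha) * np.mean(U[friends[u]], axis=0)

def score(u, i, alpha=0.7):
    """Predicted preference of user u for item i."""
    return social_profile(u, alpha) @ V[i]

# Rank items for user 0 by predicted score (higher = more preferred).
ranking = np.argsort([-score(0, i) for i in range(n_items)])
print(ranking)
```

Setting the mixing weight to 1 recovers a purely collaborative model, which makes the social contribution easy to ablate.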

Download here

Ambient timer–unobtrusively reminding users of upcoming tasks with ambient light

Heiko Müller, Anastasia Kazakova, Martin Pielot, Wilko Heuten, Susanne Boll

Ambient timer–unobtrusively reminding users of upcoming tasks with ambient light

Human-Computer Interaction–INTERACT 2013, January 2013

 

Abstract

Daily office work is often a mix of concentrated desktop work and scheduled meetings and appointments. However, constantly checking the clock and alarming popups interrupt the flow of creative work, as they require the user's focused attention. We present Ambient Timer, an ambient light display designed to unobtrusively remind users of upcoming events. The light display, mounted around the monitor, is designed to slowly catch the user's attention and raise awareness of an upcoming event without distracting her from the primary creative task, such as writing a paper. Our experiment compared established reminder techniques, such as checking the clock or using popups, against Ambient Timer in two different designs. One of these designs produced a reminder with which participants felt well informed about the progress of time and experienced a better "flow" of work than with traditional reminders.
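One simple way to realize such a gradually intensifying light cue is a linear color ramp toward the event time. The sketch below is a hypothetical mapping (the endpoint colors and the linear ramp are assumptions, not the exact design evaluated in the study):

```python
def reminder_color(now, start, event):
    """Map elapsed time to an ambient light color, moving from a calm
    blue toward a salient red as the event approaches."""
    p = (now - start) / (event - start)
    p = min(max(p, 0.0), 1.0)                    # clamp outside the interval
    calm, salient = (0, 80, 255), (255, 40, 0)   # RGB endpoints
    return tuple(round(c + p * (s - c)) for c, s in zip(calm, salient))

print(reminder_color(now=30, start=0, event=60))  # halfway to the event
```

A real deployment would drive an LED strip with these values on a timer, and could use a non-linear ramp so the change stays below the attention threshold for longer.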

Download here

When assistance becomes dependence: characterizing the costs and inefficiencies of A-GPS

Narseo Vallina-Rodriguez, Jon Crowcroft, Alessandro Finamore, Yan Grunenberger, Konstantina Papagiannaki

When assistance becomes dependence: characterizing the costs and inefficiencies of A-GPS

ACM SIGMOBILE Mobile Computing and Communications Review (MC2R), volume 17, issue 4, pages 3-14

 

Abstract

Location-based services are a vital component of the mobile ecosystem. Among all the location technologies used behind the scenes, A-GPS (Assisted GPS) is considered the most accurate. Unlike standalone GPS systems, A-GPS uses network support to speed up the position fix. However, this can be a risky strategy, as varying cell conditions may impair performance and potentially negate the benefits of the original design. We present a characterization of the accuracy, location-acquisition speed, energy cost, and network dependency of the state-of-the-art A-GPS receivers shipped in popular mobile devices. Our analysis is based on active measurements, an exhaustive on-device analysis, and the processing of cellular traffic traces. The results reveal a number of inefficiencies caused by the strong dependence on the cellular network to obtain assistance data, as well as implementation and integration problems.

Download here

Research Digest

This magazine is a collection of the research projects and technologies that we have been working on over the past few years.

Research Newspaper

Research Newspaper is a compendium of news at Telefonica Digital Research Lab.

Publications

We follow an open research model in collaboration with universities and other research institutions, and we favor the dissemination of our work through both publications and technology transfer. We aim at first-tier conferences, but we also occasionally publish preliminary work in workshops and similar venues, so please stay tuned!