Publications

These are our research publications.

Measuring Video QoE from Encrypted Traffic

Tracking and maintaining satisfactory QoE for video streaming services is becoming a greater challenge for mobile network operators than ever before. Downloading and watching video content on mobile devices is a growing trend among users, which is driving demand for higher bandwidth and better provisioning throughout the network infrastructure. At the same time, popular demand for privacy has led many online streaming services to adopt end-to-end encryption, leaving providers with only a handful of indicators for identifying QoE issues.

To address these challenges, we propose a novel methodology for detecting video streaming QoE issues from encrypted traffic. We develop predictive models for detecting different levels of QoE degradation caused by three key influence factors: stalling, average video quality, and quality variations. The models are then evaluated on the production network of a large-scale mobile operator, where we show that despite encryption our methodology detects QoE problems with 72%-92% accuracy, while even higher performance is achieved when dealing with cleartext traffic.

Download here

Like a Pack of Wolves: Community Structure of Web Trackers

Web trackers are services that monitor user behavior on the web. The information they collect is ostensibly used for customization and targeted advertising. Due to rising privacy concerns, users have started to install browser plugins that prevent tracking of their web usage. Such plugins tend to address tracking activity by means of crowdsourced filters. While these tools have been relatively effective in protecting users from privacy violations, their crowdsourced nature requires significant human effort and provides no fundamental understanding of how trackers operate. In this paper, we leverage the insight that fundamental requirements for trackers’ success can be used as discriminating features for tracker detection. We begin by using traces from a mobile web proxy to model user browsing behavior as a graph. We then perform a transformation on the extracted graph that reveals very well-connected communities of trackers. Next, after discovering that trackers’ position in the transformed graph significantly differentiates them from “normal” vertices, we design an automated tracker detection mechanism using two simple algorithms. We find that both techniques for automated tracker detection are quite accurate (over 97%) and robust (less than 2% false positives). In conjunction with previous research, our findings can be used to build robust, fully automated online privacy preservation systems.

PDF

Is The Web HTTP/2 Yet?

Version 2 of the Hypertext Transfer Protocol (HTTP/2) was finalized in May 2015 as RFC 7540. It addresses well-known problems with HTTP/1.1 (e.g., head of line blocking and redundant headers) and introduces new features (e.g., server push and content priority). Though HTTP/2 is designed to be the future of the web, it remains unclear whether the web will, or should, hop on board. To shed light on this question, we built a measurement platform that monitors HTTP/2 adoption and performance across the Alexa top 1 million websites on a daily basis. Our system is live and up-to-date results can be viewed at [1]. In this paper, we report findings from an 11-month measurement campaign (November 2014 – October 2015). As of October 2015, we find 68,000 websites reporting HTTP/2 support, of which about 10,000 actually serve content with it. Unsurprisingly, popular sites are quicker to adopt HTTP/2 and 31% of the Alexa top 100 already support it. For the most part, websites do not change as they move from HTTP/1.1 to HTTP/2; current web development practices like inlining and domain sharding are still present. Contrary to previous results, we find that these practices make HTTP/2 more resilient to losses and jitter. In all, we find that 80% of websites supporting HTTP/2 experience a decrease in page load time compared with HTTP/1.1 and the decrease grows in mobile networks.

PDF

An Empirical Study of Android Alarm Usage for Application Scheduling

Android applications often rely on alarms to schedule background tasks. Since Android KitKat, applications can opt in to deferrable alarms, which allow the OS to perform alarm batching to reduce device awake time and increase the chances of network traffic being generated simultaneously by different applications. This mechanism can result in significant battery savings if appropriately adopted.

In this paper we perform a large-scale study of the 22,695 most popular free applications in the Google Play Market to quantify whether expectations of more energy-efficient background app execution are indeed warranted. We identify a significant chasm between the way application developers build their apps and Android’s attempt to address energy inefficiencies of background app execution. We find that close to half of the applications using alarms do not benefit from alarm batching capabilities. The reasons are that (i) they tend to target Android SDKs lagging behind by more than 18 months, and (ii) they tend to feature third-party libraries that use non-deferrable alarms.
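The idea of alarm batching described above can be illustrated with a minimal Python sketch. It is not Android's implementation: the function `batch_alarms`, the tuple layout, and the greedy strategy are assumptions made only for illustration. Each alarm carries a delivery window (earliest, latest); an exact, non-deferrable alarm has earliest equal to latest and therefore rarely shares a wakeup.

```python
def batch_alarms(alarms):
    """Group alarms into as few device wakeups as possible.

    alarms: list of (name, earliest, latest) tuples (hypothetical layout).
    Returns a list of (wake_time, [names]) tuples.
    This is a generic greedy interval-covering sketch, not Android's algorithm.
    """
    batches = []
    # Process alarms by deadline; wake at the first deadline in each batch.
    for name, earliest, latest in sorted(alarms, key=lambda a: a[2]):
        if batches and earliest <= batches[-1][0]:
            # The current wakeup falls inside this alarm's window: reuse it.
            batches[-1][1].append(name)
        else:
            # No existing wakeup fits: schedule a new one at the deadline.
            batches.append((latest, [name]))
    return batches


# Hypothetical alarms: three deferrable ones and one exact alarm at t=200.
example = [("sync", 0, 60), ("news", 30, 90), ("exact", 200, 200), ("mail", 150, 240)]
wakeups = batch_alarms(example)
```

In this example four alarms are served with two wakeups, whereas four exact alarms at distinct times would each need their own.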

PDF

Identifying the Root Cause of Video Streaming Issues on Mobile Devices

Giorgos Dimopoulos, Ilias Leontiadis, Pere Barlet-Ros, Konstantina Papagiannaki, Peter Steenkiste

Proceedings of the 11th ACM International on Conference on emerging Networking Experiments and Technologies (CoNEXT), Heidelberg, Germany

Abstract:

Video streaming on mobile devices is prone to a multitude of faults and although well established video Quality of Experience (QoE) metrics such as stall frequency are a good indicator of the problems perceived by the user, they do not provide any insights about the nature of the problem nor where it has occurred. Quantifying the correlation between the aforementioned faults and the users' experience is a challenging task due to the large number of variables and the numerous points-of-failure.
To address this problem, we developed a framework for diagnosing the root cause of mobile video QoE issues with the aid of machine learning. Our solution can take advantage of information collected at multiple vantage points between the video server and the mobile device to pinpoint the source of the problem. Moreover, our design works for different video types (e.g., bitrate, duration) and contexts (e.g., wireless technology, encryption). After training the system with a series of simulated faults in the lab, we analyzed the performance of each vantage point separately and when combined, in controlled and real world deployments. In both cases we find that the involved entities can independently detect QoE issues and that only a few vantage points are required to identify a problem's location and nature.

 

Download here

Web Identity Translator: Behavioral Advertising and Identity Privacy with WIT

Fotios Papaodyssefs, Costas Iordanou, Jeremy Blackburn, Nikolaos Laoutaris, Dina Papagiannaki. 
HotNets-XIV Proceedings of the 14th ACM Workshop on Hot Topics in Networks, Nov 15

Abstract:

Online Behavioral Advertising (OBA) is an important revenue source for online publishers and content providers. However, the extensive user tracking required to enable OBA raises valid privacy concerns. Existing and proposed solutions either block all tracking, therefore breaking OBA entirely, or require significant changes to the current advertising infrastructure, making adoption hard. We propose Web Identity Translator (WIT), a new privacy service running as a proxy or middlebox. WIT stops the original tracking cookies from being set on users' browsers and instead replaces them with private cookies that it controls. By manipulating the mapping it maintains between tracking and private cookies, WIT permits transparent OBA to continue while simultaneously protecting the identity of users from attacks based on behavioral analysis of browsing patterns.
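The cookie substitution described in this abstract can be pictured with a small Python sketch. Everything in it is hypothetical: the class `CookieTranslator`, its method names, and the `remap` operation are illustrative assumptions, not WIT's actual design or interface, which the abstract does not detail.

```python
import secrets


class CookieTranslator:
    """Toy proxy-side table mapping private cookies to tracking cookies."""

    def __init__(self):
        # private cookie value -> tracking cookie value (held only by the proxy)
        self._private_to_tracking = {}

    def on_set_cookie(self, tracking_cookie):
        """A tracker tries to set a cookie: give the browser a private one instead."""
        private_cookie = secrets.token_hex(16)
        self._private_to_tracking[private_cookie] = tracking_cookie
        return private_cookie

    def on_request(self, private_cookie):
        """The browser sends its private cookie: restore the tracker's cookie."""
        return self._private_to_tracking.get(private_cookie)

    def remap(self, private_cookie, new_tracking_cookie):
        """Assumed example of manipulating the mapping, for illustration only."""
        if private_cookie not in self._private_to_tracking:
            raise KeyError("unknown private cookie")
        self._private_to_tracking[private_cookie] = new_tracking_cookie


translator = CookieTranslator()
browser_cookie = translator.on_set_cookie("uid=abc123")   # stored in the browser
forwarded = translator.on_request(browser_cookie)         # what the tracker receives
```

The browser never stores the tracker's own identifier, while the tracker still receives a cookie it recognizes on later requests.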

PDF

When Attention is not Scarce: Detecting Boredom from Mobile Phone Usage

Best paper award at Ubicomp 2015 

Boredom is a common human emotion which may lead to an active search for stimulation. People often turn to their mobile phones to seek that stimulation. In this paper, we tackle the challenge of automatically inferring boredom from mobile phone usage. In a two-week in-the-wild study, we collected over 40,000,000 usage logs and 4398 boredom self-reports of 54 mobile phone users. We show that a user-independent machine-learning model of boredom, leveraging features related to recency of communication, usage intensity, time of day, and demographics, can infer boredom with an accuracy (AUCROC) of up to 82.9%. Results from a second field study with 16 participants suggest that people are more likely to engage with recommended content when they are bored, as inferred by our boredom-detection model. These findings enable boredom-triggered proactive recommender systems that attune to their users’ level of attention and need for stimulation.

A Study of the Impact of DNS Resolvers on Performance Using a Causal Approach

Hadrien Hours, Ernst Biersack, Patrick Loiseau, Alessandro Finamore, Marco Mellia

27th International Teletraffic Congress - ITC 27, Ghent, Belgium, September 8th-10th, 2015

 

Abstract:

 

For a user to access any resource on the Internet, it is necessary to first locate a server hosting the requested resource. The Domain Name System service (DNS) represents the first step in this process, translating a human readable name, the resource host name, into an IP address. With the expansion of Content Distribution Networks (CDNs), the DNS service has seen its importance increase. In a CDN, objects are replicated on different servers to decrease the distance from the client to a server hosting the object that needs to be accessed. The DNS service should improve user experience by directing its demand to the optimal CDN server. While most of the Internet Service Providers (ISPs) offer a DNS service to their customers, it is now common to see clients using a public DNS service instead. This choice may have an impact on Web browsing performance. In this paper we study the impact of choosing one DNS service instead of another and we compare the performance of a large European ISP DNS service with that of a public DNS service, Google DNS. We propose a causal approach to expose the structural dependencies of the different parameters impacted by the DNS service used and we show how to model these dependencies with a Bayesian network. This model allows us to explain and quantify the benefits obtained by clients using their ISP DNS service and to propose a solution to further improve their performance.

 

 

Download Here

I’ll be there for you: Quantifying Attentiveness towards Mobile Messaging

Social norm has it that people are expected to respond to mobile phone messages quickly. We investigate how attentive people really are and how timely they actually check and triage new messages throughout the day. By collecting more than 55,000 messages from 42 mobile phone users over the course of two weeks, we were able to predict people’s attentiveness through their mobile phone usage with close to 80% accuracy. We found that people were attentive to messages 12.1 hours a day, i.e. 84.8 hours per week, and provide statistical evidence of how briefly people’s inattentiveness lasts: in 75% of the cases mobile phone users return to their attentive state within 5 minutes. In this paper, we present a comprehensive analysis of attentiveness throughout each hour of the day and show that intelligent notification delivery services, such as bounded deferral, can assume that inattentiveness will be rare and subside quickly.

Boredom-Triggered Proactive Recommendations

We propose the concept of boredom-triggered proactive recommendations for mobile phones. Given that more and more services attempt to attract mobile phone users' attention via push notifications, attention in this context can be considered an increasingly scarce resource. However, when people are bored, by definition, they seek stimuli and hence often turn to their mobile phones. We cite evidence from our most recent work that boredom can be inferred from patterns of mobile phone usage and that during inferred phases of boredom, people are more likely to engage with suggested content. Thus, using boredom as a content-independent trigger might help to make proactive recommendations a more pleasant experience and, in consequence, more successful.

Multi-Context TLS (mcTLS): Enabling Secure In-Network Functionality in TLS

A significant fraction of Internet traffic is now encrypted and HTTPS will likely be the default in HTTP/2. However, Transport Layer Security (TLS), the standard protocol for encryption in the Internet, assumes that all functionality resides at the endpoints, making it impossible to use in-network services that optimize network resource usage, improve user experience, and protect clients and servers from security threats. Re-introducing in-network functionality into TLS sessions today is done through hacks, often weakening overall security.

In this paper we introduce multi-context TLS (mcTLS), which extends TLS to support middleboxes. mcTLS breaks the current "all-or-nothing" security model by allowing endpoints and content providers to explicitly introduce middleboxes in secure end-to-end sessions while controlling which parts of the data they can read or write.

We evaluate a prototype mcTLS implementation in both controlled and "live" experiments, showing that its benefits come at the cost of minimal overhead. More importantly, we show that mcTLS can be incrementally deployed and requires only small changes to client, server, and middlebox software.

The Power of Indirect Ties

While direct social ties have been intensely studied in the context of computer-mediated social networks, indirect ties (e.g., friends of friends) have seen little attention. Yet in real life, we often rely on friends of our friends for recommendations (of good doctors, good schools, or good babysitters), for introduction to a new job opportunity, and for many other occasional needs. In this work we attempt to 1) quantify the strength of indirect social ties, 2) validate the quantification, and 3) empirically demonstrate its usefulness for applications on two examples. We quantify social strength of indirect ties using a measure of the strength of the direct ties that connect two people and the intuition provided by the sociology literature. We evaluate the proposed metric by framing it as a link prediction problem and experimentally demonstrate that our metric accurately (up to 87.2%) predicts link formation. We show via data-driven experiments that the proposed metric for social strength can be used successfully for social applications. Specifically, we show that it can be used for predicting the effects of information diffusion with an accuracy of up to 0.753. We also show that it alleviates known problems in friend-to-friend storage systems by addressing two previously documented shortcomings: reduced set of storage candidates and data availability correlations.

From: Price Discrimination to Data Transparency

"Mobile Data for Public Health: Opportunities and Challenges"

The ubiquity of mobile phones worldwide is generating an unprecedented amount of human behavioral data at both individual and aggregate levels. The study of this data as a rich source of information about human behavior emerged almost a decade ago. Since then, it has grown into a fertile area of research named computational social sciences with a wide variety of applications in different fields such as social networks, urban and transport planning, economic development, emergency relief, and, recently, public health. In this paper, we briefly describe the state of the art on using mobile phone data for public health, and present the opportunities and challenges that this kind of data presents for public health.

Impact of Carrier-Grade NAT on Web Browsing

Enrico Bocchi, Ali Safari Khatouni, Stefano Traverso, Alessandro Finamore, Valeria Di Gennaro, Marco Mellia, Maurizio Munafò, Dario Rossi

6th International Workshop on TRaffic Analysis and Characterization (TRAC 2015), Dubrovnik, Croatia, 10 May 2015

 

Abstract:

 

Public IPv4 addresses are a scarce resource. While IPv6 adoption is lagging, Network Address Translation (NAT) technologies have been deployed over the last years to alleviate IPv4 scarcity and the high rental cost of addresses. In particular, Carrier-Grade NAT (CGN) is a well known solution to mask a whole ISP network behind a limited number of public IP addresses, significantly reducing expenses.

Despite its economic benefits, CGN can introduce connectivity issues, which have prompted considerable effort in research, development and standardization. However, to the best of our knowledge, little effort has been dedicated to investigating the impact that CGN deployment may have on users’ traffic. This paper fills the gap. We leverage passive measurements from an ISP network deploying CGN and, by means of the Jensen-Shannon divergence, we contrast several performance metrics considering customers being offered public or private addresses. In particular, we gauge the impact of CGN presence on users’ web browsing experience.

Our results testify that CGN is a mature and stable technology as, if properly deployed, it does not harm users’ web browsing experience. Indeed, while our analysis reveals expected stochastic differences in certain indexes (e.g., the difference in the path hop count), the measurements related to the quality of users’ browsing are otherwise unperturbed. Interestingly, we also observe that CGN protects customers from unsolicited, often malicious, traffic.
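As a minimal illustration of the Jensen-Shannon divergence mentioned in this abstract, the following Python sketch compares two histograms of the same metric. The function name and the sample histograms are invented for illustration and are not the paper's data or code. With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (disjoint distributions).

```python
import math


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    p and q are sequences of non-negative counts or probabilities over the
    same bins; each must have a positive sum, since both are normalized here.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    p_total, q_total = sum(p), sum(q)
    p = [x / p_total for x in p]
    q = [x / q_total for x in q]
    # Mixture distribution M = (P + Q) / 2.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; bins with zero mass in `a` contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Made-up histograms of one metric (e.g. path hop count) for two customer groups.
public_hist = [5, 30, 40, 20, 5]
private_hist = [2, 20, 40, 30, 8]
jsd = jensen_shannon_divergence(public_hist, private_hist)
```

A value near 0 would indicate that the metric is distributed almost identically for the two groups.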

 

Download Here

 

Complexities in Internet Peering: Understanding the "Black" in the "Black Art"

Peering in the Internet interdomain network has long been considered a “black art”, understood in-depth only by a select few peering experts while the majority of the network operator community only scratches the surface, employing conventional rules-of-thumb to form peering links through ad hoc personal interactions. Why is peering considered a black art? What are the main sources of complexity in identifying potential peers, negotiating a stable peering relationship, and optimizing utility through peering? How do contemporary operational practices approach these problems? In this work we address these questions for Tier-2 Network Service Providers (NSPs). We identify and explore three major sources of complexity in peering: (a) inability to predict traffic flows prior to link formation, (b) inability to predict economic utility owing to a complex transit and peering pricing structure, and (c) computational infeasibility of identifying the optimal set of peers because of the network structure. We show that framing optimal peer selection as a formal optimization problem and solving it is rendered infeasible by the nature of these problems. Our results for traffic complexity show that 15% of NSPs lose some fraction of customer traffic after peering. Additionally, our results for economic complexity show that 15% of NSPs lose utility after peering, approximately 50% of NSPs end up with higher cumulative costs with peering than with transit only, and only 10% of NSPs get paid-peering customers.

Exploring Cyberbullying and Other Toxic Behavior in Team Competition Online Games

In this work we explore cyberbullying and other toxic behavior in team competition online games. Using a dataset of over 10 million player reports on 1.46 million toxic players along with corresponding crowdsourced decisions, we test several hypotheses drawn from theories explaining toxic behavior. Besides providing a large-scale, empirically based understanding of toxic behavior, our work can be used as a basis for building systems to detect, prevent, and counteract toxic behavior.

The Do Not Disturb Challenge – A Day Without Notifications

We report on the first holistic study of the effect of notifications across services, devices, and work and private life. We asked 12 people to disable notification alerts on all computing devices for 24 hours. Data was collected through open post-hoc interviews, which were analyzed by Open Coding. The participants showed very strong and polarized opinions towards the missing notification alerts. During work, some participants felt less stressed and more productive thanks to not being interrupted; however, outside of the work context, some became stressed and anxious because they were afraid of missing important information and violating expectations of others. The only consistent finding across the participants was that none of them would keep notifications disabled altogether. Notifications may affect people negatively, but they are essential: can’t live with them, can’t live without them.

The Collision of Online and Offline Expectations in Computer-Mediated Communication

Thanks to mobile phones, computer-mediated communication allows us to get in touch with people anywhere, anytime. We are no longer limited to being strictly online or offline. Hence, users can easily get caught between the expectations of people who are co-located in the offline/physical world, and the expectations of others who try to contact them online. However, existing systems today offer little help to effectively manage and balance these potentially colliding expectations from both worlds. We argue that one fruitful strategy to tackle this challenge is to share contextual cues not only with those trying to connect with us via the online world, as proposed in previous work, but also with people who are co-located with us in the offline world.

Enabling Social Applications via Decentralized Social Data Management

Macroscopic View of Malware in Home Networks

Alessandro Finamore, Sabyasachi Saha, Gaspar Modelo-Howard, Sung-Ju Lee, Enrico Bocchi, Luigi Grimaudo, Marco Mellia, Elena Baralis

12th Annual IEEE Consumer Communications & Networking Conference (IEEE CCNC'15), Las Vegas, NV, 9 January 2015

 

Abstract:

 

Malicious activities on the Web increasingly threaten Internet users. Home networks are one of the prime targets for attackers to host malware, commonly exploited as a stepping stone to launch a variety of further attacks. Due to diversification, existing security solutions often fail to detect malicious activities, which remain hidden and pose threats to users' security and privacy. Characterizing behavioral patterns of known malware can help improve the classification accuracy of known threats. More importantly, since different malware can share some commonalities, studying the behavior of known malware can enable the detection of previously unknown malicious activities. We pose the research question of whether it is possible to characterize such behavioral patterns by analyzing the traffic from known infected clients. In this paper, we present our quest to discover such characterizations. Results show that commonalities arise but their identification may require some ingenuity. Also, more malicious activities can be uncovered through this analysis.

Download Here

 

The Cost of the "S" in HTTPS

David Naylor, Alessandro Finamore, Ilias Leontiadis, Yan Grunenberger, Marco Mellia, Maurizio M. Munafò, Konstantina Papagiannaki, Peter Steenkiste.

Proceedings of the 10th ACM International on Conference on emerging Networking Experiments and Technologies (CoNEXT), Sydney, Australia, ISBN: 978-1-4503-3279-8, December 2-5, 2014

 

Abstract

 

Increased user concern over security and privacy on the Internet has led to widespread adoption of HTTPS, the secure version of HTTP. HTTPS authenticates the communicating end points and provides confidentiality for the ensuing communication. However, as with any security solution, it does not come for free. HTTPS may introduce overhead in terms of infrastructure costs, communication latency, data usage, and energy consumption. Moreover, given the opaqueness of the encrypted communication, any in-network value-added services requiring visibility into application layer content, such as caches and virus scanners, become ineffective.

This paper attempts to shed some light on these costs. First, taking advantage of datasets collected from large ISPs, we examine the accelerating adoption of HTTPS over the last three years. Second, we quantify the direct and indirect costs of this evolution. Our results show that, indeed, security does not come for free. This work thus aims to stimulate discussion on technologies that can mitigate the costs of HTTPS while still protecting the user’s privacy. 

 

Download Here

Linguistic Analysis of Toxic Behavior in an Online Video Game

In this paper we explore the linguistic components of toxic behavior by using crowdsourced data from over 590 thousand cases of accused toxic players in a popular match-based competition game, League of Legends. We perform a series of linguistic analyses to gain a deeper understanding of the role communication plays in the expression of toxic behavior. We characterize linguistic behavior of toxic players and compare it with that of typical players in an online competition game. We also find empirical support describing how a player transitions from typical to toxic behavior. Our findings can be helpful to automatically detect and warn players who may become toxic and thus insulate potential victims from toxic playing in advance.

Large-Scale Network Traffic Monitoring with DBStream, a System for Rolling Big Data Analysis

Arian Bar, Alessandro Finamore, Pedro Casas, Lukasz Golab, Marco Mellia

IEEE International Conference on Big Data 2014 - IEEE BigData 2014, Washington DC, 27 October 2014

 

Abstract

The complexity of the Internet has rapidly increased, making it more important and challenging to design scalable network monitoring tools. Network monitoring typically requires rolling data analysis, i.e., continuously and incrementally updating (rolling-over) various reports and statistics over high-volume data streams. In this paper, we describe DBStream, which is an SQL-based system that explicitly supports incremental queries for rolling data analysis. We also present a performance comparison of DBStream with a parallel data processing engine (Spark), showing that, in some scenarios, a single DBStream node can outperform a cluster of ten Spark nodes on rolling network monitoring workloads. Although our performance evaluation is based on network monitoring data, our results can be generalized to other big data problems with high volume and velocity.
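The notion of rolling data analysis in this abstract can be illustrated with a short Python sketch. It is not DBStream's SQL interface or implementation: the class `RollingSum` and the sample values are hypothetical. The report over the last few time windows is updated incrementally as each new window arrives, instead of being recomputed from all raw data.

```python
from collections import deque


class RollingSum:
    """Sum over the most recent `num_windows` windows, updated incrementally."""

    def __init__(self, num_windows):
        if num_windows < 1:
            raise ValueError("num_windows must be at least 1")
        self.num_windows = num_windows
        self.windows = deque()  # per-window aggregates still inside the report
        self.total = 0

    def roll_over(self, window_value):
        """Add the newest window's aggregate and retire the oldest if needed."""
        self.windows.append(window_value)
        self.total += window_value
        if len(self.windows) > self.num_windows:
            self.total -= self.windows.popleft()
        return self.total


# Hypothetical per-minute byte counts; report covers the last 3 minutes.
report = RollingSum(num_windows=3)
totals = [report.roll_over(v) for v in [100, 250, 50, 400]]
```

Each update costs constant time regardless of how much history has been processed, which is the property that makes incremental evaluation attractive for high-volume streams.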

Download Here

SLIDES: Crowd-assisted Search for Price Discrimination

In this talk I'll go over some initial results from our measurement study aiming to identify signs of discriminatory practices in e-commerce. I'll present the tool we used to collect our data, analyze an initial dataset from some 300+ beta testers of the tool, and focus on instances of dynamic pricing observed in conjunction with different locations, retailers, and products. I'll later talk about the important challenges remaining for going from this initial study to a much more concrete and thorough understanding of the issue at a scale that is more representative of the actual practices of retailers at "Internet-scale". Finally, I'll try to connect this particular measurement study with other related important questions remaining unanswered in the general area of privacy economics, Internet advertising and e-commerce. Most of the material that I will present can be found in the following two articles:
 
J. Mikians, L. Gyarmati, V. Erramilli, N. Laoutaris, “Crowd-assisted Search for Price Discrimination in E-Commerce: First results,”  ACM CoNEXT'13.
 
J. Mikians, L. Gyarmati, V. Erramilli, N. Laoutaris, “Detecting price and search discrimination on the Internet,” in Proc. of ACM HotNets'12.

Coverage, Redundancy and Size-awareness in Genre Diversity for Recommender Systems

There is increasing awareness in the Recommender Systems field that diversity is a key property that enhances the usefulness of recommendations. Genre information can serve as a means to measure and enhance the diversity of recommendations and is readily available in domains such as movies, music or books. In this work we propose a new Binomial framework for defining genre diversity in recommender systems that takes into account three key properties: genre coverage, genre redundancy, and recommendation list size-awareness.

We show that methods previously proposed for measuring and enhancing recommendation diversity, including those adapted from search result diversification, fail to address these three properties adequately. We also propose an efficient greedy optimization technique to optimize Binomial diversity. Experiments with the Netflix dataset show the properties of our framework and a comparison with state-of-the-art methods.

Question Recommendation for Collaborative Question Answering Systems with RankSLDA

Collaborative question answering (CQA) communities rely on user participation for their success. This paper presents a supervised Bayesian approach to model expertise in on-line CQA communities with application to question recommendation, aimed at reducing waiting times for responses and avoiding question starvation. We propose a novel algorithm called RankSLDA which extends the supervised Latent Dirichlet Allocation model by considering a learning-to-rank paradigm. This allows us to exploit the inherent collaborative effects that are present in CQA communities where users tend to answer questions in their topics of expertise. Users can thus be modeled on the basis of the topics in which they demonstrate expertise. In the supervised stage of the method we model the pairwise order of expertise of users on a given question. We compare RankSLDA against several alternative methods on data from the Cross Validated community, part of the Stack Exchange network. RankSLDA outperforms all alternative methods by a significant margin.

The Power of Indirect Ties in Friend-to-Friend Storage Systems

CARS2: Learning Context-aware Representations for Context-aware Recommendations

Rich contextual information is typically available in many recommendation domains, allowing recommender systems to model the subtle effects of context on preferences. Most contextual models assume that the context shares the same latent space with the users and items. In this work we propose CARS2, a novel approach for learning context-aware representations for context-aware recommendations. We show that the context-aware representations can be learned using an appropriate model that aims to represent the type of interactions between context variables, users and items. We adapt the CARS2 algorithms to explicit feedback data by using a quadratic loss function for rating prediction, and to implicit feedback data by using pairwise and listwise ranking loss functions for top-N recommendations. By using stochastic gradient descent for parameter estimation we ensure scalability. Experimental evaluation shows that our CARS2 models achieve competitive recommendation performance compared to several state-of-the-art approaches.

Who to Blame when YouTube is not Working? Detecting Anomalies in CDN-Provisioned Services

Alessandro D'Alconzo, Pedro Casas, Pierdomenico Fiadino, Arian Bar, Alessandro Finamore

Who to Blame when YouTube is not Working? Detecting Anomalies in CDN-Provisioned Services

TRaffic Analysis and Characterization, TRAC, Nicosia, Cyprus, August 4-8, 2014

 

Abstract

 

Internet-scale services like YouTube are provisioned by large Content Delivery Networks (CDNs), which push content as close as possible to the end-users to improve their Quality of Experience (QoE) and to pursue their own optimization goals. Adopting space- and time-variant traffic delivery policies, CDNs serve users’ requests from multiple servers/caches at different physical locations and different times. CDN traffic distribution policies can have a relevant impact on the traffic routed through the Internet Service Provider (ISP), as well as unexpected negative effects on the end-user QoE. In the event of poor QoE due to faulty CDN server selection, a major problem for the ISP is to avoid being blamed by its customers. In this paper we show a real case study in which Google CDN server selection policies negatively impact the QoE of the customers of a major European ISP watching YouTube. We argue that it is extremely important for the ISP to rapidly and automatically detect such events to increase its visibility on the overall operation of the network, as well as to promptly answer possible customer complaints. We therefore present an Anomaly Detection (AD) system for detecting unexpected cache-selection changes in the traffic delivered by CDNs. The proposed algorithm improves over traditional AD approaches by analyzing the complete probability distribution of the monitored features, as well as by self-adapting its functioning to dynamic environments, providing better detection capabilities.

 

Download here

Gaussian Process Factorization Machines for Context-aware Recommendations

Context-aware recommendation (CAR) can lead to significant improvements in the relevance of the recommended items by modeling the nuanced ways in which context influences preferences. The dominant approach in context-aware recommendation has been the multidimensional latent factors approach in which users, items, and context variables are represented as latent features in a low-dimensional space.

An interaction between a user, item, and a context variable is typically modeled as some linear combination of their latent features. However, given the many possible types of interactions between user, items and contextual variables, it may seem unrealistic to restrict the interactions among them to linearity.

To address this limitation, we develop a novel and powerful non-linear probabilistic algorithm for context-aware recommendation using Gaussian processes. The method, which we call Gaussian Process Factorization Machines (GPFM), is applicable to both the explicit feedback setting (e.g. numerical ratings as in the Netflix dataset) and the implicit feedback setting (e.g. purchases, clicks). We derive stochastic gradient descent optimization to allow scalability of the model. We test GPFM on five different benchmark contextual datasets. Experimental results demonstrate that GPFM outperforms state-of-the-art context-aware recommendation methods.

Sentiment retrieval on web reviews using spontaneous natural speech

J.C. Pereira, J. Luque, X. Anguera, “Sentiment retrieval on web reviews using spontaneous natural speech”, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’14), May 2014

This paper addresses the problem of document retrieval based on sentiment polarity criteria. A query based on natural spontaneous speech, expressing an opinion about a certain topic, is used to search a repository of documents containing favorable or unfavorable opinions. The goal is to retrieve documents whose opinions more closely resemble the one in the query. A semantic system based on the speech transcripts is augmented with information from full-length text articles. Posterior probabilities extracted from the article are used to regularize their transcription counterparts. This paper makes three important contributions. First, we introduce a framework for polarity analysis of sentiments that can accommodate combinations of different modalities, while maintaining the flexibility of unimodal systems, i.e. capable of dealing with the absence of any modality. Second, we show that it is possible to improve average precision on speech transcriptions’ sentiment retrieval by means of regularization. Third, we demonstrate the strength and generalization of our approach by training regularizers on one dataset, while performing sentiment retrieval experiments, with substantial gains, on a collection of YouTube clips. 

 

Download here

Inferring social relationships in a phone call from a single party’s speech

S.H. Yella, X. Anguera, J. Luque, “Inferring social relationships in a phone call from a single party’s speech”, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’14), May 2014

People usually speak differently depending on who they talk to. Based on this hypothesis, in this paper we propose an automatic method to detect the social relationship between two people based solely on a set of acoustic and conversational characteristics. We argue that changes in these features of an individual reflect her social relationship with the other person. To infer relationship we only require the speech of one of the conversation partners and the interaction patterns between both speakers. We validate the proposed system using a real-life telephone database with calls made by several speakers to close family members and to their partners. We trained a classifier using a boosting algorithm on a set of conversational and acoustic features and use it to classify calls according to the social relationship between both speakers. Tests performed on models trained on single speaker’s data show that for most people such prediction is feasible. We also show that these characteristics generalize quite well across speakers, achieving around 75% accuracy when both sets of features are combined. 

Download here

Cheating in Online Games: A Social Network Perspective

Online gaming is a multi-billion dollar industry that entertains a large, global population. One unfortunate phenomenon, however, poisons the competition and spoils the fun: cheating. The costs of cheating span from industry-supported expenditures to detect and limit it, to victims’ monetary losses due to cyber crime.

This paper studies cheaters in the Steam Community, an online social network built on top of the world’s dominant digital game delivery platform. We collected information about more than 12 million gamers connected in a global social network, of which more than 700 thousand have their profiles flagged as cheaters. We also observed timing information of the cheater flags, as well as the dynamics of the cheaters’ social neighborhoods.

We discovered that cheaters are well embedded in the social and interaction networks: their network position is largely indistinguishable from that of fair players. Moreover, we noticed that the number of cheaters is not correlated with the geographical, real-world population density, or with the local popularity of the Steam Community. Also, we observed a social penalty involved with being labeled as a cheater: cheaters lose friends immediately after the cheating label is publicly applied.

Most importantly, we observed that cheating behavior spreads through a social mechanism: the number of cheater friends of a fair player is correlated with the likelihood of her becoming a cheater in the future. This allows us to propose ideas for limiting cheating contagion.

Didn’t you see my message? predicting attentiveness to mobile instant messages

Martin Pielot, Rodrigo de Oliveira, Haewoon Kwak, Nuria Oliver

"Didn’t you see my message? predicting attentiveness to mobile instant messages"

Proc. CHI, April 2014

 

Abstract

Mobile instant messaging (e.g., via SMS or WhatsApp) often goes along with an expectation of high attentiveness, i.e., that the receiver will notice and read the message within a few minutes. Hence, existing instant messaging services for mobile phones share indicators of availability, such as the last time the user has been online. However, in this paper we not only provide evidence that these cues create social pressure, but also that they are weak predictors of attentiveness. As a remedy, we propose to share a machine-computed prediction of whether the user will view a message within the next few minutes or not. For two weeks, we collected behavioral data from 24 users of mobile instant messaging services. By means of machine-learning techniques, we identified that simple features extracted from the phone, such as the user’s interaction with the notification center, the screen activity, the proximity sensor, and the ringer mode, are strong predictors of how quickly the user will attend to the messages. With seven automatically selected features our model predicts whether a phone user will view a message within a few minutes with 70.6% accuracy and a precision for fast attendance of 81.2%.

 

Download here

STFU NOOB!: Predicting Crowdsourced Decisions on Toxic Behavior in Online Games

One problem facing players of competitive games is negative, or toxic, behavior. League of Legends, the largest eSport game, uses a crowdsourcing platform called the Tribunal to judge whether a reported toxic player should be punished or not. The Tribunal is a two-stage system requiring reports from those players that directly observe toxic behavior, and human experts that review aggregated reports. While this system has successfully dealt with the vague nature of toxic behavior by majority rules based on many votes, it naturally requires tremendous cost, time, and human effort.

In this paper, we propose a supervised learning approach for predicting crowdsourced decisions on toxic behavior with large-scale labeled data collections: over 10 million user reports involving 1.46 million toxic players and the corresponding crowdsourced decisions. Our results show good performance in detecting overwhelming-majority cases and predicting crowdsourced decisions on them. We demonstrate good portability of our classifier across regions. Finally, we estimate the practical implications of our approach: potential cost savings and victim protection.

Social-Aware Replication in Geo-Diverse Online Systems

Stefano Traverso, Kévin Huguenin, Ionut Trestian, Vijay Erramilli, Nikolaos Laoutaris, Konstantina Papagiannaki

Social-Aware Replication in Geo-Diverse Online Systems

IEEE Transactions on Parallel and Distributed Systems, March 2014

 

Abstract

Distributing long-tail content is a difficult task due to the low amortization of bandwidth transfer costs, as such content has a limited number of views. Two recent trends are making this problem harder. First, the increasing popularity of user-generated content and online social networks creates and reinforces such popularity distributions. Second, content is increasingly geo-replicated across multiple points of presence (PoPs) spread around the world to improve quality of experience (QoE) for users. In this paper, we analyze and explore the tradeoff between the “freshness” of the information available to the users and WAN bandwidth costs, and we propose ways to reduce the latter through smart update propagation scheduling, by leveraging knowledge of the mapping between social relationships and geographic location, the timing regularities and time differences in end user activity. We first assess the potential of our approach by implementing a simple social-aware scheduling algorithm that operates under bandwidth budget constraints and by quantifying its benefits through a trace-driven analysis. We show that it can reduce WAN traffic by up to 55% compared to an immediate update of all replicas, with a minimal effect on information freshness and latency. Second, we build TailGate, a practical system that implements our social-aware scheduling approach, which distributes long-tail content on the fly across PoPs at reduced bandwidth costs by flattening the traffic. We evaluate TailGate by using traces from an OSN and show that it can decrease WAN bandwidth costs by as much as 80% and improve QoE. We deploy TailGate on PlanetLab and show that even when only imprecise social information is available, it can still decrease the latency for accessing long-tail YouTube videos by a factor of 2.

 

Download here

Language Independent Search in MediaEval's Spoken Web Search Task

Florian Metze, Xavier Anguera, Etienne Barnard, Marelie Davel, Guillaume Gravier

"Language Independent Search in MediaEval's Spoken Web Search Task"

Computer Speech & Language, January 2014

Abstract

In this paper, we describe several approaches to language-independent spoken term detection and compare their performance on a common task, namely “Spoken Web Search”. The goal of this part of the MediaEval initiative is to perform low-resource language-independent audio search using audio as input. The data was taken from “spoken web” material collected over mobile phone connections by IBM India as well as from the LWAZI corpus of African languages. As part of the 2011 and 2012 MediaEval benchmark campaigns, several diverse systems have been implemented by independent teams, and submitted to the “Spoken Web Search” evaluation. This paper presents the 2011 and 2012 results, and compares the relative merits and weaknesses of approaches developed by participants, providing analysis and directions for future research, in order to improve voice access to spoken information in low resource settings. 

Download here

The Influence of Indirect Ties on Social Network Dynamics

While direct social ties have been intensely studied in the context of computer-mediated social networks, indirect ties (e.g., friends of friends) have seen less attention. Yet in real life, we often rely on friends of our friends for recommendations (of doctors, schools, or babysitters), for introduction to a new job opportunity, and for many other occasional needs. In this work we empirically study the predictive power of indirect ties in two dynamic processes in social networks: new link formation and information diffusion. We not only verify the predictive power of indirect ties in new link formation but also show that this power is effective over longer social distance. Moreover, we show that the strength of an indirect tie positively correlates to the speed of forming a new link between the two end users of the indirect tie. Finally, we show that the strength of indirect ties can serve as a predictor for diffusion paths in social networks.

RILAnalyzer: a Comprehensive 3G Monitor On Your Phone

Narseo Vallina-Rodriguez, Andrius Aucinas, Mario Almeida, Yan Grunenberger, Konstantina Papagiannaki, Jon Crowcroft

RILAnalyzer: a Comprehensive 3G Monitor On Your Phone

ACM Internet Measurement Conference, October 2013

 

Abstract

The popularity of smartphones, cloud computing, and the app store model has led to cellular networks being used in a completely different way than they were designed for. As a consequence, mobile applications impose new challenges in the design and efficient configuration of constrained networks to maximize application performance. Such difficulties are largely caused by the lack of cross-layer understanding of interactions between different entities: applications, devices, the network and its management plane. In this paper, we describe RILAnalyzer, an open-source tool that provides mechanisms to perform network analysis from within a mobile device. RILAnalyzer is capable of recording low-level radio information and accurate cellular network control-plane data, as well as user-plane data. We demonstrate how such data can be used to identify previously overlooked issues. Through a small user study across four cellular network providers in two European countries we infer how different network configurations are in reality and explore how such configurations interact with application logic, causing network and energy overheads.

Download here

Challenges of keyword-based location disclosure

Christopher J Riederer, Augustin Chaintreau, Jacob Cahan, Vijay Erramilli

Challenges of keyword-based location disclosure

Proceedings of the 12th ACM Workshop on Privacy in the Electronic Society, November 2013

 

GAPfm: Optimal top-n recommendations for graded relevance domains

Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Alan Hanjalic

GAPfm: Optimal top-n recommendations for graded relevance domains

Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, October 2013

 

Abstract

 

Recommender systems are frequently used in domains in which users express their preferences in the form of graded judgments, such as ratings. If accurate top-N recommendation lists are to be produced for such graded relevance domains, it is critical to generate a ranked list of recommended items directly rather than predicting ratings. Current techniques choose one of two sub-optimal approaches: either they optimize for a binary metric such as Average Precision, which discards information on relevance grades, or they optimize for Normalized Discounted Cumulative Gain (NDCG), which ignores the dependence of an item’s contribution on the relevance of more highly ranked items.

In this paper, we address the shortcomings of existing approaches by proposing the Graded Average Precision factor model (GAPfm), a latent factor model that is particularly suited to the problem of top-N recommendation in domains with graded relevance data. The model optimizes for Graded Average Precision (GAP), a metric that has been proposed recently for assessing the quality of ranked result lists for graded relevance. GAPfm learns a latent factor model by directly optimizing a smoothed approximation of GAP. GAPfm’s advantages are twofold: it maintains full information about graded relevance and also addresses the limitations of models that optimize NDCG. Experimental results show that GAPfm achieves substantial improvements on the top-N recommendation task, compared to several state-of-the-art approaches. In order to ensure that GAPfm is able to scale to very large data sets, we propose a fast learning algorithm that uses an adaptive item selection strategy. A final experiment shows that GAPfm is useful not only for generating recommendation lists, but also for ranking a given list of rated items.
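For readers unfamiliar with NDCG, the metric this abstract contrasts GAP against, the following is a minimal sketch rather than code from the paper. It assumes the commonly used exponential gain 2^g - 1 and a log2 rank discount; the function names `dcg` and `ndcg` are illustrative only. Each item's contribution depends only on its own grade and rank, which is the independence the abstract criticizes.

```python
import math
from typing import Sequence


def dcg(grades: Sequence[int]) -> float:
    """Discounted cumulative gain of a ranked list of relevance grades."""
    # Assumed gain/discount: (2^g - 1) / log2(rank + 1), with ranks starting at 1.
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades, start=1))


def ndcg(grades: Sequence[int]) -> float:
    """DCG divided by the DCG of the ideal (descending-grade) ordering."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0
```

A list already sorted by descending grade scores 1.0, and any other ordering of the same grades scores lower.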

 

Download here

Follow the money: understanding economics of online aggregation and advertising

Phillipa Gill, Vijay Erramilli, Augustin Chaintreau, Balachander Krishnamurthy, Konstantina Papagiannaki, Pablo Rodriguez

Follow the money: understanding economics of online aggregation and advertising

Proceedings of the 2013 ACM Internet Measurement Conference, October 2013

 

Abstract

The large-scale collection and exploitation of personal information to drive targeted online advertisements has raised privacy concerns. As a step towards understanding these concerns, we study the relationship between how much information is collected and how valuable it is for advertising. We use HTTP traces consisting of millions of users to aid our study and also present the first comparative study between aggregators. We develop a simple model that captures the various parameters of today’s advertising revenues, whose values are estimated via the traces. Our results show that per-aggregator revenue is skewed (5% accounting for 90% of revenues), while the contribution of users to advertising revenue is much less skewed (20% accounting for 80% of revenue). Google is dominant in terms of revenue and reach (presence on 80% of publishers). We also show that if all 5% of the top users in terms of revenue were to install privacy protection, with no corresponding reaction from the publishers, then the revenue can drop by 30%.

Download here

 

The spoken web search task

Xavier Anguera, Florian Metze, Andi Buzo, Igor Szoke, Luis Javier Rodriguez-Fuentes

The spoken web search task

MediaEval 2013 Workshop, October 2013, Barcelona, Spain

 

Abstract

In this paper, we describe the “Spoken Web Search” Task, which is being held as part of the 2013 MediaEval campaign. The purpose of this task is to perform audio search in multiple languages and acoustic conditions, with very few resources being available for each individual language. This year the data contains audio from nine different languages and is much bigger in size than in previous years, mimicking realistic low/zero-resource settings. 

Download here

xCLiMF: optimizing expected reciprocal rank for data with multiple levels of relevance

Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Alan Hanjalic

xCLiMF: optimizing expected reciprocal rank for data with multiple levels of relevance

Proceedings of the 7th ACM conference on Recommender systems, October 2013

 

Abstract

Extended Collaborative Less-is-More Filtering (xCLiMF) is a learning-to-rank model for collaborative filtering that is specifically designed for use with data where information on the level of relevance of the recommendations exists, e.g. through ratings. xCLiMF can be seen as a generalization of the Collaborative Less-is-More Filtering (CLiMF) method that was proposed for top-N recommendations using binary relevance (implicit feedback) data. The key contribution of the xCLiMF algorithm is that it builds a recommendation model by optimizing Expected Reciprocal Rank, an evaluation metric that generalizes reciprocal rank in order to incorporate user feedback with multiple levels of relevance. Experimental results on real-world datasets show the effectiveness of xCLiMF, and also demonstrate its advantage over CLiMF when more than two levels of relevance exist in the data.
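As an illustration of the metric named above, here is a minimal sketch of Expected Reciprocal Rank for one ranked list. It is not the xCLiMF learning algorithm, which optimizes a smoothed version of the metric. The mapping from a grade g to a stopping probability, (2^g - 1) / 2^max_grade, is a commonly used choice and an assumption here, as is the function name.

```python
from typing import Sequence


def expected_reciprocal_rank(grades: Sequence[int], max_grade: int) -> float:
    """Expected 1/rank at which a user scanning the list top-down stops."""
    err = 0.0
    p_reach = 1.0  # probability that the user reaches the current rank
    for rank, g in enumerate(grades, start=1):
        # Assumed mapping from relevance grade to probability of stopping here.
        p_stop = (2 ** g - 1) / 2 ** max_grade
        err += p_reach * p_stop / rank
        p_reach *= 1.0 - p_stop
    return err
```

Because a highly relevant item near the top makes the user unlikely to look further, items below it contribute little, which is how the metric takes graded relevance into account.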

Download here

xDGP: A Dynamic Graph Processing System with Adaptive Partitioning

Luis Vaquero, Félix Cuadrado, Dionysios Logothetis, Claudio Martella

xDGP: A Dynamic Graph Processing System with Adaptive Partitioning

ACM, Proceedings of the 4th annual Symposium on Cloud Computing, October 2013

Download here

Scalable lineage capture for debugging DISC analytics

Dionysios Logothetis, Soumyarupa De, Kenneth Yocum

Scalable lineage capture for debugging DISC analytics

ACM, Proceedings of the 4th annual Symposium on Cloud Computing, October 2013

Download here

Last call for the buffet: economics of cellular networks

Jeremy Blackburn, Rade Stanojevic, Vijay Erramilli, Adriana Iamnitchi, Konstantina Papagiannaki

Last call for the buffet: economics of cellular networks

ACM, Proceedings of the 19th annual international conference on Mobile computing & networking, September 2013

 

Abstract

Voice and data traffic growth over the last several years has become a major challenge for cellular operators with a direct impact on revenues, infrastructure investments, and end-user performance. The economics of these operators depend on various incentives used to attract users in the form of unlimited, buffet-like voice/sms/data packages. However, our understanding of the effects of user behavior under these offerings on operator revenues/costs remains poor. Using two years of detailed usage information of 1 million users across three services, voice, sms and data, combined with payment and cost information, we study how user behavior affects the economics of cellular operators. We discover that around 20% of the users consume more resources than what they pay for and hence are non-profitable. In addition to the individual user behavior, we study how the user interactions in the call graph affect the operator’s revenues and cost, drawing on tools from social network analysis. We develop a framework that incorporates both the individual and social user behavior for studying how volume caps influence the revenues and the traffic costs. Using this framework we empirically show that volume caps can increase the difference between the revenues and the traffic costs of the studied operator by a factor of 2, while affecting only 16% of the existing user base. 

Download here

Characterizing Home Network Performance Problems

Srikanth Sundaresan, Nick Feamster, Renata Teixeira, Yan Grunenberger, Dina Papagiannaki, Dave Levin

Characterizing Home Network Performance Problems

hal-00864852, version 1, September 2013

 

Abstract

We design, develop, validate, and deploy WTF (Where’s The Fault?), a system that determines whether a performance problem in a home network lies with the ISP or inside the home network. WTF uses four independent maximum likelihood detectors to detect both access link bottlenecks and wireless network pathologies with high detection rates and low false positive rates; we use extensive controlled experiments to determine the appropriate thresholds for each parameter that we measure. We implemented WTF as custom firmware that runs in an off-the-shelf home router and deployed it in 64 home networks across 15 countries. The real-world deployment sheds light on common pathologies that occur in home networks. We find that wireless bottlenecks are significantly more common than access link bottlenecks, that the 5 GHz spectrum consistently outperforms the 2.4 GHz spectrum, that many homes experience high TCP round-trip latencies between wireless clients and the access point, and that performance can vary dramatically across wireless devices, even within a single home network.

 

Download here

Is there a case for mobile phone content pre-staging?

Alessandro Finamore, Marco Mellia, Zafar Gilani, Konstantina Papagiannaki, Vijay Erramilli, Yan Grunenberger

Is there a case for mobile phone content pre-staging?

Proceedings of the ninth ACM Conference on Emerging Networking Experiments and Technologies (CoNEXT), September 2013

 

Abstract:

Content caching is a fundamental building block of the Internet. Caches are widely deployed at network edges to improve performance for end-users, and to reduce load on web servers and the backbone network. Considering mobile 3G/4G networks, however, the bottleneck is at the access link, where bandwidth is shared among all mobile terminals. As such, per-user capacity cannot grow to cope with the traffic demand. Unfortunately, caching policies would not reduce the load on the wireless link, which would have to carry multiple copies of the same object that is being downloaded by multiple mobile terminals sharing the same access link.

In this paper we investigate whether it is worth pushing the caching paradigm even further. We hypothesize a system in which mobile terminals implement a local cache, where popular content can be pushed/pre-staged. This exploits the peculiar broadcast capability of the wireless channels to replicate content “for free” on all terminals, saving the cost of transmitting multiple copies of those popular objects. Relying on a large data set collected from a European mobile carrier, we analyse the content popularity characteristics of mobile traffic, and quantify the benefit that the push-to-mobile system would produce. We found that content pre-staging, by proactively and periodically broadcasting “bundles” of popular objects to devices, can both i) greatly improve users’ performance and ii) reduce the downloaded volume (number of requests) by up to 20% (40%) in optimistic scenarios with a bundle of 100 MB. However, some technical constraints and content characteristics could call into question the actual gain such a system would reach in practice.

Download here

Staying online while mobile: The hidden costs

Andrius Aucinas, Narseo Vallina-Rodriguez, Yan Grunenberger, Vijay Erramilli, Konstantina Papagiannaki, Jon Crowcroft, J Wetherall

Staying online while mobile: The hidden costs

ACM CoNEXT, September 2013
 
Abstract

Mobile phones in the 3G/4G era enable us to stay connected not only to the voice network, but also to online services like social networks. In this paper, we study the energy and network costs of mobile applications that provide continuous online presence (e.g. WhatsApp, Facebook, Skype). By combining measurements taken on the mobile and the cellular access network, we reveal a detailed picture of the mechanisms selected to implement online presence, along with their effect on handset energy consumption and network signaling traffic. We are surprised to find that simply having idle online presence apps on a mobile (that maintain connectivity in the background, with no user interaction) can drain the handset battery nine times more quickly. This high cost is partly due to online presence apps that are excessively “chatty”, in particular when their design philosophy stems from a similar desktop version. However, we also find that the cost of background app traffic is disproportionately large because of cross-layer interactions in which the traffic unintentionally triggers the promotion of cellular network states. Our experiments show that both of these effects can be overcome with careful implementation. We posit that a two-way push notification system, with messages being sent at a low (regular) frequency and low volume by a network-aware sender, can alleviate many of the costs. 

Download here

Network monitoring architecture based on home gateways

Claudio Casetti, Yan Grunenberger, Frank Den Hartog, Anukool Lakhina, Henrik Lundgren, Marco Milanesio, Anna-Kaisa Pietilainen, Renata Teixeira, Shuang Zhang

Network monitoring architecture based on home gateways

Future Network and MobileSummit 2013 Conference Proceedings, September 2013

 

Abstract

The “Future Internet Gateway-based Architecture of Residential netwOrks (FIGARO)” project proposes to tackle the new challenges arising from the shift of the Internet use from technology centric to user/content centric with a novel network architecture centered on the residential gateways. Many use cases for the FIGARO architecture such as home automation, distributed content management, content delivery optimizations, network performance monitoring and troubleshooting require advanced network monitoring functionality on the residential gateway. In this paper, we discuss the requirements and design of the FIGARO gateway-centric network monitoring architecture. 

Download here

What's up with whatsapp?: comparing mobile instant messaging behaviors with traditional SMS

Karen Church, Rodrigo de Oliveira

What's up with whatsapp?: comparing mobile instant messaging behaviors with traditional SMS

ACM, Proceedings of the 15th international conference on Human-computer interaction with mobile devices and services, August 2013

Download here

Peripheral vibro-tactile displays

Martin Pielot, Rodrigo de Oliveira

Peripheral vibro-tactile displays

ACM, Proceedings of the 15th international conference on Human-computer interaction with mobile devices and services, August 2013

 

Abstract:

We report on a study exploring the boundaries of the peripheral perception of vibro-tactile stimuli. For three days, we exposed 15 subjects to a continual vibration pattern that was created by a mobile device worn in their trouser pocket. In order to guarantee that the stimuli would not require the subjects’ focal attention, the vibration pattern was tested and refined to minimise its obtrusiveness, and during the study, the participants adjusted its intensity to just above their personal detection threshold. At random times, the vibration stopped and participants had to acknowledge these events as soon as they noticed them. Only 6.5% of the events were acknowledged fast enough to assume that the cue had been in the focus of the participants’ attention. The majority of events were answered between 1 and 10 minutes, which indicates that the participants were aware of the cue without focussing on it. In addition, participants reported not being annoyed by the signal in 94.4% of the events. These results provide evidence that vibration patterns can form non-annoying, lightweight information displays, which can be consumed at the periphery of a user’s attention.

Download here

Information Retrieval-based Dynamic Time Warping

Xavier Anguera

Information Retrieval-based Dynamic Time Warping

Proc. Interspeech, Lyon, France, August 2013

 

Abstract

In this paper we introduce a novel dynamic programming algorithm called Information Retrieval-based Dynamic Time Warping (IR-DTW) used to find non-linearly matching subsequences between two time series where matching start and end points are not known a priori. In this paper our algorithm is applied for audio matching within the query by example (QbE) spoken term detection (STD) task, although it is applicable to many other problems. The main advantages of the proposed algorithm in comparison to similar approaches are twofold. On the one hand, IR-DTW requires a much smaller memory footprint than standard Dynamic Time Warping (DTW) approaches. On the other hand, it allows for the application of indexing techniques to the search collection for increased matching speed, which makes IR-DTW suitable for application in large scale implementations. We show through preliminary experimentation with a QbE-STD task that the memory footprint is greatly reduced in comparison to a baseline subsequence-DTW (S-DTW) implementation and that its matching accuracy is much better than that of pure diagonal matching and just slightly worse than that of S-DTW.

Download here

CLiMF: collaborative less-is-more filtering

Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Nuria Oliver, Alan Hanjalic

CLiMF: collaborative less-is-more filtering

AAAI Press, Proceedings of the Twenty-Third international joint conference on Artificial Intelligence, August 2013

 

Abstract

In this paper we tackle the problem of recommendation in scenarios with binary relevance data, when only a few (k) items are recommended to individual users. Past work on Collaborative Filtering (CF) has either not addressed the ranking problem for binary relevance datasets, or not specifically focused on improving top-k recommendations. To solve the problem we propose a new CF approach, Collaborative Less-is-More Filtering (CLiMF). In CLiMF the model parameters are learned by directly maximizing the Mean Reciprocal Rank (MRR), which is a well-known information retrieval metric for capturing the performance of top-k recommendations. We achieve linear computational complexity by introducing a lower bound of the smoothed reciprocal rank metric. Experiments on two social network datasets show that CLiMF significantly outperforms a naive baseline and two state-of-the-art CF methods.
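For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of how plain Mean Reciprocal Rank can be computed for binary relevance data. It is not the CLiMF learning procedure, which according to the abstract optimizes a lower bound of a smoothed version of this metric. The function names and the toy data are illustrative assumptions.

```python
from typing import Hashable, Sequence, Set


def reciprocal_rank(ranked_items: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1/position of the first relevant item, or 0.0 if none is ranked."""
    for position, item in enumerate(ranked_items, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    relevant_sets: Sequence[Set[Hashable]],
) -> float:
    """Average the reciprocal rank over users (one ranking per user)."""
    if len(rankings) != len(relevant_sets) or not rankings:
        raise ValueError("need one non-empty ranking list per relevance set")
    total = sum(reciprocal_rank(r, s) for r, s in zip(rankings, relevant_sets))
    return total / len(rankings)


# Hypothetical toy data: two users, each with a ranked recommendation list.
toy_rankings = [["a", "b"], ["x", "y", "z"]]
toy_relevant = [{"b"}, {"z"}]
toy_mrr = mean_reciprocal_rank(toy_rankings, toy_relevant)  # (1/2 + 1/3) / 2
```

Because only the first relevant item in each list contributes, the metric rewards placing at least one relevant item near the top, which matches the top-k focus described above.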

Download here

Using Tuangou to Reduce IP Transit Costs

Ignacio Castro, Rade Stanojevic, Sergey Gorinsky

"Using Tuangou to Reduce IP Transit Costs"

IEEE/ACM TRANSACTIONS ON NETWORKING, July 2013

 

Abstract

A majority of Internet service providers (ISPs) support connectivity to the entire Internet by transiting their traffic via other providers. Although the transit prices per megabit per second (Mbps) decline steadily, the overall transit costs of these ISPs remain high or even increase due to the traffic growth. The discontent of the ISPs with the high transit costs has yielded notable innovations such as peering, content distribution networks, multicast, and peer-to-peer localization. While the above solutions tackle the problem by reducing the transit traffic, this paper explores a novel approach that reduces the transit costs without altering the traffic. In the proposed Cooperative IP Transit (CIPT), multiple ISPs cooperate to jointly purchase Internet Protocol (IP) transit in bulk. The aggregate transit costs decrease due to the economies-of-scale effect of typical subadditive pricing as well as burstable billing: Not all ISPs transit their peak traffic during the same period. To distribute the aggregate savings among the CIPT partners, we propose Shapley-value sharing of the CIPT transit costs. Using public data about IP traffic and transit prices, we quantitatively evaluate CIPT and show that significant savings can be achieved, both in relative and absolute terms. We also discuss the organizational embodiment, relationship with transit providers, traffic confidentiality, and other aspects of CIPT. 
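As an illustration of the cost-sharing idea, the sketch below computes Shapley-value shares of a jointly purchased transit bill. The pricing function (a concave power law), the unit price, the exponent, and the ISP demands are all hypothetical assumptions chosen only to produce subadditive costs; the sketch also ignores burstable billing and uses a single peak demand per ISP.

```python
from itertools import permutations
from math import factorial
from typing import Dict


def coalition_cost(total_mbps: float, unit_price: float = 10.0, exponent: float = 0.75) -> float:
    """Hypothetical subadditive price: cost grows slower than traffic volume."""
    return unit_price * total_mbps ** exponent if total_mbps > 0 else 0.0


def shapley_costs(demands: Dict[str, float]) -> Dict[str, float]:
    """Average each ISP's marginal cost over every possible joining order."""
    players = list(demands)
    shares = {p: 0.0 for p in players}
    for order in permutations(players):
        running = 0.0
        for p in order:
            before = coalition_cost(running)
            running += demands[p]
            shares[p] += coalition_cost(running) - before
    n_orders = factorial(len(players))
    return {p: s / n_orders for p, s in shares.items()}


# Hypothetical peak demands in Mbps for three cooperating ISPs.
example_demands = {"A": 100.0, "B": 100.0, "C": 400.0}
example_shares = shapley_costs(example_demands)
```

Enumerating all orderings is exponential in the number of partners, so this form is only practical for small coalitions; larger ones would need sampling or a closed form for the chosen pricing function.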

Download here

Memory efficient subsequence DTW for Query-by-Example spoken term detection

Xavier Anguera, Miquel Ferrarons

Memory efficient subsequence DTW for Query-by-Example spoken term detection

Multimedia and Expo (ICME), 2013 IEEE International Conference on, July 2013
 
Abstract

In this paper we propose a fast and memory efficient Dynamic Time Warping (MES-DTW) algorithm for the task of Query-by-Example Spoken Term Detection (QbE-STD). The proposed algorithm is based on the subsequence-DTW (S-DTW) algorithm, which allows the search for small query sequences of feature vectors within a much longer reference sequence by considering fixed start-end points in the query and discovering optimal matching subsequences within the reference. The proposed algorithm applies some modifications to S-DTW that make it better suited for the QbE-STD task, including a way to perform the matching with virtually no system memory, optimal when querying large scale databases. We also describe the system used to perform QbE-STD, including an energy-based quantification for speech/non-speech detection and an overlap detector for matches. We test the system proposed using the MediaEval 2012 spoken-web-search dataset and show that, in addition to the memory savings, the proposed algorithm brings an advantage in terms of matching accuracy (up to 0.235 absolute MTWV increase) and speed (around 25% faster) in comparison to the original S-DTW.
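The baseline idea described above (fixed start and end in the query, free start and end in the reference) can be sketched as a textbook subsequence DTW that keeps only one previous column in memory. This is not the authors' MES-DTW: the scalar features, the absolute-difference local cost, and the function name are simplifying assumptions.

```python
from math import inf
from typing import List, Sequence, Tuple


def subsequence_dtw(query: Sequence[float], reference: Sequence[float]) -> Tuple[float, Tuple[int, int]]:
    """Return (best cost, (start, end)) of the best-matching reference span.

    Memory is O(len(query)): only the previous column of the cost matrix
    and its start pointers are kept while sweeping over the reference.
    """
    n = len(query)
    if n == 0 or len(reference) == 0:
        raise ValueError("query and reference must be non-empty")
    best_cost, best_span = inf, (0, 0)
    prev_cost: List[float] = []
    prev_start: List[int] = []
    for j, r in enumerate(reference):
        cur_cost = [0.0] * n
        cur_start = [0] * n
        for i, q in enumerate(query):
            local = float(abs(q - r))
            if i == 0:
                # Free start: a path may begin at any reference frame.
                cur_cost[0], cur_start[0] = local, j
                continue
            # Vertical step: advance in the query only.
            candidates = [(cur_cost[i - 1], cur_start[i - 1])]
            if prev_cost:
                candidates.append((prev_cost[i], prev_start[i]))          # horizontal
                candidates.append((prev_cost[i - 1], prev_start[i - 1]))  # diagonal
            cost, start = min(candidates)
            cur_cost[i], cur_start[i] = local + cost, start
        # Free end: the whole query may finish at any reference frame.
        if cur_cost[-1] < best_cost:
            best_cost, best_span = cur_cost[-1], (cur_start[-1], j)
        prev_cost, prev_start = cur_cost, cur_start
    return best_cost, best_span
```

Carrying the start index along with each accumulated cost avoids storing the full matrix for backtracking, which is one simple way to keep memory independent of the reference length.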

Download here

Games of Friends: a game-theoretical approach for link prediction in online social networks

Giovanni Zappella, Alexandros Karatzoglou, Linas Baltrunas

Games of Friends: a game-theoretical approach for link prediction in online social networks

Workshops at the Twenty-Seventh AAAI Conference on Artificial Intelligence, June 2013

 

Abstract

Online Social Networks (OSN) have enriched the social lives of millions of users. Discovering new friends in the social network is valuable both for the user and for the health of the OSN, since users with more friends engage longer and more often with the site. The simplest way to formalize friendship recommendation is to cast the problem as a link prediction problem in the social graph. In this work we introduce a game-theoretical approach based on the Graph Transduction Game. It scales with ease beyond 13 million users and was tested on real-world data from the Tuenti OSN. We utilize the social graph and several other graphs that naturally arise in Tuenti, such as the wall-to-wall post graph. We compare our approach to standard local measures and demonstrate a significant performance benefit in terms of mean average precision and reciprocal rank.

Download here

ITMgen—A first-principles approach to generating synthetic interdomain traffic matrices

Jakub Mikians, Nikolaos Laoutaris, Amogh Dhamdhere, Pere Barlet-Ros

ITMgen—A first-principles approach to generating synthetic interdomain traffic matrices

Communications (ICC), 2013 IEEE International Conference on, June 2013

Download here

Exploiting foursquare and cellular data to infer user activity in urban environments

Anastasios Noulas, Cecilia Mascolo, Enrique Frias-Martinez

Exploiting foursquare and cellular data to infer user activity in urban environments

Mobile Data Management (MDM), 2013 IEEE 14th International Conference on, June 2013

 

Abstract

Inferring the type of activities in neighborhoods of urban centers may be helpful in a number of contexts including urban planning, content delivery and activity recommendations for mobile web users, or may even lead to a deeper understanding of the geographical evolution of social life in the city. During the past few years, the analysis of mobile phone usage patterns, or of social media with longitudinal attributes, has aided the automatic characterization of the dynamics of the urban environment.

In this work, we combine a dataset sourced from a telecommunication provider in Spain with a database of millions of geo-tagged venues from Foursquare and we formulate the problem of urban activity inference in a supervised learning framework. In particular, we exploit user communication patterns observed at the base station level in order to predict the activity of Foursquare users who check in at nearby venues. First, we mine a set of machine learning features that allow us to encode the input telecommunication signal of a tower. Subsequently, we evaluate a diverse set of supervised learning algorithms using labels extracted from Foursquare place categories and we consider two application scenarios. Initially, we assess how hard it is to predict the specific urban activity of an area, showing that Nightlife and Entertainment spots are the easiest to infer, whereas College and Shopping areas feature the lowest accuracy rates. Then, considering a candidate set of activity types in a geographic area, we aim to elect the most prominent one. We demonstrate how the difficulty of the problem increases with the number of classes incorporated in the prediction task, yet the classifiers achieve considerably better performance than a random guess even when the set of candidate classes increases.

 

Download here

ACORN: An auto-configuration framework for 802.11n WLANs

Mustafa Y Arslan, Konstantinos Pelechrinis, Ioannis Broustis, Shailendra Singh, Srikanth V Krishnamurthy, Sateesh Addepalli, Konstantina Papagiannaki

ACORN: An auto-configuration framework for 802.11n WLANs

Networking, IEEE/ACM Transactions on, June 2013
 
Abstract

The wide channels feature combines two adjacent channels to form a new, wider channel to facilitate high data rate transmissions in MIMO-based 802.11n networks. Using a wider channel can exacerbate interference effects. Furthermore, contrary to what has been reported by prior studies, we find that wide channels do not always provide benefits in isolation (i.e., one link without interference) and can even degrade performance. We conduct an in-depth, experimental study to understand the implications of wide channels on throughput performance. Based on our measurements, we design an auto-configuration framework called ACORN for enterprise 802.11n WLANs. ACORN integrates the functions of user association and channel allocation, since our study reveals that they are tightly coupled when wide channels are used. We show that the channel allocation problem with the constraints of wide channels is NP-complete. Thus, ACORN uses an algorithm that provides a worst case approximation ratio of O(1/∆+1) with ∆ being the maximum node degree in the network. We implement ACORN on our 802.11n testbed. Our evaluations show that ACORN (i) outperforms previous approaches that are agnostic to wide channels constraints; it provides per-AP throughput gains ranging from 1.5x to 6x and (ii) in practice, its channel allocation module achieves an approximation ratio much better than the theoretically predicted O(1/∆+1).

Download here

Perceptually inspired features for speaker likability classification

Sira Gonzalez, Xavier Anguera

Perceptually inspired features for speaker likability classification

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, May 2013

 

Abstract

We present a novel approach to speaker likability classification. Our algorithm, instead of extracting a large number of features, identifies a small set of features which represent perceptual speech characteristics. For classification, linear support vector machines are used. We train and evaluate the performance on the Interspeech speaker trait challenge database and we show that our likability classifier outperforms the baseline classifier developed for the challenge while considerably reducing the number of features needed.

Download here

Speed improvements to information retrieval-based dynamic time warping using hierarchical k-means clustering

Gautam Mantena, Xavier Anguera

Speed improvements to information retrieval-based dynamic time warping using hierarchical k-means clustering

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, June 2013

 

Abstract

With the increase in multi-media data over the Internet, query by example spoken term detection (QbE-STD) has become important in providing a search mechanism to find spoken queries in spoken audio. Audio search algorithms should be efficient in terms of speed and memory to handle large audio files. In general, approaches derived from the well-known dynamic time warping (DTW) algorithm suffer from scalability problems.

To overcome such problems, an Information Retrieval-based DTW (IR-DTW) algorithm has been proposed recently. IR-DTW borrows techniques from the Information Retrieval community to detect regions which are more likely to contain the spoken query and then uses a standard DTW to obtain exact start and end times. One drawback of IR-DTW is the time taken for the retrieval of similar reference points for a given query point. In this paper we propose a method to improve the search performance of the IR-DTW algorithm using a clustering-based technique. The proposed method has shown an estimated speedup of 2400X.

Download here

The spoken web search task at MediaEval 2012

Florian Metze, Xavier Anguera, Etienne Barnard, Marelie Davel, Guillaume Gravier

The spoken web search task at MediaEval 2012

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, May 2013

 

Abstract

In this paper, we describe the “Spoken Web Search” Task, which was held as part of the 2011 MediaEval benchmark campaign. The purpose of this task was to perform audio search with audio input in four languages, with very few resources being available in each language. The data was taken from “spoken web” material collected over mobile phone connections by IBM India. We present results from several independent systems, developed by five teams and using different approaches, compare them, and provide analysis and directions for future research.

Download here

Tower of babel: a crowdsourcing game building sentiment lexicons for resource-scarce languages

Yoonsung Hong, Haewoon Kwak, Youngmin Baek, Sue Moon

Tower of babel: a crowdsourcing game building sentiment lexicons for resource-scarce languages

Proceedings of the 22nd international conference on World Wide Web companion, May 2013

Download here

Your browsing behavior for a big mac: Economics of personal information online

Juan Pablo Carrascal, Christopher Riederer, Vijay Erramilli, Mauro Cherubini, Rodrigo de Oliveira

Your browsing behavior for a big mac: Economics of personal information online

Proceedings of the 22nd international conference on World Wide Web, May 2013

Download here

Size matters (spacing not): 18 points for a dyslexic-friendly Wikipedia

Luz Rello, Martin Pielot, Mari-Carmen Marcos, Roberto Carlini

Size matters (spacing not): 18 points for a dyslexic-friendly Wikipedia

ACM, Proceedings of the 10th International Cross-Disciplinary Conference on Web Accessibility, May 2013

 

Abstract

In 2012, Wikipedia was the sixth-most visited website on the Internet. Being one of the main repositories of knowledge, students from all over the world consult it. But around 10% of these students have dyslexia, which impairs their access to text-based websites. How could Wikipedia be presented to be more readable for this target group? In an experiment with 28 participants with dyslexia, we compare reading speed, comprehension, and subjective readability for the font sizes 10, 12, 14, 18, 22, and 26 points, and line spacings 0.8, 1.0, 1.4, and 1.8. The results show that font size has a significant effect on the readability and the understandability of the text, while line spacing does not. On the basis of our results, we recommend using 18-point font size when designing web text for readers with dyslexia. Our results significantly differ from previous recommendations, presumably because this is the first work to cover a wide range of values and to study them in the context of an actual website.

Download here

Optimizing TCP Performance in Multi-AP Residential Broadband Connections via Minislot Access

Domenico Giustiniano, Eduard Goma, Alberto Lopez Toledo, George Athanasiou

Optimizing TCP Performance in Multi-AP Residential Broadband Connections via Minislot Access

HINDAWI, Journal of Computer Networks and Communications, April 2013

Download here

METIS: Exploring mobile phone sensing offloading for efficiently supporting social sensing applications

Kiran K Rachuri, Christos Efstratiou, Ilias Leontiadis, Cecilia Mascolo, Peter J Rentfrow

METIS: Exploring mobile phone sensing offloading for efficiently supporting social sensing applications

Pervasive Computing and Communications (PerCom), 2013 IEEE International Conference on, March 2013

Download here

Delay-Tolerant Bulk Data Transfers on the Internet

Nikolaos Laoutaris, Georgios Smaragdakis, Rade Stanojevic, Pablo Rodriguez, Ravi Sundaram

Delay-Tolerant Bulk Data Transfers on the Internet

IEEE Networking, IEEE/ACM Transactions on (Volume: 21, Issue: 6), March 2013

Download here

Cell Phone Analytics: Scaling Human Behavior Studies into the Millions.

Vanessa Frias-Martinez, Jesus Virseda

Cell Phone Analytics: Scaling Human Behavior Studies into the Millions.

Information Technologies & International Development, March 2013

Download here

App stores: external validity for mobile hci

Niels Henze, Martin Pielot

App stores: external validity for mobile hci

ACM, Interactions, March 2013

Download here

Experimental evaluation of context-dependent collaborative filtering using item splitting

Linas Baltrunas, Francesco Ricci

Experimental evaluation of context-dependent collaborative filtering using item splitting

User Modeling and User-Adapted Interaction, February 2013

 

Abstract

Collaborative Filtering (CF) computes recommendations by leveraging a historical data set of users’ ratings for items. CF assumes that the users’ recorded ratings can help in predicting their future ratings. This has been validated extensively, but in some domains the user’s ratings can be influenced by contextual conditions, such as the time or the goal of the item consumption. This type of contextual information is not exploited by standard CF models. This paper introduces and analyzes a novel technique for context-aware CF called Item Splitting. In this approach items experienced in two alternative contextual conditions are “split” into two items. This means that the ratings of a split item, e.g., a place to visit, are assigned (split) to two new fictitious items representing, for instance, the place in summer and the same place in winter. This split is performed only if there is statistical evidence that under these two contextual conditions the item’s ratings are different; for instance, a place may be rated higher in summer than in winter. These two new fictitious items are then used, together with the unaffected items, in the rating prediction algorithm. When the system must predict the rating for that “split” item in a particular contextual condition (e.g., in summer), it will consider the new fictitious item representing the original one in that particular contextual condition, and will predict its rating. We evaluated this approach on real-world and semi-synthetic data sets using matrix factorization and nearest neighbor CF algorithms. We show that Item Splitting can be beneficial and its performance depends on the method used to determine which items to split. We also show that the benefit of the method is determined by the relevance of the contextual factors that are used to split.
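The splitting step described above can be sketched as follows. The abstract only says that a split needs "statistical evidence"; the Welch t-statistic, the threshold of 4.0, the minimum group size, and the naming of the fictitious items are all assumptions made for illustration, not the paper's stated criterion.

```python
from math import inf, sqrt
from statistics import mean, variance
from typing import Dict, List, Tuple

Rating = Tuple[str, float, str]  # (user id, rating, context label)


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t-statistic for the difference between two sample means."""
    std_err = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if std_err == 0.0:
        return 0.0 if mean(a) == mean(b) else inf
    return (mean(a) - mean(b)) / std_err


def split_item(
    item_id: str,
    ratings: List[Rating],
    context_value: str,
    threshold: float = 4.0,   # assumed cut-off, not taken from the paper
    min_ratings: int = 3,     # assumed minimum group size
) -> Dict[str, List[Rating]]:
    """Split one item into two fictitious items if its ratings differ by context."""
    inside = [r for r in ratings if r[2] == context_value]
    outside = [r for r in ratings if r[2] != context_value]
    if len(inside) < min_ratings or len(outside) < min_ratings:
        return {item_id: ratings}
    t_value = welch_t([r[1] for r in inside], [r[1] for r in outside])
    if abs(t_value) < threshold:
        return {item_id: ratings}
    return {
        f"{item_id}@{context_value}": inside,
        f"{item_id}@not-{context_value}": outside,
    }
```

The returned fictitious items can then be fed to any standard CF algorithm in place of the original item, which is what makes the technique independent of the underlying rating predictor.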

Download here

Adaptive non-parametric identification of dense areas using cell phone records for urban analysis

Alberto Rubio, Angel Sanchez, Enrique Frias-Martinez

Adaptive non-parametric identification of dense areas using cell phone records for urban analysis

Pergamon, Engineering Applications of Artificial Intelligence, January 2013

 

Abstract

Pervasive large-scale infrastructures (like GPS, WLAN networks or cell-phone networks) generate large datasets containing human behavior information. One of the applications that can benefit from this data is the study of urban environments. In this context, one of the main problems is the detection of dense areas, i.e., areas with a high density of individuals within a specific geographical region and time period. Nevertheless, the techniques used so far face an important limitation: the definition of dense area is not adaptive and as a result the areas identified are related to a threshold applied over the density of individuals, which usually implies that dense areas are mainly identified in downtowns. In this paper, we propose a novel technique, called AdaptiveDAD, to detect dense areas that adaptively define the concept of density using the infrastructure provided by a cell phone network. We evaluate and validate our approach with a real dataset containing the Call Detail Records (CDR) of fifteen million individuals. 

Download here

3GOL: Power-boosting ADSL using 3G OnLoading

Claudio Rossi, Narseo Vallina-Rodriguez, Vijay Erramilli, Yan Grunenberger, Lazlo Gyarmati, Nikolaos Laoutaris, Rade Stanojevic, Dina Papagiannaki, Pablo Rodriguez
3GOL: Power-boosting ADSL using 3G OnLoading
In ACM CoNEXT 2013, Santa Barbara, CA, December 2013 
 
Abstract

The co-existence of cellular and wired networks has been exploited almost exclusively in the direction of OffLoading traffic from the former onto the latter. In this paper we claim that there exist cases that call for the exact opposite, i.e., using the cellular network to assist a fixed wired network. In particular, we show that by “OnLoading” traffic from the wired broadband network onto the cellular network we can usefully speed up wired connections, on the downlink or the uplink. We consider the technological challenges pertaining to this idea and implement a prototype 3G OnLoading service, which we call 3GOL, that can be deployed by an operator providing both the wired and cellular network services. By strategically OnLoading a fraction of the data transfers to the 3G network, one can significantly enhance the performance of particular applications. In particular, we demonstrate non-trivial performance benefits of 3GOL to two widely used applications: video-on-demand and multimedia upload. We also consider the case when the operators that provide wired and cellular services are different, adding an analysis of the economic constraints and volume caps on cellular data plans that need to be respected. Simulating 3GOL over a DSLAM trace, we show that 3GOL can reduce video pre-buffering time by at least 20% for 50% of the users while respecting data caps, and we design a simple estimator to compute the daily allowance that can be used towards 3GOL while respecting caps. Our prototype is currently being piloted in 30 households in a large European city by a large network provider.

 

Forecasting socioeconomic trends with cell phone records

Vanessa Frias-Martinez, Cristina Soguero-Ruiz, Enrique Frias-Martinez, Malvina Josephidou

Forecasting socioeconomic trends with cell phone records

Proceedings of the 3rd ACM Symposium on Computing for Development, January 2013

 

Abstract

National Statistical Institutes typically hire large numbers of enumerators to carry out periodic surveys regarding the socioeconomic status of a society. Such an approach suffers from two drawbacks: (i) the survey process is expensive, especially for emerging countries that struggle with their budgets, and (ii) the socioeconomic indicators are computed ex-post, i.e., after socioeconomic changes have already happened. We propose the use of human behavioral patterns computed from calling records to predict future values of socioeconomic indicators. Our objective is to help institutions forecast socioeconomic changes before they happen while reducing the number of surveys they need to compute. For that purpose, we explore a battery of different predictive approaches for time series and show that multivariate time-series models yield R-square values of up to 0.65 for certain socioeconomic indicators.

Download here

Applications of temporal graph metrics to real-world networks

John Tang, Ilias Leontiadis, Salvatore Scellato, Vincenzo Nicosia, Cecilia Mascolo, Mirco Musolesi, Vito Latora

Applications of temporal graph metrics to real-world networks

Springer Berlin Heidelberg, Temporal Networks, January 2013

Download here

Event Detection in Communication and Transportation Data

Joachim Neumann, Manqi Zao, Alexandros Karatzoglou, Nuria Oliver

Event Detection in Communication and Transportation Data

Springer Berlin Heidelberg, Pattern Recognition and Image Analysis, January 2013
 

Download here

Evaluating temporal robustness of mobile networks

Salvatore Scellato, Ilias Leontiadis, Cecilia Mascolo, Prithwish Basu, Murtaza Zafer

Evaluating temporal robustness of mobile networks

Mobile Computing, IEEE Transactions on, January 2013

Download here

On weather and internet traffic demand

Juan Camilo Cardona, Rade Stanojevic, Rubén Cuevas

On weather and internet traffic demand

Springer Berlin Heidelberg, Passive and Active Measurement, January 2013

 

Abstract

The weather is known to have a major impact on the demand for utilities such as electricity or gas. Given that Internet usage is strongly tied to human activity, one could expect a similar correlation between its traffic demand and weather conditions. In this paper, we empirically quantify such effects. We find that the influence of precipitation depends on both the time of day and the time of year, and is maximal in the late afternoon over summer months.

Download here

 

Socially Enabled Preference Learning from Implicit Feedback Data

Julien Delporte, Alexandros Karatzoglou, Tomasz Matuszczyk, Stéphane Canu

Socially Enabled Preference Learning from Implicit Feedback Data

Springer Berlin Heidelberg, Machine Learning and Knowledge Discovery in Databases, January 2013

 

Abstract

In the age of information overload, collaborative filtering and recommender systems have become essential tools for content discovery. The advent of online social networks has added another approach to recommendation whereby the social network itself is used as a source for recommendations, i.e. users are recommended items that are preferred by their friends.

In this paper we develop a new model-based recommendation method that merges collaborative and social approaches and utilizes implicit feedback and the social graph data. Employing factor models, we represent each user profile as a mixture of his own and his friends’ profiles. This assumes and exploits “homophily” in the social network, a phenomenon that has been studied in the social sciences. We test our model on the Epinions data and on the Tuenti Places Recommendation data, a large-scale industry dataset, where it outperforms several state-of-the-art methods.

Download here

Ambient timer–unobtrusively reminding users of upcoming tasks with ambient light

Heiko Müller, Anastasia Kazakova, Martin Pielot, Wilko Heuten, Susanne Boll

Ambient timer–unobtrusively reminding users of upcoming tasks with ambient light

Human-Computer Interaction–INTERACT 2013, January 2013

 

Abstract

Daily office work is often a mix of concentrated desktop work and scheduled meetings and appointments. However, constantly checking the clock and alarming popups interrupt the flow of creative work as they require the user's focused attention. We present Ambient Timer, an ambient light display designed to unobtrusively remind users of upcoming events. The light display - mounted around the monitor - is designed to slowly catch the user's attention and raise awareness for an upcoming event while not distracting her from the primary creative task such as writing a paper. Our experiment compared established reminder techniques such as checking the clock or using popups against Ambient Timer in two different designs. One of these designs produced a reminder in which the participants felt well informed on the progress of time and experienced a better "flow" of work than with traditional reminders. 

Download here

When assistance becomes dependence: characterizing the costs and inefficiencies of A-GPS

Narseo Vallina-Rodriguez, Jon Crowcroft, Alessandro Finamore, Yan Grunenberger, Konstantina Papagiannaki

When assistance becomes dependence: characterizing the costs and inefficiencies of A-GPS

ACM SIGMOBILE Mobile Computing and Communications Review (CCR), volume 17, issue 4, pages 3-14

 

Abstract

Location-based services are a vital component of the mobile ecosystem. Among all the location technologies used behind the scenes, A-GPS (Assisted-GPS) is considered to be the most accurate. Unlike standalone GPS systems, A-GPS uses network support to speed up the position fix. However, it can be a dangerous strategy due to varying cell conditions, which may impair performance, sometimes potentially negating the expected benefits of the original design. We present a characterization of the accuracy, location acquisition speed, energy cost, and network dependency of the state-of-the-art A-GPS receivers shipped in popular mobile devices. Our analysis is based on active measurements, an exhaustive on-device analysis, and the processing of cellular traffic traces. The results reveal a number of inefficiencies as a result of the strong dependence on the cellular network to obtain assisting data, as well as implementation and integration problems.

Download Here

Inter-call mobility model: a spatio-temporal refinement of call data records using a gaussian mixture model

With global mobile phone penetration nearing 100%, cellular Call Data Records (CDRs) provide a large-scale and ubiquitous, but also sparse and skewed snapshot of human mobility. It may be difficult or inappropriate to reach strong conclusions about user movement based on such data without proper understanding of user movement between call records. Based on an analysis of a real-world trace, we propose a novel, probabilistic Inter-Call Mobility (ICM) model of users' position in between calls. The ICM model combines Gaussian mixtures to build a general, comprehensive spatio-temporal refinement of CDRs. We demonstrate that applying the ICM model to basic CDR analyses, such as user proximity probability, yields strikingly different conclusions from those of existing models.

YouTube Everywhere: Impact of Device and Infrastructure Synergies on User Experience

Alessandro Finamore, Marco Mellia, Maurizio M. Munafò, Ruben Torres, Sanjay Rao

YouTube Everywhere: Impact of Device and Infrastructure Synergies on User Experience

ACM IMC - Internet Measurement Conference, Berlin, DE, ISBN: 978-1-4503-1013-0, November 2011

 

Abstract

 

In this paper we present a complete measurement study that compares YouTube traffic generated by mobile devices (smart-phones, tablets) with traffic generated by common PCs (desktops, notebooks, netbooks). We investigate the users’ behavior and correlate it with the system performance. Our measurements are performed using unique data sets which are collected from vantage points in nation-wide ISPs and University campuses from two countries in Europe and the U.S.

Our results show that the user access patterns are similar across a wide range of user locations, access technologies and user devices. Users stick with default player configurations, e.g., not changing video resolution or rarely enabling full screen playback. Furthermore it is very common that users abort video playback, with 60% of videos watched for no more than 20% of their duration.

We show that the YouTube system is highly optimized for PC access and leverages aggressive buffering policies to guarantee excellent video playback. This, however, causes 25%-39% of data to be unnecessarily transferred, since users abort the playback very early. This waste of transferred data is even higher when mobile devices are considered. The limited storage offered by those devices makes the video download more complicated and overall less efficient, so that clients typically download more data than the actual video size. Overall, this result calls for better system optimization for both PC and mobile access.
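To make the notion of unnecessarily transferred data concrete, the following is a minimal sketch of how such a share could be computed from per-session records. It is not the paper's measurement pipeline: the `Session` fields, the sample values and the constant-bitrate assumption are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Session:
    video_bytes: int         # full size of the video file (hypothetical field)
    downloaded_bytes: int    # bytes actually transferred to the client
    watched_fraction: float  # share of the video duration played, 0..1


def wasted_fraction(sessions):
    """Share of all transferred bytes that were never played back."""
    total = sum(s.downloaded_bytes for s in sessions)
    if total == 0:
        return 0.0
    wasted = 0.0
    for s in sessions:
        # Assumption: constant bitrate, so played bytes scale with played time.
        played = s.watched_fraction * s.video_bytes
        wasted += max(0.0, s.downloaded_bytes - played)
    return wasted / total


# Made-up sessions: an early abort, a full playback, and a client that
# downloaded more than the video size (as the abstract reports for mobiles).
sessions = [
    Session(video_bytes=10_000_000, downloaded_bytes=4_000_000, watched_fraction=0.2),
    Session(video_bytes=10_000_000, downloaded_bytes=10_000_000, watched_fraction=1.0),
    Session(video_bytes=5_000_000, downloaded_bytes=6_000_000, watched_fraction=1.0),
]
print(f"wasted share: {wasted_fraction(sessions):.2%}")
```

In this toy input, 3 MB of the 20 MB transferred are never played, so the wasted share is 15%. Aggressive buffering raises the first kind of waste (bytes fetched ahead of an early abort), while repeated or overlapping downloads raise the second.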

Download Here

 

Spatial extension of the reality mining dataset

Data captured from a live cellular network with real users during their common daily routine help to understand how users move within the network. Unlike simulations with limited potential or expensive experimental studies, research on user mobility or spatio-temporal user behavior can be conducted on publicly available datasets such as the Reality Mining Dataset. For many years, these data have been a source of valuable information about social interconnections between users and user-network associations. However, an important spatial dimension is missing in this dataset. In this paper, we present a methodology for retrieving geographical locations matching the GSM cell identifiers in the Reality Mining Dataset, an approach based on querying the Google Location API. A statistical analysis of the success rate of location retrieval is provided. Further, we present the LAC-clustering method for detecting and removing outliers, a heuristic extension of general agglomerative hierarchical clustering. This methodology enables further, previously impossible analysis of the Reality Mining Dataset, such as studying user mobility patterns, describing spatial trajectories and mining the spatio-temporal data.
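The idea of removing outliers with agglomerative hierarchical clustering can be sketched as follows. This is a plain single-linkage clustering with a distance cut, not the LAC-clustering heuristic itself, whose details the abstract does not give. The cell names, coordinates and the 50 km cut are hypothetical; the sketch assumes that cells sharing one Location Area Code (LAC) should lie close together, so cells outside the largest cluster are treated as mislocated.

```python
import math
from collections import Counter


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def single_linkage_labels(points, cut_km):
    """Single-linkage agglomerative clustering, cut at distance cut_km.

    Two clusters are merged whenever their closest members are within cut_km;
    the merges are tracked with a union-find structure.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_km(points[i], points[j]) <= cut_km:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


def drop_outliers(cells, cut_km=50.0):
    """Split the cells of one LAC into (kept, outliers) by largest cluster."""
    if not cells:
        return {}, {}
    ids = list(cells)
    labels = single_linkage_labels([cells[c] for c in ids], cut_km)
    main = Counter(labels).most_common(1)[0][0]
    kept = {c: cells[c] for c, lab in zip(ids, labels) if lab == main}
    outliers = {c: cells[c] for c, lab in zip(ids, labels) if lab != main}
    return kept, outliers


# Hypothetical cells of a single LAC: three near each other, one far away.
cells = {
    "cell_a": (42.360, -71.090),
    "cell_b": (42.365, -71.100),
    "cell_c": (42.350, -71.080),
    "cell_x": (40.710, -74.000),
}
kept, outliers = drop_outliers(cells)
print(sorted(kept), sorted(outliers))
```

Single linkage is used here only because a distance cut on it reduces to finding connected groups of nearby cells, which keeps the sketch short; other linkage criteria would need a full merge loop.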

Active GSM cell-id tracking: where did you disappear?

Location-based services are mobile network applications of growing importance and variability. The space of location technologies and applications has not yet been fully explored, perhaps omitting some important practical uses.

In this work we present the prototype SS7Tracker platform, an active, non-intrusive, GSM Cell-ID-based solution to network-based location tracking, and two novel applications of this technique: network diagnostics based on inroamer tracking and human activity research. We demonstrate the usability and performance limits of the platform on practical tests carried out in a live GSM network.

BLINC: Multilevel Traffic Classification in the Dark

Thomas Karagiannis, Konstantina Papagiannaki, Michalis Faloutsos

BLINC: Multilevel Traffic Classification in the Dark

In ACM SIGCOMM, Philadelphia, PA, August 2005

 

Abstract

We present a fundamentally different approach to classifying traffic flows according to the applications that generate them. In contrast to previous methods, our approach is based on observing and identifying patterns of host behavior at the transport layer. We analyze these patterns at three levels of increasing detail: (i) the social, (ii) the functional and (iii) the application level. This multilevel approach to looking at traffic flows is probably the most important contribution of this paper. Furthermore, our approach has two important features. First, it operates in the dark, having (a) no access to packet payload, (b) no knowledge of port numbers and (c) no additional information other than what current flow collectors provide. These restrictions respect privacy, technological and practical constraints. Second, it can be tuned to balance the accuracy of the classification against the number of successfully classified traffic flows. We demonstrate the effectiveness of our approach on three real traces. Our results show that we are able to classify 80%-90% of the traffic with more than 95% accuracy.
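As an illustration of what a host-behavior feature can look like when only flow records are available, the sketch below counts the distinct peers of each host. The abstract does not define the three levels, so treating peer count as a proxy for the social level is an assumption made here, and the function name and sample flows are hypothetical. No payload or port information is used.

```python
from collections import defaultdict


def peer_counts(flows):
    """Map each host to the number of distinct hosts it exchanged flows with."""
    peers = defaultdict(set)
    for src, dst in flows:
        # A flow links both endpoints, regardless of direction.
        peers[src].add(dst)
        peers[dst].add(src)
    return {host: len(p) for host, p in peers.items()}


# Made-up (source, destination) pairs as a flow collector might export them.
flows = [
    ("10.0.0.1", "10.0.0.9"),
    ("10.0.0.2", "10.0.0.9"),
    ("10.0.0.3", "10.0.0.9"),
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.1", "10.0.0.9"),  # repeated pair, counted once
]
counts = peer_counts(flows)
for host, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(host, n)
```

Here 10.0.0.9 talks to three distinct hosts and stands out as the most popular endpoint, the kind of pattern a server would show. A real classifier would combine several such features rather than rely on one count.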

Download Here

Research Digest

This magazine is a collection of the research projects and technologies we have worked on in recent years.

Research Newspaper

The Research Newspaper is a compilation of news from the research group.

Publications

At Telefónica I+D we follow an open, collaborative innovation model with universities and other research institutions. To foster the dissemination of our joint work, we present publications, patents and ways of working for transferring knowledge about new technologies. We often hold workshops, seminars and conferences, so stay tuned for our latest news.