Talks | Sina Shaham

VLDB 2023

Model and Mechanisms for Spatial Data Fairness

Publication Link: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=WnWN4NkAAAAJ&sortby=pubdate&citation_for_view=WnWN4NkAAAAJ:QIV2ME_5wuYC

Abstract

Fairness in data-driven decision-making studies scenarios where individuals from certain population segments may be unfairly treated when being considered for loan or job applications, access to public resources, or other types of services. In location-based applications, decisions are based on individual whereabouts, which often correlate with sensitive attributes such as race, income, and education. While fairness has received significant attention recently, e.g., in machine learning, there is little focus on achieving fairness when dealing with location data. Due to their characteristics and specific type of processing algorithms, location data pose important fairness challenges. We introduce the concept of spatial data fairness to address the specific challenges of location data and spatial queries. We devise a novel building block to achieve fairness in the form of fair polynomials. Next, we propose two mechanisms based on fair polynomials that achieve individual spatial fairness, corresponding to two common location-based decision-making types: distance-based and zone-based. Extensive experimental results on real data show that the proposed mechanisms achieve spatial fairness without sacrificing utility.

Category: Talks

EDBT 2022

Differentially-Private Publication of Origin-Destination Matrices with Intermediate Stops

Publication Link: https://scholar.google.com.au/citations?view_op=view_citation&hl=en&user=WnWN4NkAAAAJ&sortby=pubdate&citation_for_view=WnWN4NkAAAAJ:M3ejUd6NZC8C

Abstract

Conventional origin-destination (OD) matrices record the count of trips between pairs of start and end locations, and have been extensively used in transportation, traffic planning, etc. More recently, due to use case scenarios such as COVID-19 pandemic spread modeling, it is increasingly important to also record intermediate points along an individual’s path, rather than only the trip start and end points. This can be achieved by using a multi-dimensional frequency matrix over a data space partitioning at the desired level of granularity. However, serious privacy concerns arise when releasing OD matrix data, especially when adding multiple intermediate points, which makes individual trajectories more distinguishable to an attacker. To address this threat, we propose a technique for privacy-preserving publication of multi-dimensional OD matrices that achieves differential privacy (DP), the de facto standard in private data release. We propose a family of approaches that factor in important data properties such as data density and homogeneity in order to build OD matrices that provide provable protection guarantees while preserving query accuracy. Extensive experiments on real and synthetic datasets show that the proposed approaches clearly outperform the existing state of the art.
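As a rough illustration of the multi-dimensional frequency matrix mentioned in the abstract, the sketch below counts trips by (origin cell, intermediate cell, destination cell) over a uniform grid. It is only an assumed toy setup, not the paper's method: the names `grid_cell` and `od_matrix_with_stops`, the unit-square coordinates, and the sparse `Counter` representation are all hypothetical, and no privacy protection is applied.

```python
from collections import Counter

def grid_cell(point, grid_size=2):
    """Map a point in the unit square to a (col, row) cell of a uniform grid (assumed partitioning)."""
    x, y = point
    col = min(int(x * grid_size), grid_size - 1)
    row = min(int(y * grid_size), grid_size - 1)
    return (col, row)

def od_matrix_with_stops(trips, cell_of=grid_cell):
    """Count trips per sequence of cells; each trip is a tuple of points (origin, stops..., destination)."""
    matrix = Counter()
    for trip in trips:
        key = tuple(cell_of(p) for p in trip)
        matrix[key] += 1
    return matrix

# Hypothetical trips, each with one intermediate stop.
trips = [
    ((0.1, 0.1), (0.6, 0.2), (0.9, 0.9)),
    ((0.2, 0.3), (0.7, 0.1), (0.8, 0.6)),
    ((0.1, 0.8), (0.4, 0.6), (0.9, 0.9)),
]
matrix = od_matrix_with_stops(trips)
```

Storing only non-zero entries keeps the example small; a dense matrix would have one dimension per recorded point, which is why adding stops makes individual trajectories easier to single out.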


ACM SIGSPATIAL 2021

HTF: Homogeneous Tree Framework for Differentially-Private Release of Location Data

Publication Link: https://scholar.google.com.au/citations?view_op=view_citation&hl=en&user=WnWN4NkAAAAJ&sortby=pubdate&citation_for_view=WnWN4NkAAAAJ:_kc_bZDykSQC

Abstract

Mobile apps that use location data are pervasive, spanning domains such as transportation, urban planning and healthcare. Important use cases for location data rely on statistical queries, e.g., identifying hotspots where users work and travel. Such queries can be answered efficiently by building histograms. However, precise histograms can expose sensitive details about individual users. Differential privacy (DP) is a mature and widely adopted protection model, but most approaches for DP-compliant histograms work in a data-independent fashion, leading to poor accuracy. The few proposed data-dependent techniques attempt to adjust histogram partitions based on dataset characteristics, but they do not perform well due to the addition of noise required to achieve DP. We identify density homogeneity as a main factor driving the accuracy of DP-compliant histograms, and we build a data structure that splits the space such that data density is homogeneous within each resulting partition. We show through extensive experiments on large-scale real-world data that the proposed approach achieves superior accuracy compared to existing approaches.
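For context, the data-independent baseline that the abstract contrasts against can be sketched as a uniform grid histogram with Laplace noise added to every cell. This is a minimal sketch of the standard Laplace mechanism, not the HTF structure itself; the function names, the unit-square domain, and the assumption that each user contributes exactly one point (so the count sensitivity is 1) are illustrative choices.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponential variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_grid_histogram(points, grid_size, epsilon, rng):
    """Noisy counts over a uniform grid_size x grid_size grid on the unit square.

    Assumes one point per user, so adding or removing a user changes a single
    cell count by 1 and Laplace noise with scale 1/epsilon suffices.
    """
    counts = [[0] * grid_size for _ in range(grid_size)]
    for x, y in points:
        col = min(int(x * grid_size), grid_size - 1)
        row = min(int(y * grid_size), grid_size - 1)
        counts[row][col] += 1
    scale = 1.0 / epsilon
    return [[c + laplace_noise(scale, rng) for c in row] for row in counts]

# Hypothetical data: 1000 uniformly random points, privacy budget epsilon = 1.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(1000)]
noisy = dp_grid_histogram(points, grid_size=4, epsilon=1.0, rng=rng)
```

Because the grid ignores where the data actually lies, sparse cells receive the same noise as dense ones, which is the accuracy problem a density-aware partitioning aims to reduce.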


EDBT 2021

Title: An Efficient and Secure Location-based Alert Protocol using Searchable Encryption and Huffman Codes

Publication Link: https://arxiv.org/abs/2105.00618

Abstract:

Location data are widely used in mobile apps, ranging from location-based recommendations, to social media and navigation. A specific type of interaction is that of location-based alerts, where mobile users subscribe to a service provider (SP) in order to be notified when a certain event occurs nearby. Consider, for instance, the ongoing COVID-19 pandemic, where contact tracing has been singled out as an effective means to control the virus spread. Users wish to be notified if they came in proximity to an infected individual. However, serious privacy concerns arise if the users share their location history with the SP in plaintext. To address privacy, recent work proposed several protocols that can securely implement location-based alerts. The users upload their encrypted locations to the SP, and the evaluation of location predicates is done directly on ciphertexts. When a certain individual is reported as infected, all matching ciphertexts are found (e.g., according to a predicate such as “10 feet proximity to any of the locations visited by the infected patient in the last week”), and the corresponding users notified. However, there are significant performance issues associated with existing protocols. The underlying searchable encryption primitives required to perform the matching on ciphertexts are expensive, and without a proper encoding of locations and search predicates, the performance can degrade a lot. In this paper, we propose a novel method for variable-length location encoding based on Huffman codes. By controlling the length required to represent encrypted locations and the corresponding matching predicates, we are able to significantly speed up performance. We provide a theoretical analysis of the gain achieved by using Huffman codes, and we show through extensive experiments that the improvement compared with fixed-length encoding methods is substantial.
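The idea of variable-length location encoding can be illustrated with a textbook Huffman construction over grid cells, where frequently visited cells receive shorter codes. This is only a sketch under assumed inputs: the cell names and visit probabilities are hypothetical, and the paper's actual encoding and its interaction with searchable encryption are not reproduced here.

```python
import heapq
import math
from itertools import count

def huffman_codes(weights):
    """Return {symbol: bitstring} for a dict mapping symbols to positive weights."""
    if len(weights) == 1:
        return {next(iter(weights)): "0"}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Prefix the two lightest subtrees with 0 and 1, then merge them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical visit probabilities for four grid cells.
cell_probs = {"cell_A": 0.5, "cell_B": 0.25, "cell_C": 0.125, "cell_D": 0.125}
codes = huffman_codes(cell_probs)
expected_len = sum(cell_probs[c] * len(codes[c]) for c in cell_probs)
fixed_len = math.ceil(math.log2(len(cell_probs)))
```

With these probabilities the expected code length is 1.75 bits, compared with 2 bits for a fixed-length encoding of four cells; shorter encodings of popular locations are what reduce the cost of matching on ciphertexts in the setting the abstract describes.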


DBSec 2020

Title: Enhancing the Performance of Spatial Queries on Encrypted Data Through Graph Embedding

Presented at the IFIP Annual Conference on Data and Applications Security and Privacy

Publication Link: https://link.springer.com/chapter/10.1007/978-3-030-49669-2_17

Abstract: Most online mobile services make use of location data to improve customer experience. Mobile users can locate points of interest near them, or can receive recommendations tailored to their whereabouts. However, serious privacy concerns arise when location data is revealed in clear to service providers. Several solutions employ Searchable Encryption (SE) to evaluate spatial predicates directly on location ciphertexts. While doing so preserves privacy, the performance overhead incurred is high. We focus on a prominent SE technique in the public-key setting – Hidden Vector Encryption (HVE), and propose a graph embedding technique to encode location data in a way that significantly boosts the performance of processing on ciphertexts. We show that finding the optimal encoding is NP-hard, and provide several heuristics that are fast and obtain significant performance gains. Our extensive experimental evaluation shows that our solutions can improve computational overhead by a factor of two compared to the baseline.


ICC Conference 2020

Title: Extended Kalman Filter Beam Tracking for Millimeter Wave Vehicular Communications

Publication Link: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9145366

Abstract:

Millimeter-wave (mmWave) communication is a promising technology to meet the ever-growing data traffic of vehicular communications. Unfortunately, more frequent channel estimations are required in this spectrum due to the narrow beams employed to compensate for the high path loss. Hence, the development of highly efficient beam tracking algorithms is essential to enable the technology, particularly for fast-changing environments in vehicular communications. In this paper, we propose an innovative scheme for beam tracking based on the Extended Kalman Filter (EKF), improving the mean square error performance by 49% in vehicular settings. We propose to use the position, velocity, and channel coefficient as state variables of the EKF algorithm and show that such an approach results in improved beam tracking with low computational complexity by taking the kinematic characteristics of the system into account. We also explicitly derive the closed-form expressions for the Jacobian matrix of the EKF algorithm.


Database Design and Development

The last lecture of the course, summarizing topics such as ER diagrams, database design, database implementation, redesign, and business intelligence (BI).
