KIT, KASTEL: Cryptography and Security Group, Institute Seminar

Privacy-Preserving Collection and Analysis of User Data – Provably Secure and Practical

Name: Privacy-Preserving Collection and Analysis of User Data – Provably Secure and Practical
Venue: 252 / BBB
Date: 2026-03-31
Speaker: Markus Raiber
Time: 15:45
A large number of applications collect, store and analyze data about end users. Examples
include customer loyalty systems such as Payback, incentive systems offered by health
insurers, behavior-dependent car insurance tariffs, pay-as-you-go public transport services,
smart metering, and many more. Currently, most of these systems simply collect
raw user data, resulting in vast datasets of personal information. However, this has
disadvantages for both users and companies. The collected data enables the creation of
extensive profiles that go far beyond the intended use. Large data collections are also a
lucrative target for cyberattacks, which can harm affected users, for example through
identity theft, and harm the company involved through negative publicity and potential
data protection penalties.
Privacy-preserving technologies are a remedy for these problems, as they allow the desired
analytics to be evaluated securely on relevant user data without the need to collect or
store sensitive user data in the clear. However, this comes with new challenges, as
privacy-preserving technologies incur higher computation and communication costs. In this
thesis, we propose and evaluate two generic solutions based on these technologies. Both
solutions are formally modeled in the Universal Composability framework, whose
simulation-based security notion guarantees that security is preserved when a solution is
composed with other protocols. The solutions can therefore be used in any context while
maintaining strong security guarantees. Furthermore, both solutions come with a prototype
implementation and evaluation that show their potential for practical deployment as well
as their current limitations. In both solutions, we ensure that users remain anonymous
when data is collected, while guaranteeing the authenticity of the collected data.
Our first solution, called PUBA, is based on personal logbooks stored on each user’s device.
These logbooks are authenticated and can only be updated by the system operator, while
their content remains confidential. Users can then participate in privacy-preserving
analytics computations, in which it is ensured that their logbook is up-to-date and
authentic. To accommodate constrained user devices, such as smartphones, users can
outsource more complex analytics computations to a (potentially malicious) proxy that does
not collude with the system operator. Performance evaluations of our prototype show
that PUBA is sufficiently efficient for logbooks storing the last 10–30 transactions.
In our second solution, called POBA, the logbooks are stored on operator-controlled servers
instead. We model a setting in which multiple operators collaborate to run the system
without fully trusting each other. Logbook contents are protected by secret-sharing them
between all the operators involved. Additionally, advanced cryptographic tools, such
as oblivious RAM, are employed to protect user identities and prevent the linking of
multiple interactions to the same user. Since data is available without user interaction
in this setting, operators have more flexibility when running analytics. Because all
operators must agree to each computation, the analysis results still satisfy the privacy
requirements as long as some of the operators behave honestly. Performance evaluations of
our prototype demonstrate its practicality in the three-party setting: with three operators,
it can handle over two million logbook entries per day.
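The abstract does not specify which secret-sharing scheme POBA uses. As a minimal sketch, assuming plain additive secret sharing over the ring Z_{2^64} among three operators (a simple textbook scheme, not necessarily the one in the thesis), the following Python illustrates why no single operator learns a logbook entry and how a simple aggregate can still be computed on the shares. The functions `share`, `reconstruct`, and `add_shared`, as well as the example point values, are hypothetical.

```python
import secrets

# Assumption: shares live in the ring Z_{2^64}; the thesis may use a different ring or scheme.
MODULUS = 2**64


def share(value: int, n_parties: int = 3) -> list[int]:
    """Split value into n additive shares that sum to value mod MODULUS.

    The first n-1 shares are uniformly random, so any n-1 shares together
    reveal nothing about value; only all n shares reconstruct it.
    """
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recombine all shares; requires the share of every operator."""
    return sum(shares) % MODULUS


def add_shared(a: list[int], b: list[int]) -> list[int]:
    """Add two shared values: each operator adds its own two shares locally,
    so no communication between operators is needed."""
    return [(x + y) % MODULUS for x, y in zip(a, b)]


# Hypothetical logbook: points earned in three transactions.
entries = [120, 45, 300]
shared_entries = [share(e) for e in entries]

# Operator i only ever holds shared_entries[k][i] for each entry k.
total = shared_entries[0]
for s in shared_entries[1:]:
    total = add_shared(total, s)

print(reconstruct(total))  # 465
```

In this toy scheme a result can only be opened when every operator contributes its share, which loosely mirrors the requirement that all operators agree to a computation. Practical three-party frameworks often use replicated secret sharing instead, which also supports efficient multiplication; this sketch only covers addition and does not model oblivious RAM.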