\begin{abstract}

Graph data analysis, particularly local triangle counting, plays a pivotal role in deciphering complex relationships within graph data. It is invaluable across diverse fields such as social networks, transportation, and cybersecurity. However, it often involves handling sensitive information, so the relationship between any two nodes must be treated as private.

Differential privacy (DP) is a formal model to address this privacy concern and can be categorized into two types: the central DP (CDP) model, which achieves better result accuracy, and the local DP (LDP) model, which does not assume a trusted server.

To bridge the gap between the two models, we propose {\textsf{Sectric}}, a \underline{se}rver-aided \underline{c}rypto-assisted \chadded{local} \underline{tri}angle \underline{c}ounting protocol, in this paper.

It can achieve the same result accuracy with the same privacy budget as the CDP model without assuming a trusted server.

\textsf{Sectric} also explores a new approach in crypto-assisted graph data analysis algorithms that represents a node's neighbors using a set instead of \chreplaced{an}{a} adjacency vector, and successfully \chreplaced{achieves}{achieve} higher efficiency compared to other crypto-assisted solutions.

We also conduct theoretical and empirical evaluations to demonstrate that \textsf{Sectric} meets these design goals.

\end{abstract}

\section{Introduction}

Graph data analysis is pivotal in unraveling complex relationships and patterns in graph data, making it a useful tool in various fields such as social networks~\cite{Backstrom2007WhereforeAT, Charbey2019StarsHO, Isaak2018UserDP, newman2009random}, transportation and logistics~\cite{Leskovec2007CosteffectiveOD, Pourhabibi2020FraudDA, Tan2019AGM, Weber2018ScalableGL}, and cybersecurity~\cite{Jernigan2009GaydarFF}.

In many applications, graph data \chreplaced{is}{are} stored in a decentralized manner~\cite{becchetti2008efficient}, where each node knows its neighbors, while no central node has the full graph topology. In this setting, \emph{local triangle counting}, which counts the triangles containing a given node, is a fundamental graph analysis task. It is crucial for many downstream applications. For example, in decentralized social networks, a node first performs a local triangle count to calculate its local clustering coefficient~\cite{newman2009random,becchetti2010efficient,green2014fast,li2017clustering} or local transitivity ratio~\cite{schank2005approximating,al2018triangle}, which reflects the node's importance in the network.

Prior differentially private solutions can be mainly categorized into two types: those adopting the central DP (CDP) model and those adopting the local DP (LDP) model.

The CDP model assumes a trusted server that calculates the triangle counts and adds a small amount of noise to the final result\footnote{In the CDP model, the server knows the complete and accurate topology of the graph, including all nodes and the connection edges to their neighbors.}. The LDP model eliminates the trust assumption on the server, but has to add more noise during the computation under the same privacy budget. Thus, the CDP model has better result accuracy, and the LDP model aligns better with the decentralized setting.

To bridge the gap between the two models, we draw on the crypto-assisted approach emerging in other graph analysis tasks and explore its application in privacy-preserving local triangle counting. A mainstream framework for adopting this approach is to make graph nodes encrypt their \chreplaced{adjacency}{adjacent} relationships and \chreplaced{send only}{only send} the ciphertexts to the assisting servers. Then, the assisting servers interact with graph nodes using cryptographic primitives to finally obtain the analysis result. Thus, this approach does not require trusted assisting servers and provides better privacy protection than the LDP model\footnote{In the LDP model, graph nodes send their adjacency vectors, perturbed with bounded noise, to the server during the computation.}. The \chreplaced{servers}{server} can also obtain an accurate analysis result due to the use of cryptographic tools, which guarantees utility.

To reduce the overheads, we explore \chreplaced{the usage of}{to use} adjacency sets, instead of adjacency vectors, to represent \chreplaced{node adjacencies}{node's neighbors}. To implement this idea, we design a novel cryptographic tool named \emph{Three-Party Private Set Membership Test (3PPSMT)}, rather than using secret sharing as CARGO and MAGO do. This 3PPSMT primitive reduces the overheads of counting two graph nodes' common neighbors from $O(|\vdv|)$ to $O(d_{max})$ \chreplaced{compared}{comparing} to secret sharing. With this primitive, \textsf{Sectric} \chreplaced{introduces only}{only brings} $O(d_{max}|\vdv|)$ overheads in privacy-preserving local triangle counting. This makes \textsf{Sectric} more suitable for analyzing sparse graphs (i.e., $d_{max} = o(|\vdv|)$).

Meanwhile, \textsf{Sectric} enables the querier to calculate the accurate counting result. This guarantees utility but may also reveal graph nodes' \chreplaced{adjacency}{adjacent} relationships. To prevent this privacy leakage in the \chreplaced{computation}{calculating} result, we also demonstrate that \textsf{Sectric} is compatible with DP mechanisms. \textsf{Sectric} allows adding noise drawn from a given distribution, ensuring that the querier receives only the noisy result, while the server gains no information. The noise intensity is \chreplaced{the same as}{same to} that in the CDP model given the same privacy budget. In a nutshell, \textsf{Sectric} achieves the same result utility as the CDP model under the same privacy guarantee, while requiring no trusted server, as in the LDP model.

Furthermore, we fully utilize the local view of graph nodes to reduce the number of required servers. Prior crypto-assisted graph data analysis solutions require two or more non-colluding servers, while \textsf{Sectric} requires only one assisting server. This requirement is easier to satisfy in practical applications.

The main contributions of this work are summarized as follows:

\begin{itemize}

\item We design a novel local triangle counting protocol, \textsf{Sectric}, that bridges the gap in trust assumptions and utility between the CDP and LDP models for this problem. Our solution shows that the same privacy and utility guarantees as the CDP model can be achieved without requiring a trusted server.

\item We explore a new approach in crypto-assisted graph data analysis algorithms that represents a node's neighbors using a set instead of an adjacency vector and reduces the overheads from $O(|\vdv|^2)$ to $O(d_{max}|\vdv|)$. We believe that this approach can also be adopted in other tasks to reduce the overheads and will further explore it in the future.

%\item We define a novel cryptographic primitive $\fdv_\textsf{3PPSMT}$ and propose the $\Pi_\textsf{3PPSMT}$ protocol, an efficient implementation of this primitive. This primitive can also be applied to other privacy-aware graph data analysis tasks to avoid intermediate privacy costs.

\item We perform a comprehensive theoretical and empirical analysis of \textsf{Sectric} to demonstrate its privacy guarantees and performance. We also adapt the open-source implementation of a state-of-the-art work~\cite{liu2024cargo,CARGO} to local triangle counting as the baseline.

\end{itemize}

\stitle{Paper Organization.}

The rest of this paper is organized as follows. In Section \ref{section: related work}, we discuss related work. Then, we introduce the necessary preliminaries of this work in Section \ref{section: preliminaries}. In Section \ref{section: 3PPSMT}, we define the primitive $\fdv_\textsf{3PPSMT}$ and propose the $\Pi_\textsf{3PPSMT}$ protocol implementing this primitive. Based on this primitive, the construction of the \textsf{Sectric} protocol is presented in Section \ref{section: sectric construction}. Section \ref{section: implementation and evaluation} presents the experimental results, and Section \ref{section: conclusion} concludes this paper.

\section{Related Work}

\label{section: related work}

\subsection{Privacy-Preserving Triangle Counting}

The problem of privacy-preserving triangle counting has been an active area of research. Existing works can be broadly categorized into two groups based on whether they assume the existence of a trusted server or not: centralized model-based approaches and decentralized model-based approaches.

The centralized model assumes the existence of a trusted server that holds the entire graph. Ding et al.~\cite{Ding2021DifferentiallyPT} achieve a balance between the accuracy of triangle counting and data privacy by selecting appropriate edge deletion strategies. Raskhodnikova et al.~\cite{Karwa2011PrivateAO, Nissim2007SmoothSA} use randomized strategies to ensure that the published triangle counts do not allow accurate inference of the existence of any particular edge, while Kasiviswanathan et al.~\cite{Kasiviswanathan2013AnalyzingGW} achieve this by projecting the graph with a limited degree threshold. However, these approaches require a fully trusted central server, which can introduce privacy issues in many applications.

The solutions proposed by these works can provide privacy in the presence of a fully trusted central server. However, in scenarios where a trusted server is impractical to deploy, these solutions cannot be directly applied.

Thus, many works have proposed a decentralized model, where the node set is still considered public knowledge, but the relationship between two nodes is known only to them and treated as private. Sun et al.~\cite{Sun2019AnalyzingSS} \chreplaced{propose}{proposed} a local differential privacy approach, where users locally perturb their adjacency vectors to protect the privacy of edges. However, their assumption that users have an extended local view, allowing them to see their 2-hop neighbors, introduces the data correlation problem~\cite{Liu2022CollectingTC} and is not applicable in most real-world cases. In the more realistic scenario where users can only see their immediate neighbors, Imola et al.~\cite{Imola2021CommunicationEfficientTC, Imola2020LocallyDP} utilize multiple rounds of interactions to upload and download perturbed edge information. While this approach can preserve privacy, it also introduces a non-negligible additive error, as highlighted by~\cite{Eden2023TriangleCW}.

In the decentralized model, crypto-assisted solutions to triangle counting have also emerged in recent years. CARGO~\cite{liu2024cargo} utilizes a hybrid approach that combines additive secret sharing and differential privacy, so that the two untrusted servers see only encoded values and no other information. This approach enables users to add smaller amounts of noise when implementing differential privacy, thereby achieving better utility compared to~\cite{Imola2021CommunicationEfficientTC}. Building on a similar approach, Imola et al.~\cite{Imola2022DifferentiallyPT} introduce a trusted intermediate server with shuffling functionality. Another work, MAGO~\cite{wang2023mago}, which is based on lightweight secret sharing techniques, utilizes three servers from different trust domains working in coordination to improve the accuracy of triangle counting and to detect whether malicious adversaries attempt to tamper with the statistical result.

A summary of the comparison between \textsf{Sectric} and other related works on triangle counting is presented in Table \ref{tab: related works}.

\subsection{Crypto-Assisted Graph Analytics}

Crypto-assisted solutions are also emerging in other graph analytics tasks.

Some works enable users to securely contribute their local views on a decentralized social graph for spectral analytics. Sharma et al.~\cite{Sharma2019PrivateGraphPS} utilize homomorphic encryption to protect the privacy of graph edges, allowing distributed data owners to interact with cloud-based programs while keeping their data private from the cloud service provider. PrivGED~\cite{Wang2022PrivacyPreservingAO} employs secret sharing to encrypt the elements in local view vectors, enabling privacy-preserving eigen-decomposition analytics over decentralized social graphs while safeguarding users' social relationships. However, these studies focus on different analytical tasks than our work on \chadded{local} triangle counting.

Another line of research leverages cryptographic techniques for privacy-preserving epidemiological analysis, such as analyzing transmission chains or clusters to predict infection rates using contact data stored on mobile devices. RIPPLE~\cite{Gunther2022PosterPE, Holz2020PEMPE} enables realistic simulations on the actual person-to-person social contact graph, utilizing a set of semi-honest non-colluding MPC servers to facilitate communication among participants. Colo~\cite{Liu2024MakingPF} introduces a protocol that guards against malicious device behavior using random masks, efficient commitments, and range proofs, ensuring that devices only learn their own node, edge, and topology data, while the analyst only learns the query result. However, these methods are not directly applicable to our \chadded{local} triangle counting problem.

There is also a research direction focusing on collaborative graph analytics, where each client possesses a local subgraph with multiple nodes and edges. Araki et al.~\cite{Araki2021SecureGA} propose a secure shuffling method for a 3-server setting with an honest majority, implementing algorithms like breadth-first search and maximal independent set. Guan et al.~\cite{Guan2023EfficientAP} design a scheme for two data owners to jointly respond to a subgraph matching query without disclosing their graph datasets to each other. FEAT~\cite{liu2024federated} has a central server that collects subgraph data from clients using private set union, aggregates them into a noisy global graph, and performs triangle counting. Oryx~\cite{ZhongOryxP} can detect cycles of various lengths on a multi-party federated graph while preserving topological secrecy. Pang et al.~\cite{PANG2025103952} design a scheme based on structured encryption and private set intersection cardinality techniques. They provide server tokens to queriers to query the butterfly counts of specific nodes or edges. However, the assumptions in these studies differ from our scenario, where users only have a local perspective.

\section{Preliminaries}

\label{section: preliminaries}

We first introduce some notations used in this paper. For a positive integer $N$, $[N]$ denotes the set $\setpresentation{1, 2, \dots, N}$\chdeleted{ for a given positive integer N}, and $\mathbb{Z}_N$ represents the group modulo $N$. Given a set $\xdv$, $x \xleftarrow{\$} \xdv$ indicates that $x$ is uniformly selected from $\xdv$. A real number ensemble $\setpresentation{p_\lambda}_{\lambda \in \mathbb{N}}$ is negligible if, for any polynomial $p$, $p_\lambda \leq \frac{1}{p(\lambda)}$ for all sufficiently large $\lambda$.
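
For illustration only (these two ensembles are examples we choose here, and $p$ is assumed to be a positive polynomial), the ensemble $p_\lambda = 2^{-\lambda}$ is negligible: for every polynomial $p$ we have $p(\lambda) \cdot 2^{-\lambda} \rightarrow 0$ as $\lambda \rightarrow \infty$, so $2^{-\lambda} \leq \frac{1}{p(\lambda)}$ for all sufficiently large $\lambda$. In contrast, the ensemble $p_\lambda = \lambda^{-2}$ is not negligible: choosing $p(\lambda) = \lambda^3$ gives $$\lambda^{-2} > \lambda^{-3} = \frac{1}{p(\lambda)}$$ for all $\lambda > 1$, which violates the condition.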

\subsection{Problem Definition}

\label{subsection: problem definition}

\stitle{Local Triangle Counting.}

Let $\gdv = (\vdv, \edv)$ be a graph, where $\vdv$ and $\edv$ represent the set of nodes and edges, respectively. Two nodes $u, v \in \vdv$ are adjacent if $(u, v) \in \edv$. We consider undirected graphs, so for any two nodes $u, v \in \vdv$, $(u, v) \in \edv$ if and only if $(v, u) \in \edv$. Without loss of generality, we assume the nodes in $\vdv$ are indexed from $1$ to $|\vdv|$.

We define the notion of local triangle sets in Definition \ref{definition: local triangle set}. Intuitively speaking, the local triangle set $\Delta_u$ of a node $u$ is the set of all triangles containing $u$. With the notion of local triangle set, the {local triangle counting} problem can be stated as calculating $|\Delta_u|$ given a graph $\gdv = (\vdv, \edv)$ and a node $u \in \vdv$.

\begin{definition}[Local triangle set]

\label{definition: local triangle set}

Given a graph $\gdv = (\vdv, \edv)$, the local triangle set of a node $u \in \vdv$ is defined as:

$$\Delta_u = \setpresentation{\setpresentation{u, v, w} \subset \vdv: (u, v), (u, w), (v, w) \in \edv}.$$

\end{definition}

We also define the notion of a graph node's neighbor set in Definition \ref{definition: neighbor set}. The neighbor set $N_u$ of a node $u$ denotes the set of all nodes adjacent to $u$.

\begin{definition}[Neighbor set]

\label{definition: neighbor set}

Given a graph $\gdv = (\vdv, \edv)$, we define the neighbor set of a node $u \in \vdv$ as

$$N_u = \setpresentation{v \in \vdv: (u, v) \in \edv}.$$

\end{definition}

The following theorem establishes the relationship between the local triangle sets and the neighbor sets, which serves as the foundation for our \textsf{Sectric} protocol.

\begin{theorem}

\label{theorem: relation between neighbor and local triangle}

Given a graph $\gdv = (\vdv, \edv)$, for any node $u \in \vdv$, we have

$$|\Delta_u| = \frac{1}{2} \sum_{v \in N_u} |N_u \cap N_v|.$$

\end{theorem}

\begin{proof}\color{blue}

We know that the number of triangles containing node $u$ is given by

$$|\Delta_u| = \sum_{v \in N_u} \sum_{w \in N_u, w > v} I(v, w),$$

where $I(v, w)$ is an indicator function that equals 1 if there is an edge between $v$ and $w$, and 0 otherwise.

Additionally, since $w \in N_u \cap N_v$ if and only if $w \in N_u$ and $I(v, w) = 1$, we have

$$|N_u \cap N_v| = \sum_{w \in N_u} I(v, w).$$

Thus, we can express the sum as

$$\sum_{v \in N_u} |N_u \cap N_v| = \sum_{v \in N_u} \sum_{w \in N_u} I(v, w).$$

Because the graph is undirected, $I(v, w) = I(w, v)$, and, assuming the graph has no self-loops, $I(v, v) = 0$. Hence, every unordered pair $\setpresentation{v, w} \subset N_u$ with $v \neq w$ is counted exactly twice in the double sum above. Therefore, we can conclude that

$$|\Delta_u| = \sum_{v \in N_u} \sum_{w \in N_u, w > v} I(v, w) = \frac{1}{2} \sum_{v \in N_u} |N_u \cap N_v|.$$

\end{proof}
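
As a small sanity check of Theorem \ref{theorem: relation between neighbor and local triangle}, consider an illustrative toy graph (chosen here for exposition and not one of the evaluated datasets) with $\vdv = \setpresentation{1, 2, 3, 4}$ and undirected edges $(1, 2)$, $(1, 3)$, $(1, 4)$, $(2, 3)$, and $(3, 4)$. The neighbor sets are $N_1 = \setpresentation{2, 3, 4}$, $N_2 = \setpresentation{1, 3}$, $N_3 = \setpresentation{1, 2, 4}$, and $N_4 = \setpresentation{1, 3}$. For $u = 1$, the right-hand side of the theorem evaluates to

$$\frac{1}{2} \sum_{v \in N_1} |N_1 \cap N_v| = \frac{1}{2} \left( |\setpresentation{3}| + |\setpresentation{2, 4}| + |\setpresentation{3}| \right) = \frac{1}{2} (1 + 2 + 1) = 2,$$

which agrees with the local triangle set $\Delta_1 = \setpresentation{\setpresentation{1, 2, 3}, \setpresentation{1, 3, 4}}$ obtained directly from Definition \ref{definition: local triangle set}. Each triangle is counted twice in the sum, once from each of its two vertices other than $u$, which is why the factor $\frac{1}{2}$ appears.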

\stitle{System Model.}

We design \textsf{Sectric} in a server-aided paradigm, where a server $\sdv$ is introduced. The \textsf{Sectric} system model is illustrated in Fig. \ref{figure: Sectric system example}. The system participants include the graph nodes and the server $\sdv$. The left side of the figure depicts the graph, where the solid lines represent the adjacency relations between nodes (i.e., the edges). The right side shows the server $\sdv$. The dashed lines between the graph nodes and $\sdv$ indicate the communication model, where every node has a communication channel with the server $\sdv$.

The \textsf{Sectric} protocol is designed in the context of decentralized graph data. Supposing the graph is represented as $\gdv = (\vdv, \edv)$, we model this decentralized setting such that the number of graph nodes $|\vdv|$ is public knowledge, and each graph node $u \in \vdv$ is only aware of its neighbor set $N_u$ (see Definition \ref{definition: neighbor set}). The server $\sdv$ has no knowledge about the graph's topology (i.e., the edges).

In \textsf{Sectric}, a graph node $Q$ acts as the querier and requests the number of its local triangles. During the protocol execution, the querier $Q$ interacts with the server $\sdv$ and its neighbors. Finally, the protocol outputs the number of $Q$'s local triangles to $Q$.

\stitle{Threat Model and Privacy Constraint.}

In this work, we consider the commonly adopted semi-honest model. This model assumes that the graph nodes and the server $\sdv$ will follow the \textsf{Sectric} protocol, but may attempt to extract additional knowledge about a given node's neighbor set from their view (see Definition \ref{definition: protocol participants' view}). In \textsf{Sectric}, we allow graph nodes to collude, but require that the server does not collude with any graph node.

\begin{definition}[Participants' View]

\label{definition: protocol participants' view}

Suppose a multiparty computation protocol $\Pi$ is executed by a set of participants $P_1$, \dots, $P_n$. The view of $P_i$ in the execution is defined as $$VIEW_i = \setpresentation{r_i, \mdv_i},$$

where $r_i$ is the randomness used by $P_i$ in the execution and $\mdv_i$ is all the messages it receives.

\end{definition}

The privacy constraint considered by \textsf{Sectric} is to protect the privacy of graph nodes in the local triangle counting task, which mainly refers to their adjacency relations in the decentralized graph setting. In other words, a graph node $u$'s private information is its neighbor set $N_u$, which should not be revealed to other graph nodes or the server $\sdv$. We refer to this privacy requirement as ``neighbor privacy''.

Therefore, the privacy of \textsf{Sectric} guarantees that for any subset of graph nodes $\vdv' \subset \vdv$ and all $u \in \vdv \setminus \vdv'$, the joint view of $\vdv'$ will not reveal any knowledge about $u$'s neighbor set $N_u$. It is also guaranteed that the server $\sdv$ will have no knowledge about any node $u$'s neighbor set $N_u$ during the execution of \textsf{Sectric}.

\begin{definition}[Privacy Constraint]

\label{definition: sectric privacy}

\textsf{Sectric} is a privacy-preserving protocol if there exist polynomial-time algorithms $\adv_{\sdv}$, $\adv_Q$, and $\adv_{\vdv'}$ such that the view of $\sdv$, the view of the querier $Q$, and the joint view of graph nodes $\vdv' \subset \vdv$ in the execution of \textsf{Sectric} can be simulated by $\adv_{\sdv}(1^\lambda, 1^{|\vdv|})$, $\adv_Q(1^\lambda, 1^{|\vdv|})$, and $\adv_{\vdv'}(1^\lambda, 1^{|\vdv|})$, respectively, where $\lambda$ is the security parameter. Here, ``view being simulated by an algorithm'' means that the view and the algorithm's output are computationally indistinguishable.

\end{definition}

\subsection{Cryptographic Tools}

In the following, we introduce the cryptographic tools used in this work.

\stitle{Random oracles.}

A random oracle is a theoretical construct used in the context of cryptography and complexity theory. It is defined by specifying query and image domains, and it responds with an element from the image domain for every query in the query domain.

A random oracle $\hdv$ responds with a uniformly random value to every new query, and with the same fixed value to repeated queries. In practice, random oracles are typically instantiated with cryptographic hash functions, such as SHA-2 or SHA-3.
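
One standard way to picture this behavior, given here only as an illustrative sketch, is lazy sampling. The table $T$ below is notation introduced for this illustration and is not used elsewhere in the paper. The oracle maintains an initially empty table $T$ and answers a query $x$ from the query domain as follows:

\begin{enumerate}

\item If $T[x]$ is undefined, sample $y$ uniformly from the image domain and set $T[x] = y$.

\item Return $T[x]$.

\end{enumerate}

By construction, a new query receives a fresh uniform value, while a repeated query receives the same value as before.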

\stitle{Multi-point OPRF (mpOPRF).}

A pseudorandom function (PRF) family is a cryptographic tool that emulates a random function family. This notion is formalized in Definition \ref{definition: pseudorandom function}. Given a uniform seed $s$, a pseudorandom function is computationally indistinguishable from a truly random function. It is usually implemented with symmetric encryption algorithms, such as AES, in practice.

\begin{definition}[PRF family]

A function family $\setpresentation{f_s}_{s \in \bin^*}$, where $f_s$ is a map $\bin^{|s|} \rightarrow \bin^{|s|}$ for all $s \in \bin^*$, is a PRF family if it satisfies the following conditions:

\begin{enumerate}

\item There exists a polynomial-time algorithm $F$ such that $F(s, x) = f_s(x)$ for all $\lambda \in \mathbb{N}$ and all $s, x \in \bin^\lambda$.

\item For any probabilistic polynomial-time algorithm $\adv$, $$|\Pr[\adv^{f_s(\cdot)}(1^\lambda) = 1 \mid s \xleftarrow{\$} \bin^{\lambda}] - \Pr[\adv^{f(\cdot)}(1^\lambda) = 1 \mid f \xleftarrow{\$} \fdv_{\lambda}]|$$

is negligible, where $\adv^{f(\cdot)}$ denotes that $\adv$ can access oracle $f$ and $f \xleftarrow{\$} \fdv_{\lambda}$ denotes that $f$ is a truly random function mapping $\bin^\lambda \rightarrow \bin^\lambda$.

\end{enumerate}

\label{definition: pseudorandom function}

\end{definition}

Given a PRF family $\setpresentation{f_s}$, an mpOPRF protocol involves two parties and realizes the functionality $\fdv_\textsf{mpOPRF}$ defined in Algorithm \ref{algorithm: multi-point OPRF}. The sender and receiver specify a key $k$ and inputs $\vectorpresentation{x_i}_{i \in [n]}$, respectively. $\fdv_\textsf{mpOPRF}$ allows the receiver to learn the outcomes $\vectorpresentation

\begin{abstract}
Graph data analysis, particularly local triangle counting, plays a pivotal role in deciphering complex relationships within graph data. It is invaluable across diverse fields such as social networks, transportation, and cybersecurity. However, this analysis often involves sensitive information, which requires that the relationship between any two nodes be kept private.
Differential privacy (DP) is a formal model to address this privacy concern and can be categorized into two types: the central DP (CDP) model, which achieves better result accuracy, and the local DP (LDP) model, which does not assume a trusted server.
To bridge the gap between the two models, we propose {\textsf{Sectric}}, a \underline{se}rver-aided \underline{c}rypto-assisted \chadded{local} \underline{tri}angle \underline{c}ounting protocol, in this paper. 
It can achieve the same result accuracy with the same privacy budget as the CDP model without assuming a trusted server.
\textsf{Sectric} also explores a new approach in crypto-assisted graph data analysis algorithms that represents a node's neighbors using a set instead of \chreplaced{an}{a} adjacency vector, and successfully \chreplaced{achieves}{achieve} higher efficiency compared to other crypto-assisted solutions.
We also conduct theoretical and empirical evaluations to demonstrate that \textsf{Sectric} meets its design goals.
\end{abstract}

\section{Introduction}
Graph data analysis is pivotal in unraveling complex relationships and patterns in graph data, making it a useful tool in various fields such as social networks~\cite{Backstrom2007WhereforeAT, Charbey2019StarsHO, Isaak2018UserDP, newman2009random}, transportation and logistics~\cite{Leskovec2007CosteffectiveOD, Pourhabibi2020FraudDA, Tan2019AGM, Weber2018ScalableGL}, and cybersecurity~\cite{Jernigan2009GaydarFF}.
In many applications, graph data \chreplaced{is}{are} stored in a decentralized manner~\cite{becchetti2008efficient}, where each node knows its neighbors, while no central node has the full graph topology. In this setting, \emph{local triangle counting}, which counts the triangles containing a given node, is a fundamental graph analysis task that many downstream applications depend on. For example, in decentralized social networks, each node first performs a local triangle count to calculate its local clustering coefficient~\cite{newman2009random,becchetti2010efficient,green2014fast,li2017clustering} or local transitivity ratio~\cite{schank2005approximating,al2018triangle}, which reflects its importance in the network.

Privacy is another concern in the decentralized setting. Privacy-sensitive users \chreplaced{would not want}{will not hope} to reveal the identities of their neighbors to third parties. Differential privacy (DP) is a formal model addressing this privacy concern. 
Prior differentially private solutions fall mainly into two categories: those adopting the central DP (CDP) model and those adopting the local DP (LDP) model.
The CDP model assumes a trusted server that calculates the triangle counts and adds a small amount of noise to the final result\footnote{In the CDP model, the server knows the complete and accurate topology of the graph, including all nodes and the connection edges to their neighbors.}. The LDP model eliminates the trust assumption on the server, but must add more noise during the computation under the same privacy budget. Thus, the CDP model yields better result accuracy, whereas the LDP model aligns better with the decentralized setting.

Following this framework, we design \textsf{Sectric}, a \underline{se}rver-aided \underline{c}rypto-assisted \underline{tri}angle \underline{c}ounting protocol. The first challenge is the high overhead incurred by directly applying \chreplaced{existing}{existed} techniques. For example, if we directly \chreplaced{adapt}{adapting} techniques from CARGO~\cite{liu2024cargo} or MAGO~\cite{wang2023mago} to privacy-preserving local triangle counting, the resulting protocol incurs $O(|\vdv|^2)$ overheads, which are \chreplaced{impractical for}{unsatisfactory on} large graphs. We observe that the root of these overheads lies in operating on the graph's adjacency matrix.
To reduce the overheads, we explore \chreplaced{the usage of}{to use} adjacency sets, instead of adjacency vectors, to represent \chreplaced{node adjacencies}{node's neighbors}. To implement this idea, we design a novel cryptographic tool named \emph{Three-Party Private Set Membership Test (3PPSMT)}, rather than using secret sharing as CARGO and MAGO do. This 3PPSMT primitive reduces the overheads of counting two graph nodes' common neighbors from $O(|\vdv|)$ to $O(d_{max})$ \chreplaced{compared}{comparing} to secret sharing. With this primitive, \textsf{Sectric} \chreplaced{introduces only}{only brings} $O(d_{max}|\vdv|)$ overheads in privacy-preserving local triangle counting. This makes \textsf{Sectric} more suitable for analyzing sparse graphs (i.e., $d_{max} = o(|\vdv|)$).

The main contributions of this work can be summarized as:

\begin{itemize}
	\item We design a novel local triangle counting protocol, \textsf{Sectric}, that bridges the gap in trust assumptions and utility between the CDP and LDP models for this problem. Our solution shows that the same privacy and utility guarantees as in the CDP model can be achieved without requiring a trusted server.
	\item We explore a new approach in crypto-assisted graph data analysis algorithms that represents a node's neighbors using a set instead of an adjacency vector and reduces the overheads from $O(|\vdv|^2)$ to $O(d_{max}|\vdv|)$. We believe that this approach can also be adopted in other tasks to reduce the overheads and will further explore it in the future.
	%\item We define a novel cryptographic primitive $\fdv_\textsf{3PPSMT}$ and propose the $\Pi_\textsf{3PPSMT}$ protocol, an efficient implementation of this primitive. This primitive can also be applied to other privacy-aware graph data analysis tasks to avoid intermediate privacy costs.
	\item We perform a comprehensive theoretical and empirical analysis of \textsf{Sectric} to demonstrate its privacy guarantees and performance. We also adapt the open-source implementation of a state-of-the-art work~\cite{liu2024cargo,CARGO} to local triangle counting as the baseline.
\end{itemize}

\stitle{Paper Organization.}
The rest of this paper is organized as follows: In Section \ref{section: related work}, we discuss the related works. Then, we introduce the necessary preliminaries of this work in Section \ref{section: preliminaries}. In Section \ref{section: 3PPSMT}, we define the primitive $\fdv_\textsf{3PPSMT}$ and propose the $\Pi_\textsf{3PPSMT}$ protocol implementing this primitive. Based on this primitive, the construction of the \textsf{Sectric} protocol is presented in Section \ref{section: sectric construction}. Section \ref{section: implementation and evaluation} presents the experimental results and Section \ref{section: conclusion} concludes this paper.

\section{Related Work}
\label{section: related work}
\subsection{Privacy-Preserving Triangle Counting}
The problem of privacy-preserving triangle counting has been an active area of research. Existing works can be broadly categorized into two groups based on whether they assume the existence of a trusted server or not: centralized model-based approaches and decentralized model-based approaches.

A summary of the comparison between \textsf{Sectric} and other related works on triangle counting is presented in Table \ref{tab: related works}.

\subsection{Crypto-Assisted Graph Analytics}

Crypto-assisted solutions are also emerging in other graph analytics tasks.

\section{Preliminaries}
\label{section: preliminaries}

\subsection{Problem Definition}
\label{subsection: problem definition}

\stitle{Local Triangle Counting.}
Let $\gdv = (\vdv, \edv)$ be a graph, where $\vdv$ and $\edv$ represent the set of nodes and edges, respectively. Two nodes $u, v \in \vdv$ are adjacent if $(u, v) \in \edv$. We consider undirected graphs, so for any two nodes $u, v \in \vdv$, $(u, v) \in \edv$ if and only if $(v, u) \in \edv$. Without loss of generality, we assume the nodes in $\vdv$ are indexed from $1$ to $|\vdv|$.

\begin{definition}[Local triangle set]
	\label{definition: local triangle set}
	Given a graph $\gdv = (\vdv, \edv)$, the local triangle set of a node $u \in \vdv$ is defined as:
	$$
		\Delta_u = \setpresentation{\setpresentation{u, v, w} \subset \vdv: (u, v) , (u, w) , (v, w) \in \edv}.
	$$
\end{definition}

We also define the notion of a graph node's neighbor set in Definition \ref{definition: neighbor set}. The neighbor set $N_u$ of a node $u$ denotes the set of all nodes adjacent to $u$.

\begin{definition}[Neighbor set]
	\label{definition: neighbor set}
	Given a graph $\gdv = (\vdv, \edv)$, we define the neighbor set of a node $u \in \vdv$ as
	$$
		N_u = \setpresentation{v \in \vdv: (u, v) \in \edv}.
	$$
\end{definition}

The following theorem establishes the relationship between the local triangle sets and the neighbor sets, which serves as the foundation for our \textsf{Sectric} protocol.

\begin{theorem}
	\label{theorem: relation between neighbor and local triangle}
	Given a graph $\gdv = (\vdv, \edv)$, for any node $u \in \vdv$, we have
	$$
		|\Delta_u| = \frac{1}{2} \sum_{v \in N_u} |N_u \cap N_v|.
	$$
\end{theorem}

\begin{proof}\color{blue}
	We know that the number of triangles containing node $u$ is given by:

$$
|\Delta_u| = \sum_{v \in N_u} \sum_{w \in N_u, w > v} I(v, w),
$$

where $I(v, w)$ is an indicator function that equals 1 if there is an edge between $v$ and $w$, and 0 otherwise.

Additionally, we have:

$$
|N_u \cap N_v| = \sum_{w \in N_u} I(v, w).
$$

Thus, we can express the sum as:

$$
\sum_{v \in N_u} |N_u \cap N_v| = \sum_{v \in N_u} \sum_{w \in N_u} I(v, w).
$$

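Since the graph is undirected, $I(v, w) = I(w, v)$, and, assuming the graph has no self-loops, $I(v, v) = 0$. Hence every unordered pair $\setpresentation{v, w}$ of distinct neighbors of $u$ is counted exactly twice in the double sum above, once as $(v, w)$ and once as $(w, v)$. Therefore, we can conclude that: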

$$
|\Delta_u| = \sum_{v \in N_u} \sum_{w \in N_u, w > v} I(v, w) = \frac{1}{2} \sum_{v \in N_u} |N_u \cap N_v|.
$$
\end{proof}
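
As a sanity check on Theorem \ref{theorem: relation between neighbor and local triangle}, consider a hypothetical toy graph (chosen only for illustration) with $\vdv = \setpresentation{1, 2, 3, 4}$ and edges $(1, 2)$, $(1, 3)$, $(1, 4)$, $(2, 3)$, $(3, 4)$. For $u = 1$, the neighbor sets are $N_1 = \setpresentation{2, 3, 4}$, $N_2 = \setpresentation{1, 3}$, $N_3 = \setpresentation{1, 2, 4}$, and $N_4 = \setpresentation{1, 3}$, so
$$
\frac{1}{2} \sum_{v \in N_1} |N_1 \cap N_v| = \frac{1}{2} \left( |\setpresentation{3}| + |\setpresentation{2, 4}| + |\setpresentation{3}| \right) = \frac{1}{2} (1 + 2 + 1) = 2,
$$
which matches $\Delta_1 = \setpresentation{\setpresentation{1, 2, 3}, \setpresentation{1, 3, 4}}$. The set $\setpresentation{1, 2, 4}$ is not a triangle because $(2, 4) \notin \edv$.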

\stitle{System Model.}
We design \textsf{Sectric} in a server-aided paradigm, where a server $\sdv$ is introduced. The \textsf{Sectric} system model is illustrated in Fig. \ref{figure: Sectric system example}. The system participants include the graph nodes and the server $\sdv$. The left side of the figure depicts the graph, where the solid lines represent the adjacency relations between nodes (i.e., the edges). The right side shows the server $\sdv$. The dashed lines between the graph nodes and $\sdv$ indicate the communication model, where every node has a communication channel with the server $\sdv$.

\stitle{Threat Model and Privacy Constraint.}
In this work, we consider a commonly adopted semi-honest model. This model assumes that the graph nodes and the server $\sdv$ will follow the \textsf{Sectric} protocol, but may attempt to extract additional knowledge about a given node's neighbor set from their view (see Definition \ref{definition: protocol participants' view}). In \textsf{Sectric}, we allow graph nodes to collude, but require that the server does not collude with any graph node.

\begin{definition}[Participants' View]
	\label{definition: protocol participants' view}
	Suppose a multiparty computation protocol $\Pi$ is executed by a set of participants $P_1$, \dots, $P_n$. The view of $P_i$ in the execution is defined as $$VIEW_i = \setpresentation{r_i, \mdv_i},$$
	where $r_i$ is the randomness used by $P_i$ in the execution and $\mdv_i$ is all the messages it receives.
\end{definition}

\begin{definition}[Privacy Constraint]
	\label{definition: sectric privacy}
	\textsf{Sectric} is a privacy-preserving protocol if there exist polynomial-time algorithms $\adv_{\sdv}$, $\adv_Q$ and $\adv_{\vdv'}$ such that the view of $\sdv$, the view of the querier $Q$, and the joint view of graph nodes $\vdv' \subset \vdv$ in the execution of \textsf{Sectric} can be simulated by $\adv_{\sdv}(1^\lambda, 1^{|\vdv|})$, $\adv_Q(1^\lambda, 1^{|\vdv|})$, and $\adv_{\vdv'}(1^\lambda, 1^{|\vdv|})$, respectively, where $\lambda$ is the security parameter. Here, ``view being simulated by an algorithm'' means that the view and the algorithm's output are computationally indistinguishable.
\end{definition}

\subsection{Cryptographic Tools}

In the following, we introduce the cryptographic tools used in this work.

\stitle{Random oracles.}
A random oracle is a theoretical construct used in the context of cryptography and complexity theory. It is defined by specifying query and image domains, and it responds with an element from the image domain for every query in the query domain. 
A random oracle $\hdv$ responds with a uniformly random value to every new query and with the same fixed value to repeated queries. Random oracles are typically implemented using cryptographic hash functions, such as SHA-2 or SHA-3.

\stitle{Multi-point OPRF (mpOPRF).}
A pseudorandom function (PRF) family is a cryptographic tool that emulates a random function family. This notion is formalized in Definition \ref{definition: pseudorandom function}. Given a uniform seed $s$, a pseudorandom function is computationally indistinguishable from a truly random function. It is usually implemented with symmetric encryption algorithms, such as AES, in practice.

\begin{definition}[PRF family]
	A function family $\setpresentation{f_s}_{s \in \bin^*}$, where $f_s$ is a map $\bin^{|s|} \rightarrow \bin^{|s|}$ for all $s \in \bin^*$, is a PRF family if it satisfies the following conditions:
	\begin{enumerate}
		\item There exists a polynomial-time algorithm $F$ such that $F(s, x) = f_s(x)$ for all $\lambda \in \mathbb{N}$ and all  $s, x \in \bin^\lambda$.
		\item For any probabilistic polynomial-time algorithm $\adv$, $$|\Pr[\adv^{f_s(\cdot)}(1^\lambda) = 1 \mid s \xleftarrow{\$} \bin^{\lambda}] - \Pr[\adv^{f(\cdot)}(1^\lambda) = 1 \mid f \xleftarrow{\$} \fdv_{\lambda}]|$$
		      is negligible, where $\adv^{f(\cdot)}$ denotes that $\adv$ can access oracle $f$ and $f \xleftarrow{\$} \fdv_{\lambda}$ denotes that $f$ is a truly random function mapping $\bin^\lambda \rightarrow \bin^\lambda$.
	\end{enumerate}
	\label{definition: pseudorandom function}
\end{definition}

\stitle{Oblivious Key-Value Store (OKVS).}
The notion of an oblivious key-value store (OKVS) was first introduced by Garimella et al.~\cite{garimella2021oblivious} in the context of private set intersection (PSI). An OKVS allows one to encode a set of key-value pairs into a single encoding, and ensures that the encoding reveals nothing about the keys used to generate it, provided that the encoded values are uniformly random.

\begin{definition}[Key-Value Store~\cite{garimella2021oblivious}]
	A key-value store (KVS) is defined by specifying the key space $\kdv$ and the value space $\vdv$, together with two algorithms:
	\begin{enumerate}
		\item $S \leftarrow E(A; R)$: The encoding algorithm takes a list of $n$ key-value pairs $A = \vectorpresentation{(k_i, v_i)}_{i \in [n]} \subset \kdv \times \vdv$ with distinct keys and the randomness $R$ as inputs. It outputs an encoding $S \in \vdv^m \cup \setpresentation{\perp}$.
		\item $v \leftarrow D(S, k)$: The decoding algorithm takes the encoding $S \in \vdv^m$ and a key $k \in \kdv$ as the inputs. It outputs a value $v \in \vdv$. 
	\end{enumerate}
	\label{definition: okvs syntax}
\end{definition}

\begin{definition}[Expansion ratio]
	Given a KVS scheme $\Pi$, if the encoding $S$ storing $n$ key-value pairs satisfies $S \in \vdv^m \cup \setpresentation{\perp}$, then the expansion ratio of $\Pi$ is $\frac{m}{n}$.
\end{definition}

\begin{definition}[Obliviousness]
	\label{definition: okvs obliviousness}
	A KVS $\Pi = (E, D)$ defined on the key space $\kdv$ and value space $\vdv$ satisfies the condition of \emph{obliviousness} if, for any two lists of $n$ distinct keys $\vectorpresentation{k_i}_{i \in [n]}$ and $\vectorpresentation{k'_i}_{i \in [n]}$, and $n$ values $\vectorpresentation{v_i}_{i \in [n]} \xleftarrow{\$} \vdv^n$ drawn uniformly at random from $\vdv^n$, and for any polynomial-time algorithm $\adv$,
	$$|Pr[\adv(S) = 1] - Pr[\adv(S') = 1]|$$
	is negligible, where
	$$S \leftarrow E(\vectorpresentation{(k_i, v_i)}_{i \in [n]}) \text{ and } S' \leftarrow E(\vectorpresentation{(k'_i, v_i)}_{i \in [n]}).$$
	In other words, the distributions of $S$ and $S'$ are computationally indistinguishable.
\end{definition}
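
For intuition, a standard textbook instance of this syntax is polynomial interpolation over a finite field. The following toy example is only an illustration under assumed parameters (the field $\mathbb{F}_7$ as both key and value space, and two pairs); it is not necessarily the scheme instantiated in this work. Take the key-value pairs $(1, 3)$ and $(2, 5)$. The encoding algorithm outputs the coefficients $S = (1, 2)$ of the unique polynomial of degree less than $2$ passing through both points, and the decoding algorithm evaluates this polynomial:
$$
D(S, k) = 1 + 2k \text{ mod } 7, \qquad D(S, 1) = 3, \qquad D(S, 2) = 5, \qquad D(S, 3) = 0.
$$
Here $m = n = 2$, so the expansion ratio is $1$. Decoding a key outside the encoded list (such as $k = 3$) still returns a field element rather than an error, which is why a membership test needs an additional check on the decoded value.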

\stitle{Private Equality Test.}
A private equality test is a two-party protocol that allows the two parties to test whether their inputs are equal and to obtain secret shares of the result. The functionality of this protocol is formally defined as $\fdv_\textsf{EQ}$ stated in Algorithm \ref{algorithm: private equality test functionality}. The state-of-the-art implementation of this functionality is proposed by Chandran et al.~\cite{chandran2022circuit}.

\section{Three-Party Private Set Membership Test}
\label{section: 3PPSMT}

\subsection{Primitive Definition}
We first define the 3PPSMT primitive, which involves three parties and realizes the functionality $\fdv_\textsf{3PPSMT}$ formally described in Algorithm \ref{algorithm: 3PPSMT functionality}.

The 3PPSMT functionality $\fdv_\textsf{3PPSMT}$ is a three-party functionality that allows the querier $Q$ to query whether an element is in the set provided by a set provider $U$. It receives a set $X$ from the set provider $U$ and an element $u$ from the querier $Q$. The set $X$ and the element $u$ are selected from a universe $\udv$. The result, indicating whether $u \in X$ or not, is secret-shared between the querier $Q$ and the server $\sdv$.

\subsection{Construction}
\label{subsection: 3PPSMT construction}

In this section, we propose the $\Pi_\textsf{3PPSMT}$ protocol construction, which implements the 3PPSMT functionality $\fdv_\textsf{3PPSMT}$ presented in Algorithm \ref{algorithm: 3PPSMT functionality}.

The $\Pi_\textsf{3PPSMT}$ protocol is presented in Algorithm \ref{algorithm: 3PPSMT protocol}. In this protocol, the querier $Q$ inputs an element $u \in \bin^\lambda$ and the set provider $U$ inputs a set $X$. The protocol tests whether $u \in X$ and secret-shares the output between the querier $Q$ and a server $\sdv$.

The $\Pi_\textsf{3PPSMT}$ protocol consists of a querier-independent preprocessing phase and a querier-involved online phase. In the preprocessing phase, the server $\sdv$ and the set provider $U$ need not interact with the querier $Q$. The querier $Q$ and the query $u$ are only involved in the online phase.

In the preprocessing phase, $U$ first samples a PRF key $k_U$ and maps the set elements $x_i \in X$ to $y_i = f_{k_U}(x_i)$ in Step 1. Then, it encodes the key-value pairs $\vectorpresentation{(x_i, y_i)}_{i \in [|X|]}$ into an OKVS $S_U$. Upon receiving the values $y_i$ from $U$, $\sdv$ samples a random number $r$ from $\bin^\lambda$ and another PRF key $k_\sdv$. It then encodes the key-value pairs $\vectorpresentation{(y_i, (f_{k_\sdv}(y_i) + r \text{ mod } 2^\lambda))}_{i \in [|X|]}$ into another OKVS $S_\sdv$ in Step 3.

In the online phase, $\sdv$ and $U$ first send $S_\sdv$ and $S_U$ to $Q$, respectively, in Step 4.
After receiving $S_U$ and $S_\sdv$, $Q$ decodes them to obtain $y = \Pi.D(S_U, u)$ and $\hat{y} = \Pi.D(S_\sdv, y)$ in Step 5.
In Step 6 and Step 7, $\sdv$ and $Q$ invoke $\fdv_\textsf{mpOPRF}$, from which $Q$ obtains $y' = f_{k_\sdv}(y)$. $Q$ then evaluates $r' = \hat{y} - y' \text{ mod } 2^\lambda$.
Finally, $\sdv$ and $Q$ invoke $\fdv_\textsf{EQ}$ to test whether $r = r'$.

Suppose that $u \in X$, i.e., $u = x_{i^*}$ for some $i^*$. Then $$y = \Pi.D(S_U, u) = \Pi.D(S_U, x_{i^*}) = y_{i^*},$$ $$ \hat{y} = \Pi.D(S_\sdv, y) = \Pi.D(S_\sdv, y_{i^*}) = f_{k_\sdv}(y_{i^*}) + r \text{ mod } 2^\lambda.$$
Since $y' = f_{k_\sdv}(y) = f_{k_\sdv}(y_{i^*})$, it follows that $$r' = \hat{y} - y' \text{ mod } 2^\lambda = (f_{k_\sdv}(y_{i^*}) + r) - f_{k_\sdv}(y_{i^*}) \text{ mod } 2^\lambda = r.$$
Therefore, if $u \in X$, the invocation of $\fdv_\textsf{EQ}$ leaves $\sdv$ and $Q$ with secret shares of the result $r' = r$.

Conversely, if $u \notin X$, the decoding result $\hat{y}$ will not equal $f_{k_\sdv}(y) + r \text{ mod } 2^\lambda$ (with overwhelming probability). Thus, the resulting $r'$ is not equal to $r$, and the invocation of $\fdv_\textsf{EQ}$ leaves $\sdv$ and $Q$ with secret shares of the result $r' \neq r$.
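
To make the modular arithmetic concrete, consider a toy instance with $\lambda = 4$, so that all values lie in $\setpresentation{0, \dots, 15}$. The numbers below are hypothetical and chosen only for illustration. Let $r = 9$ and $f_{k_\sdv}(y_{i^*}) = 11$. If $u = x_{i^*} \in X$, then
$$
\hat{y} = 11 + 9 \text{ mod } 16 = 4, \qquad y' = 11, \qquad r' = 4 - 11 \text{ mod } 16 = 9 = r,
$$
which shows that the subtraction must also be reduced modulo $2^\lambda$ for the mask to cancel. If instead $u \notin X$ and $Q$ ends up with, say, $\hat{y} = 6$ and $y' = 2$, then $r' = 6 - 2 \text{ mod } 16 = 4 \neq r$, and the equality test yields shares of ``not equal''.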

\subsection{Protocol Analysis}
We first analyze the communication and computation complexity of the $\Pi_\textsf{3PPSMT}$ protocol in Theorems \ref{theorem: 3ppsmt protocol communication complexity} and \ref{theorem: 3ppsmt protocol computation complexity}.

\begin{theorem}
	Supposing that the OKVS scheme $\Pi$ has expansion ratio $1 + \varepsilon$, the communication complexity of the $\Pi_\textsf{3PPSMT}$ protocol is $O(\lambda |X|)$ in the preprocessing phase and $O((1 + \varepsilon)\lambda |X|)$ in the online phase.
	\label{theorem: 3ppsmt protocol communication complexity}
\end{theorem}

\begin{proof}
	In the preprocessing phase, the set provider $U$ sends $f_{k_U}(x_i)$ to the server $\sdv$ for all $x_i \in X$. For each $f_{k_U}(x_i)$, it requires $O(\lambda)$ communication complexity to send it. Thus, the preprocessing phase has a total communication complexity of $O(\lambda |X|)$.

In the online phase, $U$ sends an OKVS encoding $S_U$ to $Q$ and $\sdv$ sends an OKVS encoding $S_\sdv$ to $Q$ in Step 4. Each OKVS encoding stores $|X|$ key-value pairs, where each value is in the space $\bin^\lambda$. Since the OKVS scheme $\Pi$ has expansion ratio $1 + \varepsilon$, each encoding has size $(1 + \varepsilon) \lambda |X|$. Then, in the invocations of $\fdv_{\textsf{mpOPRF}}$ and $\fdv_\textsf{EQ}$, the communication complexity is $O(\lambda)$. Therefore, the communication complexity of the online phase is $O((1 + \varepsilon)\lambda |X|)$.
\end{proof}
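
As a rough back-of-the-envelope illustration of Theorem \ref{theorem: 3ppsmt protocol communication complexity}, assume the hypothetical parameters $\lambda = 128$, $|X| = 1000$, and $\varepsilon = 0.3$. These values are illustrative only and are not taken from the evaluation; the estimate also ignores hidden constants and the $O(\lambda)$ cost of $\fdv_\textsf{mpOPRF}$ and $\fdv_\textsf{EQ}$. The preprocessing phase then transfers about
$$
\lambda |X| = 128 \times 1000 = 128{,}000 \text{ bits} = 16{,}000 \text{ bytes},
$$
and each of the two OKVS encodings sent in the online phase has size
$$
(1 + \varepsilon) \lambda |X| = 1.3 \times 128 \times 1000 = 166{,}400 \text{ bits} = 20{,}800 \text{ bytes},
$$
for roughly $41{,}600$ bytes in total.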

\begin{theorem}
	Supposing that the OKVS scheme has expansion ratio $1 + \varepsilon$, the computation complexity of the set provider $U$ is $O((1 + \varepsilon)\lambda |X|)$ in the preprocessing phase (server and querier have no computation in the preprocessing phase) and $O((1 + \varepsilon) \lambda |X|)$ for both the server and the querier in the online phase.
	\label{theorem: 3ppsmt protocol computation complexity}
\end{theorem}
\begin{proof}
	The computation complexity in the preprocessing phase involves:
	\begin{itemize}
		\item The set provider $U$ evaluates $f_{k_U}(x_i)$ for all $x_i \in X$, which has $O(\lambda |X|)$ computation complexity.
		\item The OKVS encoding operation stores $|X|$ key-value pairs, which has $O((1 + \varepsilon)\lambda|X|)$ computation complexity.
	\end{itemize}
	Therefore, the preprocessing phase of the $\Pi_\textsf{3PPSMT}$ protocol has $O((1 + \varepsilon) \lambda |X|)$ computation complexity for the set provider.

The computation complexity in the online phase involves:
	\begin{itemize}
		\item The OKVS encoding operation stores $|X|$ key-value pairs in Step 3, which has $O((1 + \varepsilon)\lambda|X|)$ computation complexity for the server.
		\item The OKVS decoding operations have $O((1 + \varepsilon) \lambda |X|)$ computation complexity for the querier.
		\item The invocations of $\fdv_\textsf{mpOPRF}$ and $\fdv_\textsf{EQ}$ have $O(\lambda)$ computation complexity for both the server and the querier.
	\end{itemize}
	Therefore, the online phase of the $\Pi_\textsf{3PPSMT}$ protocol has $O((1 + \varepsilon) \lambda |X|)$ computation complexity for both the server and the querier.
\end{proof}

In the following, we analyze the privacy of the $\Pi_\textsf{3PPSMT}$ protocol. Theorem \ref{theorem: 3PPSMT protocol privacy} states that the protocol securely implements the $\fdv_\textsf{3PPSMT}$ functionality in the semi-honest threat model. In the proof of this theorem, we demonstrate that no protocol participant can extract additional knowledge on other participants' private inputs from its view in the execution of the $\Pi_\textsf{3PPSMT}$ protocol. Specifically, we prove that the querier $Q$'s query $u$ is not revealed to the server $\sdv$ or the set provider $U$, and that no knowledge on $U$'s set $X$ besides its size is revealed to $\sdv$ and $Q$.

\begin{theorem}
	The $\Pi_\textsf{3PPSMT}$ protocol presented in Algorithm \ref{algorithm: 3PPSMT protocol} securely implements the $\fdv_\textsf{3PPSMT}$ functionality defined in Algorithm \ref{algorithm: 3PPSMT functionality} against semi-honest polynomial-time adversaries, given that $\sdv$, $U$ and $Q$ are non-collusive.
	\label{theorem: 3PPSMT protocol privacy}
\end{theorem}
\begin{proof}
	We prove this theorem by demonstrating that:
	\begin{enumerate}
		\item $\sdv$ cannot extract any additional knowledge on $Q$'s query $u$ and $U$'s set $X$ besides $b_\sdv$ and $|X|$ from its view.
		\item $Q$ cannot extract any additional knowledge on $U$'s set $X$ besides $b_Q$ and $|X|$ from its view.
		\item $U$ cannot extract any knowledge from its view.
	\end{enumerate}

To show that a participant cannot extract additional knowledge from its view, we denote the construction in Algorithm \ref{algorithm: 3PPSMT protocol} as $Hybrid_0$ and define a sequence $Hybrid_1, Hybrid_2, \dots$ by successively modifying the original construction. \chreplaced{We also prove that the participant's views in these hybrids are computationally indistinguishable from each other}{We also prove that the participant's view in these hybrids is computationally indistinguishable} and that its view in the last hybrid can be easily simulated without any additional knowledge.

In the following, we demonstrate the above statements:
	\begin{enumerate}
		\item $\sdv$ cannot extract any additional knowledge on $Q$'s query $u$ and $U$'s set $X$ besides $b_\sdv$ and $|X|$ from its view.
		      \begin{enumerate}
			      \item $Hybrid_1$: We modify $Hybrid_0$ by replacing $y_i$ in Step 2 with random numbers. $\sdv$'s view in $Hybrid_1$ is computationally indistinguishable from its view in $Hybrid_0$ due to the property of pseudorandom functions.
		      \end{enumerate}
		      $\sdv$'s view in $Hybrid_1$ consists of $|X|$ random values from $U$ in Step 2, its view from $\fdv_\textsf{mpOPRF}$ in Step 7 which has no output, and $b_\sdv$ from $\fdv_\textsf{EQ}$ in Step 8. This view can be easily simulated given $|X|$ and $b_\sdv$.

\item $Q$ cannot extract any additional knowledge on $U$'s set $X$ besides $b_Q$ and $|X|$ from its view.
		      \begin{enumerate}
			      \item $Hybrid_1$: We modify $Hybrid_0$ by replacing $y_i$ in Step 2 and $f_{k_\sdv}(y_i)$ in Step 3 with random numbers. $Q$'s view in $Hybrid_1$ is computationally indistinguishable from its view in $Hybrid_0$ due to the property of pseudorandom functions.
			      \item $Hybrid_2$: We modify $Hybrid_1$ by replacing $x_i$ in Step 2 and $y_i$ in Step 3 with random numbers. $Q$'s view in $Hybrid_2$ is computationally indistinguishable from its view in $Hybrid_1$ due to the obliviousness of the OKVS scheme by Definition \ref{definition: okvs obliviousness}.
			      \item $Hybrid_3$: We modify $Hybrid_2$ by replacing $y'$ in Step 7 with a random number. $Q$'s view in $Hybrid_3$ is computationally indistinguishable from its view in $Hybrid_2$ due to the property of pseudorandom functions.
		      \end{enumerate}
		      $Q$'s view in $Hybrid_3$ consists of two OKVS encodings in Step 4, which store $|X|$ key-value pairs with random keys and random values, a random number in Step 7, and $b_Q$ from $\fdv_\textsf{EQ}$ in Step 8. This view can be easily simulated given $|X|$ and $b_Q$.

\item $U$ cannot extract any knowledge from its view.
		      This statement can be proven by observing that $U$ does not receive any message in the $\Pi_\textsf{3PPSMT}$ protocol.
	\end{enumerate}
	The above analysis shows that the $\Pi_\textsf{3PPSMT}$ protocol in Algorithm \ref{algorithm: 3PPSMT protocol} securely implements the $\fdv_\textsf{3PPSMT}$ functionality.
\end{proof}

\section{The \textsf{Sectric} Protocol}
\label{section: sectric construction}

\subsection{Intuitive Construction}
\label{subsection: sectric construction overview}

In this part, we first provide an intuitive construction of the \textsf{Sectric} protocol. We call this construction ``intuitive'' because our privacy constraint is to preserve the neighbor privacy of all graph nodes, whereas this construction preserves the neighbor privacy of all nodes except the querier $Q$: it reveals the identities of the querier's neighbors to the server.

We present this intuitive construction because it captures the core idea of our full construction and is easy to follow, which makes the full secure protocol easier to understand than if it were presented directly.

After describing the intuitive protocol, we explain why it provides neighbor privacy for the other nodes and why it reveals the identities of the querier's neighbors.

\stitle{Protocol description.}
\textsf{Sectric} is executed on a decentralized graph. The node set is public and the edge set is distributed among all graph nodes, where each graph node has its neighbor set. The protocol involves the server $\sdv$ and graph nodes $\vdv$ as the protocol participants. The server $\sdv$ has no input and each graph node $u \in \vdv$ has its neighbor set $N_u$ as the input.

Suppose that a graph node $Q \in \vdv$ initiates a local triangle counting task and queries the number of its local triangles. To fulfill this task, $\sdv$ and $Q$ first aim to secret-share $|N_Q \cap N_u|$ for each $u \in N_Q$. To achieve this, they test for each $v \in N_Q$ whether $v \in N_u$ and secret-share the result. This can be done by invoking the 3PPSMT functionality $\fdv_\textsf{3PPSMT}$.

Therefore, the intuitive protocol works as follows. Given a neighbor $u \in N_Q$, the parties $\sdv$, $Q$, and $u$ invoke the 3PPSMT functionality $\fdv_\textsf{3PPSMT}$ once for each $v \in N_Q$, acting as the server, the querier, and the set provider, respectively, with $Q$ and $u$ using $v$ and $N_u$ as their inputs. In each invocation, $\sdv$ and $Q$ obtain the outputs. This step tests whether $v$ is in $N_u$ for all $v \in N_Q$, with each result secret-shared between $\sdv$ and $Q$. By aggregating these secret shares, $Q$ and $\sdv$ secret-share $|N_Q \cap N_u|$ for this neighbor $u$. $\sdv$ and $Q$ repeat the above procedure for all $u \in N_Q$ and thus secret-share $|N_Q \cap N_u|$ for all $u \in N_Q$. Finally, $\sdv$ and $Q$ each sum their own secret shares. By Theorem \ref{theorem: relation between neighbor and local triangle}, they obtain secret shares of $$ \frac{1}{2}\sum_{u \in N_Q} |N_Q \cap N_u| = |\Delta_Q|.$$ $\sdv$ sends its share to $Q$, and $Q$ can then recover the result $|\Delta_Q|$.
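
The aggregation step can be illustrated with a small example. We assume here, purely for illustration, that each invocation of $\fdv_\textsf{3PPSMT}$ outputs additive shares $(b_\sdv, b_Q)$ modulo $16$ of the membership bit; the actual share domain is fixed by the functionality and may differ. Suppose that, for a fixed neighbor $u$, three invocations have membership bits $1$, $0$, and $1$, with hypothetical shares $(5, 12)$, $(9, 7)$, and $(14, 3)$. Each party sums its own shares locally:
$$
5 + 9 + 14 \text{ mod } 16 = 12, \qquad 12 + 7 + 3 \text{ mod } 16 = 6, \qquad 12 + 6 \text{ mod } 16 = 2 = |N_Q \cap N_u|.
$$
Neither party sees an individual membership bit in the clear during this aggregation; the two local sums are themselves a valid sharing of the count for $u$.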

\stitle{Privacy for querier's neighbors.} We first briefly analyze why this intuitive construction provides privacy to the querier's neighbors. Suppose that the $\fdv_\textsf{3PPSMT}$ functionality is securely implemented by a protocol $\Pi_\textsf{3PPSMT}$ in the universal composability model. We note that the view of $\sdv$ in this protocol is identical to its view in the instances of $\Pi_\textsf{3PPSMT}$. Due to the property of universal composability, the parallel composition of multiple $\Pi_\textsf{3PPSMT}$ instances is also secure. So, no knowledge of the participants' neighbor sets is revealed to $\sdv$. This fact also holds for $Q$'s neighbors in $N_Q$. For $Q$, the situation is somewhat different. Its view consists of its view in $\Pi_\textsf{3PPSMT}$ instances and the secret-share $b_\sdv$ of $|\Delta_Q|$ from $\sdv$. This does not violate the protocol's privacy requirement, as $b_\sdv$ can be obtained from $b_Q$ and $|\Delta_Q|$, which are both known to $Q$.

\stitle{Insecurity of the intuitive protocol.} However, this intuitive construction reveals the identities of the querier's neighbors: the server $\sdv$ can identify $Q$'s neighbors by observing which graph nodes it interacts with in the instances of $\Pi_\textsf{3PPSMT}$.

In Section \ref{subsection: secure composition}, we provide a technique to securely compose multiple instances of the 3PPSMT protocol to fix the above insecurity.

\subsection{Full Protocol from Secure Composition}
\label{subsection: secure composition}

In the following, we describe the full construction of the protocol. This construction securely fulfills the privacy-preserving local triangle counting task.

Recall that multiple instances of the $\Pi_\textsf{3PPSMT}$ protocol are invoked in Steps 2--6 of the intuitive construction in Algorithm \ref{algorithm: Sectric protocol}. In each instance, the querier $Q$ specifies a node $u \in \vdv$ and an element $v \in N_Q$, and tests whether $v \in N_u$.

In this full construction, we mainly demonstrate how to securely compose these $\Pi_\textsf{3PPSMT}$ protocol instances, so that the server $\sdv$ does not interact with $Q$'s neighbors. Integrating this secure composition technique into the intuitive protocol in Algorithm \ref{algorithm: Sectric protocol}, we obtain the full construction of \textsf{Sectric}.

We model the process as a group of set providers (i.e., the graph nodes $\vdv$), each of which provides a set (i.e., $u \in \vdv$ provides $N_u$). The querier has $m$ queries, each of which specifies a set provider (i.e., $u \in N_Q$) and an element (i.e., $v \in N_Q \setminus \setpresentation{u}$). Assisted by the server, the querier tests for each query whether the element is in the corresponding provider's set. The privacy requirement is that the set provider in each query is not revealed.

As stated before, naively composing multiple instances of $\Pi_\textsf{3PPSMT}$ in parallel reveals the set providers specified by $Q$ to $\sdv$ because the server has to interact with them.
In the following, we describe how the secure composition technique fixes such leakage. Corresponding to the original $\Pi_\textsf{3PPSMT}$ protocol, we describe the composition of the preprocessing phase and the online phase respectively.

\stitle{Composition of the preprocessing phase.} In order to compose the preprocessing phase, we first observe Theorem \ref{theorem: reusability of 3PPSMT preprocessing}, which states that the preprocessing phase of the $\Pi_\textsf{3PPSMT}$ protocol satisfies the condition of reusability.

\begin{theorem}
	Given that the $\fdv_\textsf{mpOPRF}$ and $\fdv_\textsf{EQ}$ functionalities are securely implemented, the preprocessing phase of the $\Pi_\textsf{3PPSMT}$ protocol presented in Algorithm \ref{algorithm: 3PPSMT protocol} satisfies the condition of reusability. In other words, the protocol supports polynomially many queries invoking the online phase with independent randomness after a single preprocessing.
	\label{theorem: reusability of 3PPSMT preprocessing}
\end{theorem}
\begin{proof}
    \chreplaced{From the protocol construction, the server and the querier's views in multiple invocations of the online phase are independent}{From the protocol construction, the server and the querier's view in multiple invocations of the online phase is independent}
	given that the randomness $r$ is newly selected in each invocation. Thus, they cannot obtain more information on the set provider's set in parallel invocations of the $\Pi_\textsf{3PPSMT}$ protocol due to the universal composability of the $\fdv_\textsf{mpOPRF}$ and $\fdv_\textsf{EQ}$ functionalities. Therefore, the $\Pi_\textsf{3PPSMT}$ protocol satisfies the condition of reusability.
\end{proof}

This theorem inspires the composition of the preprocessing phase. Specifically, we let all set providers run the preprocessing phase of $\Pi_\textsf{3PPSMT}$ in parallel and send preprocessing results to the server $\sdv$. With these preprocessing results, the querier can obtain the testing results by interacting with the server using the online phase. Therefore, the server no longer has to interact with the queried set providers.

\stitle{Composition of the online phase.} Next, we consider how to compose the online phase with the preprocessing results. As the server holds all preprocessing results, it can respond to the querier using the preprocessing result corresponding to the queried set provider. However, the querier would still have to tell the server which set provider is queried.

To fix this problem, our key observation is that the querier $Q$ knows the queried set providers $\setpresentation{v_i}_{i \in [m]}$. Thus, $Q$ can ask $v_i$ for $S_{v_i} \leftarrow \Pi.E(\vectorpresentation{(x', f_{k_{v_i}}(x'))}_{x' \in N_{v_i}})$ as in Step 4 of Algorithm \ref{algorithm: 3PPSMT protocol}. After obtaining these OKVS encodings, $Q$ decodes $S_{v_i}$ on the queried element $x_i$ for each query $(v_i, x_i)$. The decoding result $\Pi.D(S_{v_i}, x_i)$ equals $f_{k_{v_i}}(x_i)$ if $x_i \in N_{v_i}$, and is a pseudorandom value otherwise.

With the decoding results, the problem is reduced to how to test whether each decoding result $\Pi.D(S_{v_i}, x_i)$ is in the PRF values $\vectorpresentation{f_{k_{v_i}}(x)}_{x \in N_{v_i}}$ sent by the corresponding set provider $v_i$. The key challenge is how to specify $\vectorpresentation{f_{k_{v_i}}(x)}_{x \in N_{v_i}}$ among all the PRF values received by the server without revealing the exact identity of $v_i$ to the server.

We address this challenge by observing that, due to the pseudorandomness of PRF values, the PRF values from different set providers collide with only negligible probability. Thus, the querier only has to test whether each decoding result is among all the PRF values held by the server. This task can be directly fulfilled using the technique from the $\Pi_\textsf{3PPSMT}$ protocol's online phase in Algorithm \ref{algorithm: 3PPSMT protocol}.

\stitle{Further optimization.} The weakness of the above solution is its large overhead. In the preprocessing phase, the server $\sdv$ receives a total of $O(\sum_{u \in \vdv} |N_u|) = O(|\edv|)$ PRF values. The proofs of Theorem \ref{theorem: 3ppsmt protocol communication complexity} and Theorem \ref{theorem: 3ppsmt protocol computation complexity} \chreplaced{indicate}{indicates} that the communication overhead is $O((1 + \varepsilon)\lambda |\edv|)$ for each query in the above solution's online phase, so the total communication overhead of the online phase is $O((1 + \varepsilon)m \lambda |\edv|)$. Also, the total computation overhead is $O((1 + \varepsilon) m \lambda |\edv|)$ for both the server and the querier.

To reduce the overheads, we integrate the technique of cuckoo hash tables. The decoding results are first mapped to different entries of the querier-side cuckoo hash table, and the PRF values are mapped to all possible entries in the server-side table. Then, the membership test is only applied to the decoding result and the PRF values in the same entry of querier-side and server-side tables.

Through this method, we reduce the online phase's communication overhead from $O((1 + \varepsilon)m \lambda |\edv|)$ to $O((1 + \varepsilon) \lambda |\edv|)$, and reduce the computation overheads from $O((1 + \varepsilon) m \lambda |\edv|)$ to $O((1 + \varepsilon) \lambda |\edv| )$ for the server and $O((1 + \varepsilon)\lambda\gamma m)$ for the querier. More details can be found in Theorem \ref{theorem: secure composition communication} and Theorem \ref{theorem: secure composition computation}.
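
As a rough illustration of the saving, recall that each query pairs a set provider $u \in N_Q$ with an element of $N_Q$ other than $u$, so a querier with $|N_Q|$ neighbors issues $m = |N_Q|(|N_Q| - 1)$ queries. Taking a hypothetical degree $|N_Q| = 100$ (a value chosen only for illustration), we get $$ m = 100 \times 99 = 9900, \qquad \frac{(1 + \varepsilon) m \lambda |\edv|}{(1 + \varepsilon) \lambda |\edv|} = m = 9900, $$ i.e., the asymptotic bound on the online communication shrinks by a factor of $m$. This estimate ignores the constants hidden by the $O(\cdot)$ notation, so it indicates the trend rather than a measured speed-up.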

Combining the above techniques, the protocol implementing the secure composition of multiple $\Pi_\textsf{3PPSMT}$ instances is presented in Algorithm \ref{algorithm: secure composition}. Integrating this protocol into the intuitive construction of \textsf{Sectric} in Algorithm \ref{algorithm: Sectric protocol} to replace Steps 3--6, we obtain the full construction of \textsf{Sectric}.

\stitle{Differential Privacy.}  In some scenarios, directly publishing local triangle counts may itself pose a privacy risk. Although our \textsf{Sectric} protocol focuses on process-level security, it can be easily extended to a differentially private version to protect the output in such scenarios. At the end of the protocol, the querier and the server each hold a share of the result. The server only needs to sample the noise and add it to its share, so that the result recovered by the querier is noisy. The specific noise parameters can be found in~\cite{liu2024cargo}. First, the server obtains the noisy degree $ d' = d + \text{Lap}(1/\epsilon_1) $; then it samples the noise from $ \text{Lap}(d'/\epsilon_2) $ and adds it to its share, where $ \epsilon_1 $ and $ \epsilon_2 $ represent the privacy budgets \chadded{and Lap refers to the Laplace distribution function}.
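
To make the share manipulation explicit, the following sketch writes the final additive shares as $b_\sdv$ and $b_Q$ and introduces $\eta$ as a hypothetical name for the sampled noise. It ignores how real-valued noise is encoded into the share domain, which is an implementation detail we do not fix here: $$ b_\sdv + b_Q = |\Delta_Q|, \qquad b'_\sdv = b_\sdv + \eta \ \text{ with } \ \eta \sim \text{Lap}(d'/\epsilon_2), \qquad b'_\sdv + b_Q = |\Delta_Q| + \eta. $$ Since $Q$ receives only $b'_\sdv$, it reconstructs $|\Delta_Q| + \eta$ and cannot separate $\eta$ from the true count, while $\sdv$ still holds only its own share.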

\subsection{Theoretical Analysis}
\label{subsection: theoretical analysis of secure composition}

In the following, we analyze the proposed protocols from the theoretical perspective. We first demonstrate the communication and computation cost of invoking multiple instances of $\Pi_\textsf{3PPSMT}$ using the secure composition technique proposed in Algorithm \ref{algorithm: secure composition}.

\begin{theorem}
	Given that the OKVS scheme has expansion ratio $1 + \varepsilon$, the communication complexity of the composed protocol in Algorithm \ref{algorithm: secure composition} is $O(\lambda |\edv|)$ in the preprocessing phase and $O((1 + \varepsilon) \lambda |\edv| )$ in the online phase.
	\label{theorem: secure composition communication}
\end{theorem}
\begin{proof}
	In the preprocessing phase, the communication mainly occurs in Step 3, where each $u \in \vdv$ sends all PRF values $y_{u, i}$ to $\sdv$. There are a total of $\sum_{u \in \vdv} |N_u| = 2|\edv| = O(|\edv|)$ PRF values of size $O(\lambda)$. Thus, the communication complexity of the preprocessing phase is $O(\lambda |\edv|)$.

In the online phase, the communication mainly occurs in Step 12, where $\sdv$ sends all $S_{\sdv, j}$ to $Q$ and, for each query $(v_i, x_i)$, $v_i$ sends $S_{v_i}$ to $Q$. In the former, $S_{\sdv, j}$ stores $l_j$ key-value pairs with value size $O(\lambda)$. So, $S_{\sdv, j}$ has size $O((1 + \varepsilon) \lambda l_j)$. Noting that all entries of $T$ store $O(\sum_{u \in \vdv} |N_u|) = O(|\edv|)$ PRF values, these OKVS encodings \chreplaced{have}{has} a total size of $O(\sum_j (1 + \varepsilon) \lambda l_j ) = O((1 + \varepsilon)  \lambda |\edv|)$. In the latter, each OKVS encoding $S_{v_i}$ stores $|N_{v_i}|$ key-value pairs with value size $O(\lambda)$. So, $S_{v_i}$ has size $O((1 + \varepsilon)\lambda |N_{v_i}|)$. Note that for repeated $v_i$, $Q$ only has to ask for $S_{v_i}$ once. Thus, it requires $O(\sum_{u \in \vdv} |S_u|) = O(\sum_{u \in \vdv} (1 + \varepsilon) \lambda |N_u| ) = O((1 + \varepsilon) \lambda |\edv| )$ communication complexity to send $S_{v_i}$ for all queries $(v_i, x_i)$. Therefore, the communication complexity in the online phase is $O((1 + \varepsilon) \lambda |\edv| )$.
\end{proof}

\begin{theorem}
	\label{theorem: secure composition computation}
	Assuming the OKVS scheme has expansion ratio $1 + \varepsilon$, the computation complexity of the secure composition technique in Algorithm \ref{algorithm: secure composition} is $O((1 + \varepsilon) \lambda |N_u|)$ for graph node $u \in \vdv$ and $O(\lambda |\edv|)$ for the server $\sdv$ in the preprocessing phase. In the online phase, the computation complexity is $O((1 + \varepsilon) \lambda |\edv|)$ for the server $\sdv$ and $O((1 + \varepsilon)\lambda\gamma m)$ for the querier $Q$.
\end{theorem}
\begin{proof}
	In the preprocessing phase, each graph node $u \in \vdv$ evaluates all $f_{k_u}(x_i)$ for all $x_i \in N_u$, requiring $O(\lambda |N_u|)$ computation complexity, and stores $|N_u|$ key-value pairs with value size $O(\lambda)$ in the OKVS encoding $S_u$, requiring $O((1 + \varepsilon)\lambda |N_u|)$ computation cost. Thus, the computation complexity for graph node $u$ is $O((1 + \varepsilon)\lambda |N_u|)$.

The server $\sdv$ incurs $O(\sum_{u \in \vdv} \lambda|N_u| ) = O(\lambda |\edv| )$ computation cost to build the table $T$. Thus, the computation complexity for the server $\sdv$ is $O(\lambda|\edv|)$.

In the online phase, the server $\sdv$ evaluates $\gamma m$ OKVS encodings, each of which stores $l_j$ key-value pairs with pair size $O(\lambda)$. So, its computation complexity is $O(\sum_j (1 + \varepsilon) l_j \lambda) = O((1 + \varepsilon) \lambda |\edv| )$.  $\sdv$ also invokes $\gamma m$ instances of $\fdv_\textsf{mpOPRF}$ and $\fdv_\textsf{EQ}$, incurring $O(\gamma \lambda m)$ computation complexity. Since $\gamma m = O((1 + \varepsilon) |\edv|)$, the computation complexity of the server is $O((1 + \varepsilon) \lambda |\edv| )$.

The computation of the querier $Q$ mainly occurs in Steps 13 and 15, decoding $\gamma m$ OKVS encodings with value size $\lambda$. This incurs $O((1 + \varepsilon) \lambda \gamma m )$ computation complexity.
\end{proof}

Integrating the secure composition technique presented in Algorithm \ref{algorithm: secure composition} into the intuitive construction of \textsf{Sectric} in Algorithm \ref{algorithm: Sectric protocol} to replace Step 3-6, we obtain the full construction of \textsf{Sectric}. In the following, we analyze the privacy guarantee of this full construction of \textsf{Sectric}, which is demonstrated in Theorem \ref{theorem: sectric privacy}.

\begin{theorem}
	\textsf{Sectric} securely fulfills the task of privacy-preserving local triangle counting.
	\label{theorem: sectric privacy}
\end{theorem}
\begin{proof}
	From the description of the protocol, \chreplaced{the views of the server and the querier in \textsf{Sectric} are the same}{ the view of the server and the querier in \textsf{Sectric} is same} as the joint view in multiple invocations of the $\Pi_\textsf{3PPSMT}$ protocol. So, by Theorem \ref{theorem: 3PPSMT protocol privacy} and Theorem \ref{theorem: reusability of 3PPSMT preprocessing}, only $|N_u|$ for each $u$ is revealed to $\sdv$ and $\sum_{u \in \vdv} |N_u|$ is revealed to $Q$. As $|N_u| = D$ for all $u \in \vdv$ in the invocations, this reveals no private information about the protocol participants.
\end{proof}

As the main body of \textsf{Sectric} is to invoke multiple instances of the $\Pi_\textsf{3PPSMT}$ protocol using the secure composition technique, we do not analyze its overheads further for brevity.

\section{Implementation and Evaluation}
\label{section: implementation and evaluation}

In this section, we perform extensive experiments on the performance of \textsf{Sectric}. As far as we know, \textsf{Sectric} is the first crypto-assisted solution specifically designed for the privacy-preserving local triangle counting problem. In order to set a proper baseline for our experiments, we make modifications to the open-source implementation of CARGO~\cite{liu2024cargo} to adapt it to this problem. CARGO is the state-of-the-art in the context of crypto-assisted graph data analysis, which is designed for privacy-preserving global triangle counting.

\subsection{Experimental Setting}

\stitle{Dataset.}
The experiments are conducted on real-world graphs of different scales collected from SNAP~\cite{jure2014snap} and Network Repository~\cite{networkrepository}. Details of the real-world graphs are provided in Table \ref{table:datasets}, where {$|\vdv|$} represents the number of nodes, {$|\edv|$} denotes the number of edges, $d_{\max}$ indicates the maximum degree and \emph{Domain} denotes the types of graphs. We also conduct experiments on synthetic graphs generated using the Problem Based Benchmark Suite (PBBS)~\cite{shun2012brief} under different parameters.

\stitle{Implementation Details.}
We implement our experiments in C++. All our experiments are conducted on an Ubuntu virtual machine configured with a 2.6 GHz Intel i9-13900H, equipped with 4 cores and 32 GB of RAM. Unless otherwise specified, we record the computation time at a network speed of 10 Gbps and measure the online communication overhead as the amount of data transferred between the server and the querier. Our implementation targets an error probability of $2^{-40}$ and 128 bits of computational security, assuming that the server and the querier have pre-generated the Beaver triples. We integrate the OKVS and OPRF~\cite{raghuraman2022blazing} implementations from~\cite{Kunlun}, as well as the private equality test implementation from~\cite{EQ}.

\stitle{Modifications on CARGO.}
CARGO~\cite{liu2024cargo} is designed for the global triangle counting problem, which returns the number of triangles in the whole graph. CARGO introduces two non-colluding servers to fulfill this task. Given a graph of $|\vdv|$ nodes, the two servers secret-share the adjacency table $A$. The adjacency table $A$ is of size $|\vdv| \times |\vdv|$, where $A[i, j] = 1$ if the graph nodes with indices $i$ and $j$ are adjacent, and $A[i, j] = 0$ otherwise.

In the original version of CARGO, the two servers obtain the number of global triangles by securely calculating $$\sum_{1 \leq i < j < k \leq |\vdv|}A[i, j] \cdot A[j, k] \cdot A[k, i]$$ based on the secret shares. In our experiments, we modify the two servers to securely calculate $$\sum_{1 \leq i < j\leq |\vdv|}A[Q, i] \cdot A[Q, j] \cdot A[i, j]$$ supposing graph node $Q$ queries the number of its local triangles.
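
To see why the cost of this modified baseline is governed by the number of nodes rather than by the querier's degree, one can simply count the summands. The modified sum ranges over all pairs $i < j$, regardless of how many of them are neighbors of $Q$. For a graph with $|\vdv| = 8192$ nodes (the node count of the synthetic graphs used in the density experiment), this gives $$ \binom{|\vdv|}{2} = \frac{8192 \times 8191}{2} = 33{,}550{,}336 $$ product terms. This only counts the terms. The actual cost per term depends on CARGO's secure multiplication, which we do not model here.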

\subsection{Experiment Results}
\stitle{Online Phase's Overheads on Real-world Graphs.}
We first evaluate \textsf{Sectric} and CARGO~\cite{CARGO} on five real-world graphs of different scales: Facebook, CondMat, roadNet, email-Enron and loc-Brightkite. The online phase's overheads of the two protocols are shown in Table \ref{table:real_world_combined}. It is important to note that \textsf{Sectric} and CARGO both have a preprocessing phase, but CARGO does not provide an open-source implementation of its preprocessing phase. Therefore, the experiment results only compare the online phase's overheads here. We separately report the overheads of the preprocessing phase of \textsf{Sectric} in the following part.

The experiment results show that CARGO is suitable for graphs with fewer nodes, whereas \textsf{Sectric} is more suitable for larger-scale graphs.
On the loc-Brightkite graph, our protocol reduces online computation overheads by 44.02\% and communication overhead by 72.17\%.

\stitle{Online Phase's Overheads on Synthetic Graphs.}
To further assess how the computation and communication overheads of CARGO and \textsf{Sectric} depend on the graph parameters, we also conduct experiments on synthetic graphs generated from PBBS~\cite{shun2012brief}. The graphs are synthesized with different maximum node degree $d_{\max}$ and node size $|\vdv|$. We evaluate the computation and communication overheads between the server and querier of the two protocols on these synthetic graphs.

From the experiment results in Figs. \ref{fig:combined computation} and \ref{fig:combined communication}, we find that \textsf{Sectric}'s computation and communication overheads grow linearly with $d_{\max}$, while the overheads of CARGO depend solely on the number of nodes in the graph. \textsf{Sectric} outperforms CARGO on graphs with larger node size. On the largest graph in the experiments, \textsf{Sectric} reduces computation overheads by 86.52\% and communication overheads by 90.76\%.

Additionally, we study the effect of the graph's node size $|\vdv|$ on the growth trends of the overheads, and the results are presented in Fig. \ref{fig:node_gplus}. The results show that the overheads of \textsf{Sectric} grow linearly with $|\vdv|$, while those of CARGO grow quadratically with $|\vdv|$. The results align with our theoretical analysis.

\stitle{Density Effects.} To explore the impact of graph density on the performance of \textsf{Sectric}, we select five synthetic graphs, each with 8192 nodes and a maximum degree limited to 1002. The number of edges varies, which in turn changes the density. The density of the graph $G$ is calculated as
\[
\mathit{density}_G = \frac{2 \times |\mathcal{E}|}{|\mathcal{V}| \times (|\mathcal{V}| - 1)}.
\]
The communication volume between the querier and the server during the online phase is constant at 1.71\,GB, while the time comparison is shown in Fig. \ref{fig:sparsity overheads on synthetic graph}. We observe that, with the number of nodes and maximum degree held constant, density itself does not affect the experimental performance. However, in real-world scenarios, sparser graphs are more likely to have lower maximum degrees. Therefore, we prefer to use the number of nodes and maximum degree as key influencing factors.
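
For a sense of scale, consider a hypothetical graph with $|\mathcal{V}| = 8192$ nodes and $|\mathcal{E}| = 10^6$ edges. The edge count is chosen only for illustration and is not one of the five evaluated settings. Its density is $$ \frac{2 \times 10^6}{8192 \times 8191} = \frac{2{,}000{,}000}{67{,}100{,}672} \approx 0.0298, $$ so even a graph with a million edges occupies only about $3\%$ of all possible node pairs at this size.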

\stitle{Multi-thread Optimization.}
The computation overheads of \textsf{Sectric} can be reduced through multi-thread optimization. We also implement a multi-thread version of \textsf{Sectric} and conduct an experiment on its performance. Since the communication overhead is not affected by this optimization, we only present the experiment results on the computation overheads in Table \ref{table:online_benchmark}. The results demonstrate that with an 8-thread optimization, the computation overheads can be reduced by 10\%--72\% on real-world graphs.

\stitle{Experiment Results on Preprocessing Phase.} 
In the following, we present the experiment results on the preprocessing phase of \textsf{Sectric}, including the computation and communication overheads. The experiment results on real-world graphs are shown in Table \ref{table:offline_benchmark}, and those on synthetic graphs are presented in Fig. \ref{fig:preprocessing overheads on synthetic graph}. As the preprocessing phase only needs to be invoked once, the overheads are acceptable.

\stitle{Summary.}
The above experiments demonstrate that, as the overheads of \textsf{Sectric} grow linearly with respect to $|\vdv|$ and $d_{\max}$, while those of CARGO grow quadratically with respect to $|\vdv|$, \textsf{Sectric} outperforms CARGO on larger graphs. From the results in Fig. \ref{fig:node_gplus}, when the graph has more than 10,000 nodes, \textsf{Sectric} has better performance. The experimental results on real-world graphs also demonstrate that \textsf{Sectric} incurs lower overheads on the three larger graphs: CondMat, roadNet and loc-Brightkite. Therefore, \textsf{Sectric} becomes a more suitable choice for the analysis of large graphs.

\section{Conclusion}
\label{section: conclusion}

In this work, we present \textsf{Sectric}, a novel \underline{se}rver-aided \underline{c}rypto-assisted local \underline{tri}angle \underline{c}ounting protocol. \textsf{Sectric} achieves both high result accuracy and cryptographic-level privacy guarantees utilizing cryptographic primitives. It explores a new PSI Cardinality-based approach to local triangle counting with high efficiency. To avoid intermediate privacy cost, we also define and implement a new cryptographic primitive. We demonstrate the security and performance of our proposed protocol through analysis and empirical evaluation.


\begin{abstract}
Graph data analysis, particularly local triangle counting, plays a pivotal role in deciphering complex relationships within graph data. This method is invaluable across diverse fields such as social networks, transportation, and cybersecurity. However, this process often involves handling sensitive information, necessitating that the relationship between any two nodes be considered private. 
Differential privacy (DP) is a formal model to address this privacy concern and can be categorized into two types: the central DP (CDP) model, which achieves better result accuracy, and the local DP (LDP) model, which does not assume a trusted server. 
To bridge the gap between the two models, we propose {\textsf{Sectric}}, a \underline{se}rver-aided \underline{c}rypto-assisted \chadded{local} \underline{tri}angle \underline{c}ounting protocol, in this paper. 
It can achieve the same result accuracy with the same privacy budget as the CDP model without assuming a trusted server.
\textsf{Sectric} also explores a new approach in crypto-assisted graph data analysis algorithms that represents a node's neighbors using a set instead of \chreplaced{an}{a} adjacency vector, and successfully \chreplaced{achieves}{achieve} higher efficiency compared to other crypto-assisted solutions.
We also conduct theoretical and empirical evaluations to demonstrate that \textsf{Sectric} achieves the design principles.
\end{abstract}

Privacy is another concern in the decentralized setting. Privacy-sensitive users \chreplaced{would not want}{will not hope} to reveal the identities of their neighbors to third parties. Differential privacy (DP) is a formal model addressing this privacy concern. 
Prior differentially private solutions can be mainly categorized into two types: those adopting the central DP (CDP) model and those adopting the local DP (LDP) model.
The CDP model assumes a trusted server to calculate the triangle counts and adds a small noise to the final result\footnote{In the CDP model, the server knows the complete and accurate topology of the graph, including all nodes and the connection edges to their neighbors.}. The LDP model eliminates the trust assumption on the server, but has to add more noise in the calculating process with the same privacy budget. Thus, the CDP model has better result accuracy, and the LDP model aligns better with the decentralized setting.

Meanwhile, \textsf{Sectric} enables the querier to calculate the accurate counting result. It guarantees utility but may also reveal graph nodes' \chreplaced{adjacency}{adjacent} relationship. To prevent this privacy leakage in the \chreplaced{computation}{calculating} result, we also demonstrate that \textsf{Sectric} is compatible with the DP mechanism. \textsf{Sectric} allows adding noise subject to a given distribution, ensuring that the querier receives only the noisy result, while the server gains no information. The noise intensity is \chreplaced{the same as}{same to} that in the CDP model given the same privacy budget. In a nutshell, \textsf{Sectric} has the same result utility when providing the same privacy guarantee as the CDP model, and meanwhile requires no trusted server as the LDP model.

Furthermore, we also fully utilize the local view of graph nodes to reduce the number of required servers. Prior crypto-assisted graph data analysis solutions require two or more non-collusive servers to assist, while \textsf{Sectric} only requires one assisting server. This requirement is easier to implement in practical applications.

The main contributions of this work can be summarized as:

\stitle{Paper Organization.}
The rest of this paper is organized as follows: In Section \ref{section: related work}, we discuss the related works. Then, we introduce the necessary preliminaries of this work in Section \ref{section: preliminaries}. In Section \ref{section: 3PPSMT}, we define the primitive $\fdv_\textsf{3PPSMT}$ and propose the $\Pi_\textsf{3PPSMT}$ protocol implementing this primitive. Based on this primitive, the construction of the \textsf{Sectric} protocol is presented in Section \ref{section: sectric construction}. Section \ref{section: implementation and evaluation} presents the experimental results, and Section \ref{section: conclusion} concludes this paper.

Thus, many works have proposed a decentralized model, where the node set is still considered public knowledge, but the relationship between two nodes is only known to them and treated as their privacy. Sun et al.~\cite{Sun2019AnalyzingSS} \chreplaced{propose}{proposed} a local differential privacy approach, where users locally perturb their adjacency vectors to protect the privacy of edges. However, their assumption that users have an extended local view, allowing them to see their 2-hop neighbors, introduces the data correlation problem~\cite{Liu2022CollectingTC}, and is not applicable in most real-world cases. In the more realistic scenario where users can only see their immediate neighbors, Imola et al.~\cite{Imola2021CommunicationEfficientTC, Imola2020LocallyDP} utilize multiple rounds of interactions to upload and download perturbed edge information. While this approach can preserve privacy, it also introduces a non-negligible additive error, as highlighted by~\cite{Eden2023TriangleCW}.

In the decentralized model, crypto-assisted solutions to triangle counting have also emerged in recent years. CARGO~\cite{liu2024cargo} utilizes a hybrid approach that combines additive secret sharing and differential privacy, so that the two untrusted servers see only encoded values and no other information. This approach enables users to add smaller amounts of noise when implementing differential privacy, thereby achieving better utility compared to~\cite{Imola2021CommunicationEfficientTC}. Building on a similar approach, Imola et al.~\cite{Imola2022DifferentiallyPT} introduce a trusted intermediate server with shuffling functionality. Another work, MAGO~\cite{wang2023mago}, which is based on lightweight secret sharing techniques, utilizes three servers from different trust domains working in coordination to improve the accuracy of triangle counting and also to detect whether malicious adversaries attempt to tamper with the statistical result.

A summary of the comparison between \textsf{Sectric} and other related works on triangle counting is presented in Table \ref{tab: related works}.

\subsection{Crypto-Assisted Graph Analytics}

Crypto-assisted solutions are also emerging in other graph analytics tasks.

\section{Preliminaries}
\label{section: preliminaries}

We first introduce some notations used in this paper. For a positive integer $N$, $[N]$ denotes the set $\setpresentation{1, 2, \dots, N}$\chdeleted{ for a given positive integer N}, and $\mathbb{Z}_N$ represents the group modulo $N$. Given a set $\xdv$, $x \xleftarrow{\$} \xdv$ indicates that $x$ is uniformly selected from $\xdv$. A real number ensemble $\setpresentation{p_\lambda}_{\lambda \in \mathbb{N}}$ is negligible if, for any polynomial $p$, $p_\lambda \leq \frac{1}{p(\lambda)}$ for all sufficiently large $\lambda$.

\subsection{Problem Definition}
\label{subsection: problem definition}

We also define the notion of a graph node's neighbor set in Definition \ref{definition: neighbor set}. The neighbor set $N_u$ of a node $u$ denotes the set of all nodes adjacent to $u$.

The following theorem establishes the relationship between the local triangle sets and the neighbor sets, which serves as the foundation for our \textsf{Sectric} protocol.

\begin{proof}\color{blue}
	We know that the number of triangles containing node $u$ is given by:

$$
|\Delta_u| = \sum_{v \in N_u} \sum_{w \in N_u, w > v} I(v, w),
$$

where $I(v, w)$ is an indicator function that equals 1 if there is an edge between $v$ and $w$, and 0 otherwise.

Additionally, we have:

$$
|N_u \cap N_v| = \sum_{w \in N_u} I(v, w).
$$

Thus, we can express the sum as:

$$
\sum_{v \in N_u} |N_u \cap N_v| = \sum_{v \in N_u} \sum_{w \in N_u} I(v, w).
$$

In the calculation process, the positions of $v$ and $w$ are equivalent and interchangeable. Therefore, we can conclude that:

$$
|\Delta_u| = \sum_{v \in N_u} \sum_{w \in N_u, w > v} I(v, w) = \frac{1}{2} \sum_{v \in N_u} |N_u \cap N_v|.
$$
\end{proof}

\begin{definition}[Privacy Constraint]
	\label{definition: sectric privacy}
	\textsf{Sectric} is a privacy-preserving protocol if there exist polynomial-time algorithms $\adv_{\sdv}$, $\adv_Q$, and $\adv_{\vdv'}$ such that the view of $\sdv$, the view of the querier $Q$, and the joint view of graph nodes $\vdv' \subset \vdv$ in the execution of \textsf{Sectric} can be simulated by $\adv_{\sdv}(1^\lambda, 1^{|\vdv|})$, $\adv_Q(1^\lambda, 1^{|\vdv|})$, and $\adv_{\vdv'}(1^\lambda, 1^{|\vdv|})$, respectively, where $\lambda$ is the security parameter. Here, ``view being simulated by an algorithm'' means that the view and the algorithm's output are computationally indistinguishable.
\end{definition}

\subsection{Cryptographic Tools}

In the following, we introduce the cryptographic tools used in this work.

\stitle{Multi-point OPRF (mpOPRF).}
A pseudorandom function (PRF) family is a cryptographic tool that emulates a random function family. This notion is formalized in Definition \ref{definition: pseudorandom function}. Given a uniform seed $s$, a pseudorandom function is computationally indistinguishable from a truly random function. It is usually implemented with symmetric encryption algorithms, such as AES, in practice.

\begin{definition}[PRF family]
	A function family $\setpresentation{f_s}_{s \in \bin^*}$, where $f_s$ is a map $\bin^{|s|} \rightarrow \bin^{|s|}$ for all $s \in \bin^*$, is a PRF family if it satisfies the following conditions:
	\begin{enumerate}
		\item There exists a polynomial-time algorithm $F$ such that $F(s, x) = f_s(x)$ for all $\lambda \in \mathbb{N}$ and all  $s, x \in \bin^\lambda$.
		\item For any probabilistic polynomial-time algorithm $\adv$, $$\left|\Pr[\adv^{f_s(\cdot)}(1^\lambda) = 1 \mid s \xleftarrow{\$} \bin^{\lambda}] - \Pr[\adv^{f(\cdot)}(1^\lambda) = 1 \mid f \xleftarrow{\$} \fdv_{\lambda}]\right|$$
		      is negligible, where $\adv^{f(\cdot)}$ denotes that $\adv$ has oracle access to $f$, and $f \xleftarrow{\$} \fdv_{\lambda}$ denotes that $f$ is a truly random function mapping $\bin^\lambda \rightarrow \bin^\lambda$.
	\end{enumerate}
	\label{definition: pseudorandom function}
\end{definition}

\stitle{Oblivious Key-Value Store (OKVS).}
The notion of an oblivious key-value store (OKVS) was first introduced by Garimella et al.~\cite{garimella2021oblivious} in the context of private set intersection (PSI). An OKVS encodes a set of key-value pairs into a single encoding and guarantees that, when the encoded values are uniformly random, the original key-value pairs cannot be recovered from the encoding.

\section{Three-Party Private Set Membership Test}
\label{section: 3PPSMT}

\subsection{Construction}
\label{subsection: 3PPSMT construction}

The $\Pi_\textsf{3PPSMT}$ protocol is presented in Algorithm \ref{algorithm: 3PPSMT protocol}. In this protocol, the querier $Q$ inputs an element $u \in \bin^\lambda$ and the set provider $U$ inputs a set $X$. This protocol aims to test whether $u \in X$ and secret-shares the output between the querier $Q$ and a server $\sdv$.

In the online phase, $\sdv$ and $U$ first send $S_\sdv$ and $S_U$ to $Q$, respectively, in Step 4.
After receiving $S_U$ and $S_\sdv$, $Q$ decodes them to obtain $y = \Pi.D(S_U, u)$ and $\hat{y} = \Pi.D(S_\sdv, y)$ in Step 5.
In Steps 6 and 7, $\sdv$ and $Q$ invoke $\fdv_\textsf{mpOPRF}$, and $Q$ obtains $y' = f_{k_\sdv}(y)$. $Q$ then computes $r' = \hat{y} - y'$.
Finally, $\sdv$ and $Q$ invoke $\fdv_\textsf{EQ}$ to test whether $r = r'$.

In contrast, supposing that $u \notin X$, the decoding result $\hat{y}$ will not equal $f_{k_\sdv}(y_{i^*}) + r$ (with overwhelming probability). Thus, the resulting $r'$ is not equal to $r$, and $\sdv$ and $Q$ will secret-share the result that $r' \neq r$ in the invocation of $\fdv_\textsf{EQ}$.
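
To make the two cases concrete, the following derivation traces the values computed by $Q$. It is only a sketch: it assumes, as the expression $f_{k_\sdv}(y_{i^*}) + r$ above suggests, that $S_U$ encodes the pairs $(x_i, y_i)$ and that $S_\sdv$ encodes the pairs $(y_i, f_{k_\sdv}(y_i) + r)$; Algorithm \ref{algorithm: 3PPSMT protocol} remains the authoritative description. If $u = x_{i^*}$ for some $x_{i^*} \in X$, then
$$
y = \Pi.D(S_U, u) = y_{i^*}, \qquad \hat{y} = \Pi.D(S_\sdv, y) = f_{k_\sdv}(y_{i^*}) + r, \qquad y' = f_{k_\sdv}(y) = f_{k_\sdv}(y_{i^*}),
$$
so that $r' = \hat{y} - y' = r$ and the equality test succeeds. If $u \notin X$, the decoded $y$ matches none of the $y_i$ except with negligible probability, so $\hat{y}$ is unrelated to $f_{k_\sdv}(y) + r$ and the test reports $r' \neq r$.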

\subsection{Protocol Analysis}
We first analyze the communication and computation complexity of the $\Pi_\textsf{3PPSMT}$ protocol in Theorems \ref{theorem: 3ppsmt protocol communication complexity} and \ref{theorem: 3ppsmt protocol computation complexity}.

\begin{theorem}
	Supposing that the OKVS scheme $\Pi$ has an expansion ratio $1 + \varepsilon$, the communication complexity of the $\Pi_\textsf{3PPSMT}$ protocol is $O(\lambda |X|)$ in the preprocessing phase and $O((1 + \varepsilon)\lambda |X|)$ in the online phase.
	\label{theorem: 3ppsmt protocol communication complexity}
\end{theorem}

In the online phase, $U$ sends an OKVS encoding $S_U$ to $Q$, and $\sdv$ sends an OKVS encoding $S_\sdv$ to $Q$ in Step 4. Each OKVS encoding stores $|X|$ key-value pairs, where each value is in the space $\bin^\lambda$. Since the OKVS scheme $\Pi$ has an expansion ratio $1 + \varepsilon$, the encodings' sizes are both $(1 + \varepsilon) \lambda |X|$. Then, in the invocation of $\fdv_{\textsf{mpOPRF}}$ and $\fdv_\textsf{EQ}$, the communication complexity is $O(\lambda)$. Therefore, the communication complexity of the online phase is $O((1 + \varepsilon)\lambda |X|)$.
\end{proof}
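
\stitle{Illustrative example.} To give a sense of scale, consider the hypothetical parameters $\lambda = 128$, $|X| = 1000$, and $\varepsilon = 0.3$ (chosen only for illustration, not taken from the experiments). Ignoring the $O(\lambda)$ terms and any constants hidden by the asymptotic notation, each OKVS encoding occupies about
$$
(1 + \varepsilon)\lambda |X| = 1.3 \times 128 \times 1000 = 166{,}400 \text{ bits} \approx 20.8 \text{ kB},
$$
so the two encodings $S_U$ and $S_\sdv$ received by $Q$ in the online phase amount to roughly $41.6$ kB.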

\begin{theorem}
	Supposing that the OKVS scheme has an expansion ratio $1 + \varepsilon$, the computation complexity of the set provider $U$ is $O((1 + \varepsilon)\lambda |X|)$ in the preprocessing phase (the server and querier have no computation in the preprocessing phase), and $O((1 + \varepsilon) \lambda |X|)$ for both the server and the querier in the online phase.
	\label{theorem: 3ppsmt protocol computation complexity}
\end{theorem}
\begin{proof}
	The computation complexity in the preprocessing phase involves:
	\begin{itemize}
		\item The set provider $U$ evaluates $f_{k_U}(x_i)$ for all $x_i \in X$, which has $O(\lambda |X|)$ computation complexity.
		\item The OKVS encoding operation stores $|X|$ key-value pairs, which has $O((1 + \varepsilon)\lambda|X|)$ computation complexity.
	\end{itemize}
	Therefore, the preprocessing phase of the $\Pi_\textsf{3PPSMT}$ protocol has $O((1 + \varepsilon) \lambda |X|)$ computation complexity for the set provider.

The computation complexity in the online phase involves:
	\begin{itemize}
		\item The OKVS encoding operation stores $|X|$ key-value pairs in Step 3, which has $O((1 + \varepsilon)\lambda|X|)$ computation complexity for the server.
		\item The OKVS decoding operations have $O((1 + \varepsilon) \lambda |X|)$ computation complexity for the querier.
		\item The invocation of $\fdv_\textsf{mpOPRF}$ and $\fdv_\textsf{EQ}$ has $O(\lambda)$ computation complexity for both the server and the querier.
	\end{itemize}
	Therefore, the online phase of the $\Pi_\textsf{3PPSMT}$ protocol has $O((1 + \varepsilon) \lambda |X|)$ computation complexity for both the server and the querier.
\end{proof}

\begin{theorem}
	The $\Pi_\textsf{3PPSMT}$ protocol presented in Algorithm \ref{algorithm: 3PPSMT protocol} securely implements the $\fdv_\textsf{3PPSMT}$ functionality defined in Algorithm \ref{algorithm: 3PPSMT functionality} against semi-honest polynomial-time adversaries, given that $\sdv$, $U$, and $Q$ are non-collusive.
	\label{theorem: 3PPSMT protocol privacy}
\end{theorem}
\begin{proof}
	We prove this theorem by demonstrating that:
	\begin{enumerate}
		\item $\sdv$ cannot extract any additional knowledge on $Q$'s query $u$ and $U$'s set $X$ besides $b_\sdv$ and $|X|$ from its view.
		\item $Q$ cannot extract any additional knowledge on $U$'s set $X$ besides $b_Q$ and $|X|$ from its view.
		\item $U$ cannot extract any knowledge from its view.
	\end{enumerate}

In the following, we demonstrate the above statements:
	\begin{enumerate}
		\item $\sdv$ cannot extract any additional knowledge on $Q$'s query $u$ and $U$'s set $X$ besides $b_\sdv$ and $|X|$ from its view.
		      \begin{enumerate}
			      \item $Hybrid_1$: We modify $Hybrid_0$ by replacing $y_i$ in Step 2 with random numbers. $\sdv$'s view in $Hybrid_1$ is computationally indistinguishable from its view in $Hybrid_0$ due to the property of pseudorandom functions.
		      \end{enumerate}
		      $\sdv$'s view in $Hybrid_1$ consists of $|X|$ random values from $U$ in Step 2, its view from $\fdv_\textsf{mpOPRF}$ in Step 7, which has no output, and $b_\sdv$ from $\fdv_\textsf{EQ}$ in Step 8. This view can be easily simulated given $|X|$ and $b_\sdv$.

\item $Q$ cannot extract any additional knowledge on $U$'s set $X$ besides $b_Q$ and $|X|$ from its view.
		      \begin{enumerate}
			      \item $Hybrid_1$: We modify $Hybrid_0$ by replacing $y_i$ in Step 2 and $f_{k_\sdv}(y_i)$ in Step 3 with random numbers. $Q$'s view in $Hybrid_1$ is computationally indistinguishable from its view in $Hybrid_0$ due to the property of pseudorandom functions.
			      \item $Hybrid_2$: We modify $Hybrid_1$ by replacing $x_i$ in Step 2 and $y_i$ in Step 3 with random numbers. $Q$'s view in $Hybrid_2$ is computationally indistinguishable from its view in $Hybrid_1$ due to the obliviousness of the OKVS scheme by Definition \ref{definition: okvs obliviousness}.
			      \item $Hybrid_3$: We modify $Hybrid_2$ by replacing $y'$ in Step 7 with a random number. $Q$'s view in $Hybrid_3$ is computationally indistinguishable from its view in $Hybrid_2$ due to the property of pseudorandom functions.
		      \end{enumerate}
		      $Q$'s view in $Hybrid_3$ consists of two OKVS encodings in Step 4, which store $|X|$ key-value pairs with random keys and random values, a random number in Step 7, and $b_Q$ from $\fdv_\textsf{EQ}$ in Step 8. This view can be easily simulated given $|X|$ and $b_Q$.

\section{The \textsf{Sectric} Protocol}
\label{section: sectric construction}

\subsection{Intuitive Construction}
\label{subsection: sectric construction overview}

In this part, we first provide an intuitive construction of the \textsf{Sectric} protocol. We call this construction ``intuitive'' because, although our privacy constraint requires preserving the neighbor privacy of all graph nodes, this construction preserves the neighbor privacy of every node except the querier $Q$: it reveals the identities of the querier's neighbors to the server.

After describing the protocol, we explain why it provides neighbor privacy for the other nodes and why it reveals the identities of the querier's neighbors.

\stitle{Protocol description.}
\textsf{Sectric} is executed on a decentralized graph. The node set is public, and the edge set is distributed among all graph nodes, where each graph node has its neighbor set. The protocol involves the server $\sdv$ and graph nodes $\vdv$ as the protocol participants. The server $\sdv$ has no input, and each graph node $u \in \vdv$ has its neighbor set $N_u$ as the input.

Suppose that a graph node $Q \in \vdv$ initiates a local triangle counting task and queries the number of its local triangles. To fulfill this task, $\sdv$ and $Q$ first aim to secret-share $|N_Q \cap N_u|$ for each $u \in N_Q$. To achieve this, they test for each $v \in N_Q$ whether $v \in N_u$ and secret-share the result. This task can be done by invoking the 3PPSMT functionality $\fdv_\textsf{3PPSMT}$.

Therefore, the intuitive protocol works as follows. Given a neighbor $u \in N_Q$, for each $v \in N_Q$, the parties $\sdv$, $Q$, and $u$ invoke the 3PPSMT functionality $\fdv_\textsf{3PPSMT}$ as the server, the querier, and the set provider, respectively, where $Q$ inputs $v$ and $u$ inputs $N_u$. In each invocation, $\sdv$ and $Q$ obtain secret shares of the result of testing whether $v \in N_u$. Aggregating these shares over all $v \in N_Q$, $Q$ and $\sdv$ secret-share $|N_Q \cap N_u|$ for this neighbor $u$. $\sdv$ and $Q$ repeat the above procedure for all $u \in N_Q$, and each of them sums its own shares. By Theorem \ref{theorem: relation between neighbor and local triangle}, they obtain secret shares of $$ \frac{1}{2}\sum_{u \in N_Q} |N_Q \cap N_u| = |\Delta_Q|.$$ $\sdv$ sends its share to $Q$, and then $Q$ can recover the result $|\Delta_Q|$.

\stitle{Insecurity of the intuitive protocol.} This intuitive construction reveals the querier's neighbors because the server $\sdv$ can identify $Q$'s neighbors by observing which graph nodes it interacts with in the instances of $\Pi_\textsf{3PPSMT}$.

In Section \ref{subsection: secure composition}, we provide a technique to securely compose multiple instances of the 3PPSMT protocol to fix the above insecurity.

\subsection{Full Protocol from Secure Composition}
\label{subsection: secure composition}

In the following, we describe the full construction of the protocol. This construction securely fulfills the privacy-preserving local triangle counting task.

Recall that multiple instances of the $\Pi_\textsf{3PPSMT}$ protocol are invoked in Steps 2-6 in the intuitive construction in Algorithm \ref{algorithm: Sectric protocol}. In each instance, the querier $Q$ specifies a node $u \in \vdv$ and an element $v \in N_Q$, and tests whether $v \in N_u$.

We model the process as a group of set providers (i.e., the graph nodes $\vdv$), each of which provides a set (i.e., $u \in \vdv$ provides $N_u$). The querier has $m$ queries, each of which specifies a set provider (i.e., $u \in N_Q$) and an element (i.e., $v \in N_Q \setminus \setpresentation{u}$). Assisted by the server, the querier tests for each query whether the element is in the corresponding provider's set. The privacy requirement is that the set provider in each query is not revealed.

To fix this problem, our key observation is that the querier $Q$ knows the queried set providers $\setpresentation{v_i}_{i \in [m]}$. Thus, $Q$ can ask $v_i$ for $S_{v_i} \leftarrow \Pi.E(\vectorpresentation{(x', f_{k_{v_i}}(x'))}_{x' \in N_{v_i}})$ as in Step 4 in Algorithm \ref{algorithm: 3PPSMT protocol}. After obtaining these OKVS encodings, $Q$ decodes $S_{v_i}$ on the queried element $x_i$ for each query $(v_i, x_i)$. The decoding result is $\Pi.D(S_{v_i}, x_i) = f_{k_{v_i}}(x_i)$ if $x_i \in N_{v_i}$, and a pseudorandom number otherwise.

We address this challenge by observing that, due to the pseudorandomness of PRF values, the PRF values from different set providers collide with only negligible probability. Thus, the querier only has to test whether each decoding result is among the PRF values held by the server. This task can be directly fulfilled using the technique from the $\Pi_\textsf{3PPSMT}$ protocol's online phase in Algorithm \ref{algorithm: 3PPSMT protocol}.
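
\stitle{Illustrative bound.} The collision probability can be estimated with a union bound. As a sketch, model the $n$ PRF values held by the server as independent uniform $\lambda$-bit strings, which is an idealization justified only up to the negligible distinguishing advantage of the PRF. Any fixed pair collides with probability $2^{-\lambda}$, and there are fewer than $n^2/2$ pairs, so
$$
\Pr[\text{some pair collides}] \leq \frac{n^2}{2} \cdot 2^{-\lambda} = \frac{n^2}{2^{\lambda + 1}}.
$$
For the hypothetical values $n = 2^{20}$ and $\lambda = 128$, this is at most $2^{40} / 2^{129} = 2^{-89}$.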

\stitle{Further optimization.} The weakness of the above solution is its large overhead. In the preprocessing phase, the server $\sdv$ receives a total of $O(\sum_{u \in \vdv} |N_u|) = O(|\edv|)$ PRF values. The proofs of Theorems \ref{theorem: 3ppsmt protocol communication complexity} and \ref{theorem: 3ppsmt protocol computation complexity} \chreplaced{indicate}{indicates} that the communication overhead is $O((1 + \varepsilon)\lambda |\edv|)$ for each query in the above solution's online phase, so the total communication overhead of the online phase is $O((1 + \varepsilon)m \lambda |\edv|)$. Likewise, the total computation overhead is $O((1 + \varepsilon) m \lambda |\edv|)$ for both the server and the querier.

Through this method, we reduce the online phase's communication overhead from $O((1 + \varepsilon)m \lambda |\edv|)$ to $O((1 + \varepsilon) \lambda |\edv|)$, and reduce the computation overheads from $O((1 + \varepsilon) m \lambda |\edv|)$ to $O((1 + \varepsilon) \lambda |\edv| )$ and $O((1 + \varepsilon)\lambda\gamma m)$ for the server and the querier, respectively. More details can be found in Theorems \ref{theorem: secure composition communication} and \ref{theorem: secure composition computation}.
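
\stitle{Illustrative example.} To see what removing the factor $m$ means in practice, consider the hypothetical parameters $\lambda = 128$, $\varepsilon = 0.3$, $|\edv| = 10^5$, and $m = 100$ (chosen only for illustration, and ignoring the constants hidden by the asymptotic notation). The optimized online communication is on the order of
$$
(1 + \varepsilon)\lambda |\edv| = 1.3 \times 128 \times 10^5 = 1.664 \times 10^7 \text{ bits} \approx 2.08 \text{ MB},
$$
whereas the unoptimized bound $(1 + \varepsilon) m \lambda |\edv|$ is $m = 100$ times larger, i.e., about $208$ MB.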

Combining the above techniques, the protocol implementing the secure composition of multiple $\Pi_\textsf{3PPSMT}$ instances is presented in Algorithm \ref{algorithm: secure composition}. Integrating this protocol into the intuitive construction of \textsf{Sectric} in Algorithm \ref{algorithm: Sectric protocol} to replace Steps 3-6, we obtain the full construction of \textsf{Sectric}.

\stitle{Differential Privacy.} In some scenarios, directly publishing local triangle counts may itself pose a privacy risk. Although \textsf{Sectric} focuses on process-level security, it can be easily extended to a differentially private version for such scenarios. At the end of the protocol, the querier and the server each hold a share of the result. The server only needs to sample the noise and add it to its share, so that the result recovered by the querier is noisy. The specific noise parameters can be found in~\cite{liu2024cargo}. First, the server obtains the noisy degree $ d' = d + \text{Lap}(1/\epsilon_1) $; it then samples the noise $ \text{Lap}(d'/\epsilon_2) $ and adds it to its share, where $ \epsilon_1 $ and $ \epsilon_2 $ are the privacy budgets \chadded{and $\text{Lap}(b)$ denotes a sample from the zero-mean Laplace distribution with scale $b$}.
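
\stitle{Illustrative derivation.} The effect of the server-side noise on the recovered result can be written out explicitly. As a sketch, let $s_\sdv$ and $s_Q$ (hypothetical symbols introduced here only for illustration) denote the final shares of the server and the querier, and assume an additive sharing $s_\sdv + s_Q = |\Delta_Q|$. If the server samples $\eta$ from $\text{Lap}(d'/\epsilon_2)$ and sends $s_\sdv + \eta$ instead of $s_\sdv$, the querier recovers
$$
s_Q + (s_\sdv + \eta) = |\Delta_Q| + \eta.
$$
Conditioned on $d'$, and assuming $d' > 0$ so that the scale is well defined, $\eta$ has mean $0$ and variance $2(d'/\epsilon_2)^2$, so the recovered count is an unbiased estimate of $|\Delta_Q|$.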

\subsection{Theoretical Analysis}
\label{subsection: theoretical analysis of secure composition}

In the following, we analyze the proposed protocols from a theoretical perspective. We first demonstrate the communication and computation cost of invoking multiple instances of $\Pi_\textsf{3PPSMT}$ using the secure composition technique proposed in Algorithm \ref{algorithm: secure composition}.

\begin{theorem}
	Given that the OKVS scheme has an expansion ratio $1 + \varepsilon$, the communication complexity of the composed protocol in Algorithm \ref{algorithm: secure composition} is $O(\lambda |\edv|)$ in the preprocessing phase and $O((1 + \varepsilon) \lambda |\edv| )$ in the online phase.
	\label{theorem: secure composition communication}
\end{theorem}
\begin{proof}
	In the preprocessing phase, the communication mainly occurs in Step 3, where each $u \in \vdv$ sends all PRF values $y_{u, i}$ to $\sdv$. There are a total of $\sum_{u \in \vdv} |N_u| = O(|\edv|)$ PRF values, each of size $O(\lambda)$. Thus, the communication complexity of the preprocessing phase is $O(\lambda |\edv|)$.

In the online phase, the communication mainly occurs in Step 12, where $\sdv$ sends all $S_{\sdv, j}$ to $Q$, and for each query $(v_i, x_i)$, $v_i$ sends $S_{v_i}$ to $Q$. In the former, $S_{\sdv, j}$ stores $l_j$ key-value pairs with value size $O(\lambda)$, so $S_{\sdv, j}$ has size $O((1 + \varepsilon) \lambda l_j)$. Noting that all entries of $T$ together store $O(\sum_{u \in \vdv} |N_u|) = O(|\edv|)$ PRF values, these OKVS encodings \chreplaced{have}{has} a total size of $O(\sum_j (1 + \varepsilon) \lambda l_j ) = O((1 + \varepsilon) \lambda |\edv|)$. In the latter, each OKVS encoding $S_{v_i}$ stores $|N_{v_i}|$ key-value pairs with value size $O(\lambda)$, so $S_{v_i}$ has size $O((1 + \varepsilon)\lambda |N_{v_i}|)$. Noting that for repeated $v_i$, $Q$ only has to ask for $S_{v_i}$ once, sending $S_{v_i}$ for all queries $(v_i, x_i)$ requires at most $O(\sum_{u \in \vdv} |S_u|) = O(\sum_{u \in \vdv} (1 + \varepsilon) \lambda |N_u| ) = O((1 + \varepsilon) \lambda |\edv| )$ communication. Therefore, the communication complexity in the online phase is $O((1 + \varepsilon) \lambda |\edv| )$.
\end{proof}

\begin{theorem}
	\label{theorem: secure composition computation}
	Assuming the OKVS scheme has an expansion ratio $1 + \varepsilon$, the computation complexity of the secure composition technique in Algorithm \ref{algorithm: secure composition} is $O((1 + \varepsilon) \lambda |N_u|)$ for graph node $u \in \vdv$ and $O(\lambda |\edv|)$ for the server $\sdv$ in the preprocessing phase. In the online phase, the computation complexity is $O((1 + \varepsilon) \lambda |\edv|)$ for the server $\sdv$ and $O((1 + \varepsilon)\lambda\gamma m)$ for the querier $Q$.
\end{theorem}
\begin{proof}
	In the preprocessing phase, each graph node $u \in \vdv$ evaluates all $f_{k_u}(x_i)$ for all $x_i \in N_u$, requiring $O(\lambda |N_u|)$ computation complexity, and stores $|N_u|$ key-value pairs with value size $O(\lambda)$ in the OKVS encoding $S_u$, requiring $O((1 + \varepsilon)\lambda |N_u|)$ computation cost. Thus, the computation complexity for graph node $u$ is $O((1 + \varepsilon)\lambda |N_u|)$.

The server $\sdv$ incurs $O(\sum_{u \in \vdv} \lambda|N_u| ) = O(\lambda |\edv| )$ computation cost to build the table $T$. Thus, the computation complexity for the server $\sdv$ is $O(\lambda|\edv|)$.

In the online phase, the server $\sdv$ computes $\gamma m$ OKVS encodings, where the $j$-th encoding stores $l_j$ key-value pairs with pair size $O(\lambda)$. So, its computation complexity is $O(\sum_j (1 + \varepsilon) l_j \lambda) = O((1 + \varepsilon) \lambda |\edv| )$. $\sdv$ also invokes $\gamma m$ instances of $\fdv_\textsf{mpOPRF}$ and $\fdv_\textsf{EQ}$, incurring $O(\gamma \lambda m)$ computation complexity. Noting that $\gamma m = O((1 + \varepsilon) |\edv|)$, the computation complexity of the server is $O((1 + \varepsilon) \lambda |\edv| )$.

The computation of the querier $Q$ mainly occurs in Steps 13 and 15, decoding $\gamma m$ OKVS encodings with value size $\lambda$. This results in $O((1 + \varepsilon) \lambda \gamma m )$ computation complexity.
\end{proof}

Integrating the secure composition technique presented in Algorithm \ref{algorithm: secure composition} into the intuitive construction of \textsf{Sectric} in Algorithm \ref{algorithm: Sectric protocol} to replace Steps 3-6, we obtain the full construction of \textsf{Sectric}. In the following, we analyze the privacy guarantee of this full construction of \textsf{Sectric}, which is demonstrated in Theorem \ref{theorem: sectric privacy}.

\begin{theorem}
	\textsf{Sectric} securely fulfills the task of privacy-preserving local triangle counting.
	\label{theorem: sectric privacy}
\end{theorem}
\begin{proof}
	From the description of the protocol, \chreplaced{the views of the server and the querier in \textsf{Sectric} are the same}{ the view of the server and the querier in \textsf{Sectric} is same} as the joint view in multiple invocations of the $\Pi_\textsf{3PPSMT}$ protocol. So, by Theorems \ref{theorem: 3PPSMT protocol privacy} and \ref{theorem: reusability of 3PPSMT preprocessing}, only $|N_u|$ for each $u$ is revealed to $\sdv$, and only $\sum_{u \in \vdv} |N_u|$ is revealed to $Q$. As $|N_u| = D$ for all $u \in \vdv$ in the invocations, these sizes carry no information about the neighbor sets, so the privacy of the protocol participants is not compromised.
\end{proof}

Since the main body of \textsf{Sectric} consists of invoking multiple instances of the $\Pi_\textsf{3PPSMT}$ protocol through the secure composition technique, its overheads follow from the analysis above, and we omit a separate analysis for brevity.

\section{Implementation and Evaluation}
\label{section: implementation and evaluation}

\subsection{Experimental Setting}

\stitle{Dataset.}
The experiments are conducted on real-world graphs of different scales collected from SNAP~\cite{jure2014snap} and Network Repository~\cite{networkrepository}. Details of the real-world graphs are provided in Table \ref{table:datasets}, where {$|\vdv|$} represents the number of nodes, {$|\edv|$} denotes the number of edges, $d_{\max}$ indicates the maximum degree, and \emph{Domain} denotes the types of graphs. We also conduct experiments on synthetic graphs generated using the Problem Based Benchmark Suite (PBBS)~\cite{shun2012brief} under different parameters.

\subsection{Experiment Results}
\stitle{Online Phase's Overheads on Real-world Graphs.}
We first evaluate \textsf{Sectric} and CARGO~\cite{CARGO} on five real-world graphs of different scales: Facebook, CondMat, roadNet, email-Enron, and loc-Brightkite. The online-phase overheads of the two protocols are shown in Table \ref{table:real_world_combined}. Note that \textsf{Sectric} and CARGO both have a preprocessing phase, but CARGO does not provide an open-source implementation of its preprocessing phase. Therefore, we only compare the online-phase overheads here. We report the preprocessing-phase overheads of \textsf{Sectric} separately in the following part.

\stitle{Online Phase's Overheads on Synthetic Graphs.}
To further assess how the computation and communication overheads of CARGO and \textsf{Sectric} depend on the graph parameters, we also conduct experiments on synthetic graphs generated from PBBS~\cite{shun2012brief}. The graphs are synthesized with different maximum node degrees $d_{\max}$ and numbers of nodes $|\vdv|$. We evaluate the computation and communication overheads between the server and querier of the two protocols on these synthetic graphs.

From the experiment results in Figs. \ref{fig:combined computation} and \ref{fig:combined communication}, we observe that \textsf{Sectric}'s computation and communication overheads grow linearly with $d_{\max}$, while the overheads of CARGO depend solely on the number of nodes in the graph. \textsf{Sectric} outperforms CARGO on graphs with more nodes. On the largest graph in the experiments, \textsf{Sectric} reduces computation overheads by 86.52\% and communication overheads by 90.76\%.

\stitle{Experiment Results on Preprocessing Phase.} 
In the following, we present the experiment results on the preprocessing phase of \textsf{Sectric}, including the computation and communication overheads. The experiment results on real-world graphs are shown in Table \ref{table:offline_benchmark}, and those on synthetic graphs are presented in Fig. \ref{fig:preprocessing overheads on synthetic graph}. As the preprocessing phase only needs to be invoked once, the overheads are acceptable.

\stitle{Summary.}
The above experiments demonstrate that, as the overheads of \textsf{Sectric} grow linearly with respect to $|\vdv|$ and $d_{\max}$, while those of CARGO grow quadratically with respect to $|\vdv|$, \textsf{Sectric} outperforms CARGO on larger graphs. According to the results in Fig. \ref{fig:node_gplus}, \textsf{Sectric} performs better once the graph has more than 10,000 nodes. The experimental results on real-world graphs also demonstrate that \textsf{Sectric} incurs lower overheads on the three larger graphs: CondMat, roadNet, and loc-Brightkite. Therefore, \textsf{Sectric} is the more suitable choice for the analysis of large graphs.

\section{Conclusion}
\label{section: conclusion}

In this work, we present \textsf{Sectric}, a novel \underline{se}rver-aided \underline{c}rypto-assisted local \underline{tri}angle \underline{c}ounting protocol. By utilizing cryptographic primitives, \textsf{Sectric} achieves both high result accuracy and cryptographic-level privacy guarantees. It explores a new, highly efficient approach to local triangle counting based on PSI cardinality. To avoid intermediate privacy costs, we also define and implement a new cryptographic primitive, the three-party private set membership test (3PPSMT). We demonstrate the security and performance of our proposed protocol through analysis and empirical evaluation.