CC BY - IEEE is not the copyright holder of this material. Please follow the instructions at https://creativecommons.org/licenses/by/4.0/ to obtain the full-text article and the stipulations set out in the API documentation.
Introduction
Cloud computing has been widely deployed and has become indispensable for many companies and institutions, due to its salient features such as on-demand services, elasticity, flexibility, multi-tenancy, efficient access to data, and reduced maintenance cost [1]. Various cloud-based services and applications have gradually been adopted, including by many governments, educational institutions, medical institutions, and large enterprise groups [2]. As one of the key services, data outsourcing [3], [4], [5], [6], [7], [8], [9], [10] solves the problem of storing large amounts of data at low cost for users with limited storage resources and has been widely used.
If data is stored in the cloud directly, users risk revealing private information. This is especially critical for high-impact business data or medical records, since most service providers do not guarantee that they will not read or modify clients' data. A cloud server may also behave selfishly to save computation or transmission overhead [14], [16]. It is therefore essential for an elegant and practical outsourced storage scheme to provide security and privacy guarantees such as confidentiality, integrity, freshness, authenticity, and verification, in addition to user experience and conventional operability and management such as updating, retrieval, and multi-user support.
To ensure data confidentiality without losing operability and manageability of the data, many SSE schemes [11], [17], [18], [19], [20], [21] were proposed at an early stage. However, most of these SSE schemes assume that the cloud server is honest but curious [19], [20]. This assumption is not always valid, because the cloud server may suffer from external attacks, internal configuration errors, software vulnerabilities, and insider threats [11], [18]. The data may be tampered with, or the server may return a cached history or partial results.
To prevent a malicious server from returning partial or tampered results, integrity checking should be supported. However, most schemes ignore this or related issues. For example, some existing SSE schemes do not guarantee the integrity of the search results returned by the server to the user [17], [18], [19], [20]. Although some schemes can ensure the integrity of data [6], [22], and some verifiable SSE schemes have been studied extensively [11], [13], [14], [23], [24], [25], [26], these constructions unfortunately only support verification over static databases [11], [26]. The scheme in [21] cannot prevent a malicious server from returning an empty result. The scheme in [26], using cuckoo hashing, solves the problem of the server returning empty results, but it does not support dynamic updates for data verification.
In addition, most of the currently proposed SSE schemes are two-party models between data owners and servers. [15] proposed a verifiable three-party SSE scheme based on a Merkle Tree index. The three-party model includes the data owner, users, and servers. However, one drawback of [15] is that the server needs to traverse the whole index even when the queried keyword is not present. In this situation, the server takes time to traverse the whole index and create the results for verification, then sends the results to the user; the user in turn takes time to verify the results, only to obtain an empty result. The user experience is greatly affected in this scenario. Many cases can lead to this scenario, such as querying a non-existent keyword, spelling errors, hitting an adjacent key when typing, etc. This drawback exists in most current SSE schemes. We will give a detailed analysis in a later section.
Table 1 compares various existing verifiable SSE schemes with our proposed scheme. To the best of our knowledge, none of the existing verifiable SSE schemes can explicitly allow users to verify the absence of keywords in the three-party model efficiently. In summary, the performance and security functionalities are not fully exploited.
Challenges. Although there have been many SSE schemes, some challenges still need to be fully explored.
First, when a user makes a search query, how does the user verify the integrity of the results returned by the server? In our scheme, the data owner uses a B+-Tree to build the index and hashes its nodes to generate a root using several hash functions. The user verifies the integrity of the search results through this root. We leverage two methods to generate the root of the B+-Tree. One is to hash all the nodes to generate the root. The other is to hash only the leaf nodes; since the leaf nodes contain all key-value pairs, the integrity of the search results can still be verified. Although there are many applications based on the B+-Tree, they cannot be utilized directly in this context while providing strong privacy and security guarantees such as integrity checking and verification.
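The leaf-node variant can be sketched as a Merkle-style computation over the B+-Tree leaves. The following is an illustrative sketch only: the SHA-256 choice, the pairwise combination rule, and all names are our assumptions, not the paper's exact construction.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Collision-resistant hash used for node digests."""
    return hashlib.sha256(data).digest()

def leaf_root(leaves):
    """Build a root digest by hashing the leaf-level key-value pairs
    pairwise up to a single value (Merkle-style). Because the B+-Tree
    keeps every key-value pair in its leaves, this root commits to the
    entire index, so it suffices for integrity verification."""
    level = [h(k.encode() + b"|" + v.encode()) for k, v in leaves]
    while len(level) > 1:
        if len(level) % 2:               # duplicate the last digest on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

root = leaf_root([("alice", "doc1,doc3"), ("bob", "doc2"), ("carol", "doc5")])
```

Any change to a leaf changes the root, which is why the user can detect tampered or omitted results by recomputing the root from the returned data.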
Second, how does a scheme prevent malicious servers from launching replay attacks and support three-party model? [13] proposes a timestamp-based scheme to solve this problem, but it can be only applied to the two-party model. We address this issue by combining the timestamp mechanism with the root of the B+-Tree and authorized users.
Finally, cost and user experience are common issues for most current schemes. In traditional verifiable SSE schemes, whether the keyword queried by the user exists or not, the server always needs to traverse the whole secure index and create the corresponding authenticator, which is inefficient for both the server and the users. A new idea should be adopted to differentiate the two situations and boost performance and user experience. The difficulty is how to make the newly designed scheme compatible with previous schemes while still supporting dynamic updating. Besides, it is essential to save users' costs: some cloud service providers charge not only for the storage requested by the client but also for the amount of computation [27], [28], [29]. How can the server avoid unnecessary searches? It is therefore necessary to improve the search efficiency of SSE. At present, most SSE schemes do not take charges into account; their query efficiency is the same whether the queried tokens are present or absent.
Here, we design a specific CBF that supports efficient dynamic updating and verification to solve the final challenge. Our methodology is straightforward but effective. When the keyword queried by the user does not exist, the server only needs to return a Bloom Filter authenticator to the user, avoiding traversing the whole index and creating the corresponding authenticator. The performance is greatly improved and is independent of the scale of the dataset when the queried keyword is missing.
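A minimal sketch of such a counting filter follows; the parameters m and k, the re-seeded SHA-256 hashing, and all names are illustrative assumptions rather than the paper's exact design. Counters instead of bits are what make deletions, and hence dynamic updates, possible.

```python
import hashlib

class CountingBloomFilter:
    """A minimal Counting Bloom Filter: each position holds a counter
    instead of a bit, so tokens can be removed on dynamic updates."""

    def __init__(self, m: int = 1024, k: int = 4):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, token: str):
        # Derive k positions by re-seeding one hash with the index i.
        for i in range(self.k):
            d = hashlib.sha256(f"{i}:{token}".encode()).digest()
            yield int.from_bytes(d[:8], "big") % self.m

    def add(self, token: str) -> None:
        for p in self._positions(token):
            self.counters[p] += 1

    def remove(self, token: str) -> None:
        for p in self._positions(token):
            if self.counters[p] > 0:
                self.counters[p] -= 1

    def might_contain(self, token: str) -> bool:
        # False means definitely absent; True may be a false positive.
        return all(self.counters[p] > 0 for p in self._positions(token))
```

When `might_contain` returns False, the server can skip the index traversal entirely and return only the CBF authenticator, which is the source of the constant-cost miss path described above.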
In Summary. Our contributions can be summarized as follows:
We propose an Efficient, Secure, and Verifiable SSE scheme under the three-party model. It can guarantee the integrity of the search query results returned by the server to the user and the freshness of the result.
Our secure index is based on B+-Tree and a specific CBF which supports secure verification, dynamic updating, and multi-user query. To the best of our knowledge, it is the first B+-Tree based SSE scheme which enables efficient verification, integrity checking, and dynamic updating.
The new designed CBF with counting capabilities supports efficient update and verification. It can greatly save both the server overhead for searching and users’ cost for verification, especially when the keyword queried by the user does not exist.
We verify the feasibility and security of the scheme through experiments. The results show that the CBF achieves stable and more efficient query and verification when the queried keyword does not exist.
Organization. The rest of the paper is organized as follows: Section 2 presents the related work. We introduce the threat model and our design goal in Section 3 and give a system overview of the proposed scheme in Section 4. The detailed construction is presented in Section 5. Then, we provide the security analysis in Section 6 and performance analysis and evaluation in Section 7 respectively. Finally, we present our conclusion in Section 8.
Related Work
Encrypted Storage Outsourcing. Outsourced storage [30], [31] allows clients with limited resources to outsource large amounts of data to cloud service companies at low cost. Since the data are directly encrypted and stored on the cloud server, users cannot query the encrypted data directly. If a user wants to update the database, the user must download the data locally, update the database, and upload it back to the server.
Searchable Encryption. In order to query the encrypted data of the cloud server, some relevant technologies were proposed to solve this problem [32], [33], [34], [35]. SSE is one of the most important schemes to solve this problem. SSE is a search query scheme based on the keyword index. It can perform operations such as querying on cloud-encrypted data. In the work [5], the authors provided a privacy-protection framework for outsourced media search. The work relies on multimedia hashing and symmetric encryption and tries to balance the strength of privacy enforcement, the quality of search, and the computation complexity. In general, we summarize the related work in the following.
Verifiable Searchable Symmetric Encryption. Some recent work focuses on forward or backward security. Most of it considers an honest-but-curious server that follows the defined protocol. Although these schemes can provide forward or backward security, the search results cannot be verified efficiently when the server performs active attacks. Based on this, some schemes with verification functionality were proposed [12], [15], [36]. [21] proposed dynamic authenticators for checking the integrity of the results, but cannot prevent replay attacks by malicious servers. For the verifiable schemes proposed in [14], [24], [37], either the query efficiency is unsatisfactory, or validation is not supported in the case of updates. The verifiable universal scheme proposed in [37] transforms a symmetric searchable encryption scheme into a no-dictionary verifiable scheme: users do not have to maintain a collection of all keywords, but it is a static approach. [13] solved this by using message authentication codes, but it cannot detect that the server intentionally returned an empty result. [15] proposes a secure verification scheme that can prevent replay attacks from cloud servers. However, this solution does not handle the case where the queried keyword does not exist.
Multi-User Searchable Encryption. Some work has tried to enhance the functionality and performance of data outsourcing and cloud services. The original SSE schemes work in a two-party model. As a natural extension of the two-party model, multi-user searchable encryption schemes are constantly being proposed; [15], [20], [36] are typical multi-user search encryption schemes. [20] was the first to propose a multi-user encryption scheme, based on broadcast encryption. [15] leveraged a Merkle Patricia Tree and incremental hashing to build a proof index supporting data updates. [36] utilizes a two-keyword index to reduce the searching time. However, [36] has a shortcoming: its two-keyword index comprises all the keywords together with all combinations of two keywords drawn from them, which significantly increases the storage and search overhead of the server. When the queried keyword does not exist, [15], [36] have the same efficiency as when the keyword exists, which reduces the search efficiency for absent keywords. [38] proposed a non-interactive multi-user searchable encryption scheme that reduces the interactions between the data owner and users; however, the scheme does not support data updates. [3] tries to utilize caches, encryption, and compression to improve performance and reduce the size of data transfers. [39] studies the problem of secure skyline queries over encrypted data. These solutions are still not full-fledged in aspects such as performance and verification.
Problem Statement
In this section, we will describe the attack model of SSE. Then, we formally present our design goals.
3.1 Threat Model
We assume that the data owner is credible and the users authorized by the data owner are also trusted. The cloud servers store the encrypted index and documents and perform SSE. We consider the cloud servers untrusted and malicious; they may perform active attacks. The server might deduce some sensitive information from the search results. Besides, when a client launches a query, the servers may return partial search results to the user, or they may initiate replay attacks. We try to mitigate such attacks in our scheme.
Definition 1 Replay Attack.
A replay attack in SSE is an attempt by a malicious server (or attacker) to return search results to a user from a historical version of the dataset.
Definition 2 Data Integrity Attack.
A data integrity attack in SSE occurs when a malicious server (or attacker) deliberately returns partial search results or an empty result to the user.
In addition, we don’t consider the access pattern leakage and query pattern leakage which can be tackled by oblivious methodologies. Also we don’t consider the information leakage through side channel or timing channel.
3.2 Design Goal
In this paper, our goal is to design an Efficient, Secure, Verifiable SSE scheme on the three-party model. We try to achieve the following goals:
The user can verify the correctness of the received result from the server.
The user can detect whether the server is launching a replaying attack.
The overall performance can be reasonably improved. For example, when the server receives the user's query, the server can save the overall cost through our specific algorithm and data structure.
The proposed data structure and algorithm can support efficient dynamic updating.
Overview Of ESVSSE
In this section, we will briefly describe our scheme. The main symbols used in this paper are given in Table 2.
4.1 System Structure
Fig. 1 shows the overall structure of our scheme. It consists of data owners, cloud servers, and authorized users. The data owner stores the encrypted data in a cloud server. Authorized users can search the data of the cloud server. Data owners have administrative rights over data in the cloud.
4.2 System Model
We aim to develop an Efficient, Secure, Verifiable Symmetric Searchable Encryption scheme (ESVSSE). The data owner uploads the encrypted documents, the authenticator, and the secure index built on the B+-Tree to the cloud server. The scheme allows the user to verify the integrity and freshness of the search results. Our ESVSSE scheme is defined as follows: ESVSSE is a three-party model in which the data owner stores secure indexes, authenticators, and encrypted documents on the cloud server. The data owner can authorize users to query the cloud servers; the authorization procedure is similar to [20]. The cloud server provides storage and search capabilities. Authorized users can initiate query and verification operations.
Definition 3. ESVSSE is a collection of seven polynomial-time algorithms.
Gen(1^k) → {K1, K2, K3, (ssk, spk)}: a probabilistic algorithm executed by the data owner. It takes as input a security parameter k and outputs the private keys K1, K2, K3 and a random signing key pair (ssk, spk).
Init(K1, K2, K3, ssk, D) → {I, π, π_bf, C}: an algorithm executed by the data owner. It takes as input the secret keys K1, K2, K3, the signing private key ssk, and the document set D, and outputs the secure index I, the encrypted documents C, the authenticator π, and the authenticator of the Counting Bloom Filter π_bf. The authenticator π_bf provides proof when the keyword queried by the user does not exist. The data owner stores I and π_bf locally and meanwhile sends I, C, π, and π_bf to the cloud server.
PreUpdate(K1, K2, K3, ssk, f) → {τ_ω, π, π_bf}: an algorithm run by the data owner. It takes as input the private keys K1, K2, K3, the signing private key ssk, and the file f to be updated, and outputs the update token τ_ω and the authenticators π and π_bf. The data owner sends τ_ω, π, and π_bf to the cloud server.
Update(I, τ_ω) → {I′}: an algorithm run by the cloud server. It takes as input the secure index I and the update token τ_ω, and outputs the new secure index I′.
Trapdoor(K1, ω) → {τ_ω}: a deterministic algorithm run by the authorized user to generate a trapdoor for a given keyword. It takes as input a secret key K1 and a keyword ω, and outputs a trapdoor τ_ω corresponding to ω.
Search(I, τ_ω, t_q) → {(ρ, π_tq, π_c, R_ω) or (π_bf)}: a deterministic algorithm run by the server. It takes as input the secure index I, the trapdoor τ_ω, and the query time t_q, and outputs the path list ρ of the search results, the authenticators π_tq and π_c, and the encrypted documents R_ω of the search results. Finally, the server sends ρ, π_tq, π_c, and R_ω to the user, or outputs the proof authenticator π_bf.
Validation(K1, K2, K3, spk, R_ω, ρ, π_tq, π_c, τ_ω, π_bf) → {0, 1}: an algorithm run by the authorized user. It takes as input the symmetric keys K1, K2, K3, the public key spk, the query result R_ω, the secure list ρ, the authenticators π_tq, π_c, and π_bf, and the token τ_ω corresponding to ω, and outputs a bit b. The user accepts the query result when b equals 1 and rejects it when b equals 0.
\begin{equation*}
\left\lbrace
\begin{array}{ll}
\pi_{i,0}=(\nu_{i,0},Sig(\nu_{i,0})) & up_{i}< tp_{i,0}\leq up_{i+1}\\
\nu_{i,0}=Enc_{K_{3}}(rt_{i,0}||tp_{i,0}) & \\
\ldots & \\
\pi_{i,j}=(\nu_{i,j},Sig(\nu_{i,j})) & tp_{i,j-1}< tp_{i,j}\leq up_{i+1}\\
\nu_{i,j}=Enc_{K_{3}}(rt_{i,j}||tp_{i,j}||\nu_{i,j-1}) & \\
\ldots & \\
\pi_{i,n}=(\nu_{i,n},Sig(\nu_{i,n})) & tp_{i,n}= up_{i+1}\\
\nu_{i,n}=Enc_{K_{3}}(rt_{i,n}||tp_{i,n}||\nu_{i,n-1}) &
\end{array}
\right. \tag{1}
\end{equation*}
\begin{equation*}
\left\lbrace
\begin{array}{l}
\pi^{n}_{bf}=(\psi^{n}_{bf},Sig(\psi^{n}_{bf}))\\
\psi^{n}_{bf}=Enc_{K_{3}}(CBF||tp_{i,n}||\psi^{n-1}_{bf})
\end{array}
\right. \tag{2}
\end{equation*}
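Equations (1) and (2) chain each period's authenticator to the previous one: every value ν encrypts the current state, its timestamp, and the previous ciphertext. The following Python sketch illustrates only this chaining idea; HMAC stands in for the paper's Enc_{K3} and Sig primitives (a real deployment would use 128-bit AES and an RSA signature, as in the experiments), and the keys and function names are illustrative assumptions.

```python
import hashlib
import hmac

K3 = b"demo-symmetric-key"   # stands in for the symmetric key K3
SSK = b"demo-signing-key"    # stands in for the signing key ssk

def enc(data: bytes) -> bytes:
    # Stand-in for Enc_{K3}; a real scheme would use authenticated AES.
    return hmac.new(K3, data, hashlib.sha256).digest()

def sig(data: bytes) -> bytes:
    # Stand-in for Sig; a real scheme would use an RSA signature.
    return hmac.new(SSK, data, hashlib.sha256).digest()

def chain_authenticator(states, timestamps):
    """Build chained authenticators in the spirit of Eqs. (1)-(2):
    each value nu_j binds (state_j || tp_j || nu_{j-1}), so the server
    cannot silently replay an older state without breaking the chain."""
    pi, prev = [], b""
    for state, tp in zip(states, timestamps):
        nu = enc(state + b"|" + str(tp).encode() + b"|" + prev)
        pi.append((nu, sig(nu)))
        prev = nu
    return pi
```

Because every link depends on its predecessor, substituting any earlier state or timestamp changes all later authenticators, which is what lets the user detect a replayed historical version.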
The Construction of ESVSSE
In this section, we present the detailed construction of ESVSSE. We start with the algorithms that build the secure index, and then describe the query, verification, and update procedures.
5.1 Building Secure Index
We use the B+-Tree structure to build a secure index. The index is built from the collection of documents D and the inverted list
An example illustrating the Bloom Filter is shown in Fig. 3. Suppose the data owner has three tokens; the server hashes them into the Bloom Filter through the four hash functions. Suppose the user then submits a token in a query. The server checks if there is a
5.2 Query Procedure
When an authorized user needs to query the data on the cloud server, the user first generates a token for the keyword and then sends the token to the cloud server. The cloud server performs the search algorithm; the specific algorithm is shown in Fig. 6. When the cloud server receives the token sent by the user, we consider both conditions: the token exists in the index, or it is absent. First, the server checks whether the token exists in the Counting Bloom Filter: the server hashes the queried token into another Counting Bloom Filter with the k hash functions and verifies whether the queried token exists by testing whether all the corresponding bits are 1. There are two cases. (1) If not all the tested bits are 1, the token is proved to be absent; in this case, the server only needs to return the CBF authenticator. (2) Otherwise, the token may exist, and the server searches the secure index and returns the results together with the corresponding authenticators.
5.3 Verifying the Search Results
For the verification of data integrity, we use a hash function to hash each node of the B+-Tree from the leaf nodes up to the root node, layer by layer. The user can verify the integrity of the query results against the root node. Alternatively, the data owner hashes only the leaf nodes. This paper uses a timestamp mechanism to verify the freshness of the data. The timestamp mechanism is shown in Fig. 7. It includes the update interval and the authenticator
In order to prevent the server from returning partial or historical versions of the query results, the authorized user performs the verification algorithm (shown in Fig. 8) upon receiving the query result from the cloud server. Similarly, there are two verification cases: one where the keyword does not exist and one where it exists.
When the queried keyword does not exist, the user can use the
When the searched keyword exists, the user rebuilds the root of the B+-Tree from the list returned by the server and verifies the integrity of the returned results by comparing the rebuilt root with the root in the decrypted authenticator. The freshness of the results is verified based on the timestamp in the decrypted authenticator.
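The root-rebuilding step above can be sketched as follows. This is an illustrative Merkle-style authentication-path check under our own assumptions (SHA-256, a `(sibling, is_left)` path encoding, and hypothetical function names); the paper's actual path list ρ may be laid out differently.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def recompute_root(leaf: bytes, path):
    """Rebuild the root from a returned leaf and its authentication
    path; `path` is a list of (sibling_digest, sibling_is_left) pairs
    ordered from the leaf level up to the root."""
    digest = h(leaf)
    for sibling, sibling_is_left in path:
        digest = h(sibling + digest) if sibling_is_left else h(digest + sibling)
    return digest

def verify_result(leaf: bytes, path, trusted_root: bytes) -> bool:
    # Accept only if the recomputed root matches the root recovered
    # from the decrypted (and freshness-checked) authenticator.
    return recompute_root(leaf, path) == trusted_root
```

A tampered or partial result changes the recomputed root, so the comparison against the authenticator's root rejects it.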
Let us explain a replay attack with the following example. Consider the following situations: 1. When the user queries at
5.4 Update
The Update algorithm updates the secure index on the server and supports three operations, i.e., the add, delete, and edit operations. The data owner first executes the pre-update algorithm to create the tokens for the update algorithm, CBF, the signature and sends them to the cloud server. Then, the cloud server executes the update algorithm to update the CBF and the secure index
5.5 An Illustrative Example
We illustrate our scheme through an example. Suppose there are nine files to initialize, i.e.
Now suppose that the user wants to query the keyword
Security Analysis
In this section, we conduct a security analysis of our scheme. On the one hand, we analyze the security and confidentiality of data on the cloud servers; on the other hand, we make sure that users authorized by the data owner can verify the search results correctly. Confidentiality means that an attacker cannot learn any useful information from the encrypted documents directly. Since the secure index is built from tokens, the cloud server cannot learn any useful information about the keywords. The data owner encrypts all the documents and stores them on the cloud server, so the server cannot learn any useful information from the encrypted documents.
Verifiability is the ability of users to verify the integrity and freshness of search results; users can detect whether the results have been tampered with or attacked. The general security proof is demonstrated using Real/Ideal simulation. Definition (confidentiality): Let ESVSSE = (Gen, Init, PreUpdate, Update, Trapdoor, Search, Validation) be an SSE scheme based on our CBF and B+-tree index, and let k be a security parameter. Suppose
Proof.
We will prove that there is a polynomial time simulator
The verifiability of a symmetric searchable encryption scheme means that the user can verify the integrity and freshness of the search results returned by the server. Integrity verification prevents the server from returning partial or incorrect search results. Freshness verification is achieved through a timestamp mechanism that thwarts replay attacks, preventing the server from returning historical search results to the user. Therefore, we use a game-based security definition to prove the verifiability of symmetric searchable encryption in ESVSSE.
Definition (verifiability): Assume that the ESVSSE scheme is a secure, dynamic, verifiable scheme based on symmetric searchable encryption. Consider the following probability experiment, and assume that
Besides, the query pattern and search pattern are leaked. This is common in many current schemes, and we can leverage oblivious RAM to alleviate such leakage.
Performance Analysis and Evaluation
In this section, we give a detailed performance analysis and evaluation of our scheme.
7.1 Performance Analysis
Since the existence check is independent of the searching algorithm, the average performance of our scheme depends on the missing rate of the keywords. We define
In contrast, the performance of most current schemes with the same functionality is about
In the real world, many factors give rise to an increasing missing rate, such as spelling errors, hitting an adjacent key when typing, etc. It also depends on how the index is built. Since all these aspects vary across people and datasets, we give a general analysis. People mainly operate computers through a keyboard interface, and they often make spelling errors when using a keyboard, for example, hitting an adjacent key or using phonologically similar letters [40]. In [40], spelling errors are defined as (1) a typo from an incorrect keyboard operation, and (2) a spelling confusion from an incorrect identification. Their experiments show that among the corrected errors, substitution errors were the most frequent, deletion and insertion errors were almost equally common, and transposition errors were few. [41] studied the types of English spelling errors of students whose native language is English. Spelling errors were analyzed by grade-level spelling patterns and linguistic characteristics: phonological, orthographic, orthographic image, transposition, and morphological. Their results revealed that morphological spelling pattern errors occurred most frequently. Of the opportunities available for students to make linguistic spelling errors, phonological errors occurred 1.30 percent of the time, orthographic errors 8.52 percent, orthographic image errors 8.23 percent, and morphological errors 56.2 percent. The dataset also affects the built index, as does the way people build the index. For example, in the typical Enron mailbox dataset [42], if all the IDs are put into the index, it is hard for people to distinguish them.
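As a rough model (our own simplification, not a claim from the experiments), let p_miss be the keyword missing rate, k the number of CBF hash functions, b the B+-Tree order, N the number of key-value pairs, and |R_ω| the result size. The expected per-query server cost can then be written as:

```latex
\mathbb{E}[\mathrm{cost}] \;\approx\; p_{\mathrm{miss}} \cdot O(k)
\;+\; \left(1 - p_{\mathrm{miss}}\right) \cdot O\!\left(\log_{b} N + |R_{\omega}|\right)
```

The savings therefore grow with the missing rate, since the CBF check costs O(k) regardless of the dataset size, while a hit still pays the index traversal and proof generation.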
7.2 Experiment Setup
In order to prove the feasibility of our proposed scheme, we have implemented it in Java with JDK 8.0. The prototype comprises about 4,000 lines of code. The operating system we used is 64-bit Windows 7, and the computer has an i5-6500 CPU and 16 GB RAM. We use the Python language to extract keywords. We use 128-bit AES to encrypt the authenticator and sign it with an RSA signature. We implement two random oracles with HMAC-SHA256 and SHA3-256. We use the Enron mailbox dataset [42]. The following subsections present our experimental and testing results.
7.3 Index Construction Cost
First, we measure the delays of the initialization algorithm performed by the data owner, as shown in Fig. 12. The Init algorithm includes the construction of the secure index and the CBF, and the generation of the authenticator. All the delays and subsequent measurements are averages over ten runs. We tested the delays of 500,000 key-value pairs for establishing B+-Tree secure indexes of different orders, mainly measuring the relationship between the number of keyword-document pairs and the total time required to generate the secure index and authenticator. We tested B+-Trees of order 100, 500, and 900. As the number of key-value pairs increases, the time consumed also increases. When the number of key-value pairs is less than 100,000, the time overheads are similar. Between 100,000 and 200,000 pairs, the time overhead of the order-900 tree is slightly larger than the other two, and the total initialization times of the order-100 and order-500 B+-Trees are almost the same. Above 200,000 pairs, the order-100 B+-Tree has the minimum initialization time overhead, and the order-900 tree costs the most. Overall, initialization takes around 1.5 seconds when the documents include 500,000 keywords, which is acceptable.
We measure the storage cost B+-Tree as shown in Fig. 13. The dataset contains 500,000 keywords. The storage overhead is about 100 MB. As the number of keywords continues to increase, the storage overhead also increases linearly. We tested the storage space of B+-Tree of different orders. As shown in Fig. 13, the storage space of the B+-Tree has little relationship with the order of the tree.
7.4 Performance of Update
The update delay is decided by the size of the database, measured by the number of keywords. Strictly speaking, the delay is also directly related to the order of the B+-Tree. To show the relationship between update latency and database size, we use different numbers of keywords to measure latency. Since the number of keywords varies across files, we measure throughput as the number of keyword-document pairs that can be updated per second (see Fig. 14). When the number of key-value pairs is between 100,000 and 200,000, the throughput of delete operations decreases as the number of keywords increases; it is almost stable between 200,000 and 500,000. In this interval the throughput is also related to the order: as the order increases, the delete throughput decreases. For a fixed dataset, the delete throughput changes with the order by a gap of 10,000 to 20,000 pairs per second. We observed that the throughputs of add and delete operations are almost the same, and both are relatively stable.
7.5 Performance of Query
When the user launches a query operation, the token of the corresponding keyword is generated and then sent to the server. Fig. 15 shows the relationship between the number of keyword-document pairs and the search time when the server performs a search. Note that this experiment only measures the cost of generating the proof; the waiting time for the authenticator at the checkpoint is not included. Fig. 16 shows the query time on the CBF when the keyword queried by the user does not exist. As we can see from the figure, the server
7.6 Performance of Verification
In Fig. 17, we evaluate the verification delays for the users authorized by the data owner. We can see that the search time is proportional to the number of keyword-document pairs: the larger the number of keywords, the longer the server takes to search. For a fixed number of keywords, the server's search time is also related to the order of the B+-Tree; as shown in Fig. 17, the smaller the order, the longer the search. When the order of the B+-Tree is 500 and 900, the verification delays are similar but the search delays differ. For a database of 500,000 key-value pairs, the verification delay is approximately 4 milliseconds. As shown in Fig. 17, the user's verification time is proportional to the size of the dataset; when the dataset is fixed, the verification cost is stable and very short, whereas the validation cost of the other two schemes increases linearly.
Fig. 18 illustrates the verification time when a keyword queried by a user does not exist, and compares the result with previous work. Here, the verification time includes the time for the user to decrypt the verifier and the time to perform the comparison; since hashing the queried keyword into the CBF is very fast, its cost can be ignored. We tested how the CBF authenticator's running time changes with the number of keywords and compared it with two other schemes: MVSSE in Fig. 18 is the scheme proposed in [36], and VSSE is the scheme proposed in [15]. As the figure shows, the verification time of the ESVSSE scheme is almost constant as the number of keywords increases, while the verification time of the other two schemes grows with the number of keywords. In general, our scheme is more efficient than existing schemes under the same attack model.
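The near-constant verification time follows from the CBF itself: a membership check touches only k counters regardless of how many keyword-document pairs are stored. A toy Counting Bloom Filter makes this visible (an illustrative sketch, not the paper's implementation; the parameters m and k are assumptions):

```python
import hashlib

class CountingBloomFilter:
    """Toy Counting Bloom Filter: each item maps to k counter positions,
    and counters (rather than bits) make deletion possible. A lookup
    reads only k counters, independent of the number of stored items."""

    def __init__(self, m: int = 1 << 16, k: int = 4):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, item: str):
        # Derive k positions from k salted hashes of the item.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.counters[p] += 1

    def remove(self, item: str) -> None:
        for p in self._positions(item):
            self.counters[p] -= 1

    def contains(self, item: str) -> bool:
        # O(k) work: true membership always answers True; a non-member
        # answers False except with negligible false-positive probability.
        return all(self.counters[p] > 0 for p in self._positions(item))

cbf = CountingBloomFilter()
for w in ("alice", "bob"):
    cbf.add(w)
print(cbf.contains("alice"), cbf.contains("mallory"))
```

Because `contains` always performs exactly k counter reads, the verification cost stays flat as the keyword set grows, matching the trend in Fig. 18.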
Fig. 19 shows the throughput of the authenticator.
7.7 Storage Cost of Verification
To better evaluate the communication overhead of this scheme during the verification phase, we test the storage cost of the validator for keyword-document pairs in different situations. The scheme distinguishes two cases: the keyword searched by the user exists (see Fig. 20), or it does not exist (see Fig. 21). We also test the memory occupied by the validator for different B+-Tree orders. The experiments show that the validator's size is related to the B+-Tree order: when the search token exists, the minimum validator size for a dataset of 500,000 document-keyword pairs is 400 KB, while at order 100 the validator occupies 1100 KB. Choosing an appropriate order is therefore also a way to reduce storage overhead. When the keyword does not exist, the server returns the CBF verifier. The CBF verifier consists of all the leaf nodes of the hash B-tree, so its size is independent of the tree order; for 500,000 keyword-document pairs, the validator occupies 1220 KB. In practice, data comes in many formats, not just documents, so a validator at the KB level is still acceptable.
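A validator of roughly the reported size is what textbook Bloom-filter sizing would predict. The snippet below applies the standard formula m = -n ln(p) / (ln 2)^2 with 4-bit counters; the false-positive rate is an assumption on our part, since the paper does not state its CBF parameters.

```python
import math

def cbf_size_kb(n: int, fp_rate: float, counter_bits: int = 4) -> float:
    """Textbook Bloom-filter sizing: m = -n*ln(p)/ln(2)^2 slots, each a
    counter_bits-wide counter (4 bits is the usual CBF choice)."""
    m = math.ceil(-n * math.log(fp_rate) / (math.log(2) ** 2))
    return m * counter_bits / 8 / 1024

# 500,000 items at a (hypothetical) 10% false-positive rate:
print(f"{cbf_size_kb(500_000, 0.1):.0f} KB")  # on the order of the reported ~1220 KB
```

Tightening the false-positive rate grows the filter logarithmically in 1/p, so the KB-scale overhead remains modest even at stricter settings.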
7.8 Comparison With Existing Schemes
We compare our scheme with the concrete implementations of the dynamic SSE schemes proposed by Cash et al. [20] and Zhu et al. [15]. The experimental results show that the additional overhead introduced by the ESVSSE scheme is small. To compare the schemes fairly, we test with the same dataset and parameters on our machine. As shown in Fig. 22, we measure the overall overhead of the initialization, search, and update phases in these schemes, and report the average performance over multiple runs.
We verify the feasibility of our scheme by comparing it with a well-known dynamic SSE scheme and a generic verifiable scheme. To compare these schemes fairly, we use the same dataset and parameters on our machines. We tested on a well-known email dataset containing 500,000 documents, from which we extracted 2 million document-keyword pairs. The test covers the initialization, search, and update stages: initialization builds the index over the 2 million document-keyword pairs, and the search and verification phases are tested on 10,000 entries. The results are shown in Table 3. Because search results differ across queries, we report the average communication cost over 5,000 runs. As Table 3 shows, the average search token of SSE [20] is 390 B and the search result is 53 KB; the token of the GSSE scheme is 32 B, with a search-and-verification overhead of 3 KB. In our scheme, the storage overhead of search verification is 2.7 KB, which shows that the overhead introduced by ESVSSE is acceptable.
Future Directions and Conclusion
In future work, we can consider more complex searches, such as multi-keyword extraction and conjunctive query search. In addition, the Bloom filter can produce false positives, but the probability is negligible and, compared with the computational overhead it saves, remains acceptable.
As graph data such as social networks and biological networks continue to grow, current SSE cannot fully satisfy queries over graph data. In the future, we can study more complex search directions, such as matrix queries and graph adjacency queries. With the rise of Network Function Virtualization (NFV) technology, some studies have applied searchable encryption to middleboxes. Given the attractive properties of the blockchain, combining blockchain with searchable encryption is also a promising direction.
In summary, this paper presents an efficient SSE scheme based on the B+-Tree and the CBF which supports secure verification, dynamic updating, and multi-user queries. Thanks to the CBF, the non-existence of a queried keyword can be verified in near-constant time, independent of the database size.
ACKNOWLEDGMENTS
The authors thank the anonymous reviewers and the editor for their valuable comments and suggestions. This work was partially supported by the National Natural Science Foundation of China (No. 61672176), the Guangxi "Bagui Scholar" Teams for Innovation and Research Project, the Guangxi Talent Highland Project of Big Data Intelligence and Application, the Guangxi Science and Technology Plan Project No. AD20159039, the Guangxi Young and Middle-aged Ability Improvement Project No. 2020KY02032, and the Innovation Project of Guangxi Graduate Education No. YCSW2020110.