PPGAN: Privacy-preserving Generative Adversarial Network (under review)

Published in The 25th IEEE International Conference on Parallel and Distributed Systems (ICPADS) (CCF-C, Core-B Conference), 2019

Generative Adversarial Networks (GANs) and their variants are prime examples of data generation models, providing researchers with large amounts of high-quality generated data. They illustrate a promising direction for research with limited data availability. When a GAN learns the semantically rich data distribution of a dataset, the density of the generated distribution tends to concentrate on the training data. Because the gradient parameters of a deep neural network encode the data distribution of the training samples, the model can easily memorize those samples. When a GAN is applied to private or sensitive data, for instance patient medical records, private information may therefore be leaked. To address this issue, we propose a Privacy-preserving Generative Adversarial Network (PPGAN) model, in which we achieve differential privacy in GANs by adding well-designed noise to the gradient during the model learning procedure. In addition, we introduce the Moments Accountant strategy into the PPGAN training process to improve the stability and compatibility of the model by controlling privacy loss. We also give a mathematical proof that the discriminator is differentially private. Through extensive case studies on benchmark datasets, we demonstrate that PPGAN can generate high-quality synthetic data while retaining the required data utility under a reasonable privacy budget.
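The abstract does not spell out how the noise is added to the gradient. As a minimal sketch of one common way to do this (per-example gradient clipping followed by Gaussian noise, as in differentially private SGD), the following may help; the function names, `clip_norm`, and `noise_multiplier` are assumptions for illustration and are not taken from the paper.

```python
import math
import random


def clip_by_l2_norm(grad, clip_norm):
    """Scale grad so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def noisy_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum them, add Gaussian noise, average.

    Assumed noise scale: sigma = noise_multiplier * clip_norm (hypothetical
    parameterisation, chosen to match the usual DP-SGD convention).
    """
    clipped = [clip_by_l2_norm(g, clip_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


# Toy usage: two per-example gradients for a two-parameter model.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.0, 0.5]]
update = noisy_mean_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any single training sample can influence the update, which is what lets the added noise be calibrated to a fixed sensitivity. Tracking the cumulative privacy loss over many such steps (the role the abstract assigns to the Moments Accountant) is not shown here.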



Published in 计算机与数字工程, 2019

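Taxation is one of the sources of national fiscal revenue. At present, however, tax inspection suffers from a range of problems: data collection is incomplete, data transmission and storage technologies still need improvement, and information cannot be shared, which leads to high inspection costs and low efficiency. This paper proposes a framework for a big data application platform for smart inspection based on spatio-temporal information. It aims to combine Internet Plus, big data mining, data visualization, and related technical services on this platform in order to make tax inspection low-cost and efficient, achieving spatio-temporal integration of basic data, integration of resources across industries and departments, and information sharing.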


Dominant Dataset Selection Algorithms for Time-Series Data Based on Linear Transformation

Published in IEEE Internet of Things Journal, 2019

With the explosive growth of time-series data, its scale already exceeds conventional computation and storage capabilities in many applications. On the other hand, the information carried by time-series data is highly redundant because of the strong correlation between time series. In this paper, we propose new dominant dataset selection algorithms that extract a small dataset which nevertheless represents the kernel information carried by the time-series data with an error rate less than ε, where ε can be arbitrarily small. We prove that the dominant dataset selection problem is NP-complete. An affine transformation model is introduced to define the linear transformation function, so that the selection function of the dominant dataset has constant time complexity O(1). Furthermore, a scanning selection algorithm with time complexity O(n^2) and a greedy selection algorithm with time complexity O(n^3) are proposed to extract the dominant dataset based on the linear correlation between time series. The proposed algorithms are evaluated on real electric power consumption data from a city in China. The experimental results show that the proposed algorithms not only reduce the size of the kernel dataset but also preserve the integrity of the time-series data in terms of accuracy and efficiency.
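To make the idea concrete, here is a minimal sketch of a scanning-style selection: a series joins the dominant set only if no already-selected series can reproduce it through an affine map y ≈ a·x + b within error ε. This is an illustration under stated assumptions, not the paper's algorithm; the least-squares fit, the relative-error measure, and the names `affine_fit_error` and `scan_select` are all hypothetical.

```python
def affine_fit_error(x, y):
    """Least-squares fit y ~ a*x + b; return (a, b, relative_error).

    The relative error is ||y - (a*x + b)|| / ||y|| (an assumed error measure).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = cov_xy / var_x if var_x > 0 else 0.0
    b = mean_y - a * mean_x
    residual = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    energy = sum(yi ** 2 for yi in y)
    rel_err = (residual / energy) ** 0.5 if energy > 0 else 0.0
    return a, b, rel_err


def scan_select(series, epsilon):
    """Single scan over the series: O(n^2) affine fits for n series.

    Returns the indices of the dominant series and, for every other series,
    the (dominant_index, a, b) triple that reconstructs it.
    """
    dominant = []
    mapping = {}
    for j, y in enumerate(series):
        for i in dominant:
            a, b, err = affine_fit_error(series[i], y)
            if err < epsilon:
                mapping[j] = (i, a, b)
                break
        else:
            # No selected series represents y well enough, so keep y itself.
            dominant.append(j)
    return dominant, mapping


# Toy usage: the second series is exactly 2*x + 1 of the first.
series = [[1, 2, 3, 4], [3, 5, 7, 9], [1, 4, 9, 16]]
dominant, mapping = scan_select(series, epsilon=0.01)
```

Only the dominant series and the (a, b) coefficients need to be stored; each non-dominant series is then recovered by evaluating one affine map, which is consistent with the constant-time selection function mentioned in the abstract.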


Big Data Platform Architecture under The Background of Financial Technology

Published in Proceedings of the 2018 International Conference on Big Data Engineering and Technology, 2018

With the rise of financial technology, finance and technology are becoming deeply integrated, and technological means have become an important driving force for innovating financial products, improving financial efficiency, and reducing financial transaction costs. In this context, new technology platforms are reshaping the financial industry along dimensions such as business philosophy, business model, technical means, sales, and internal management. In this paper, we extend the existing big data platform architecture by adding spatio-temporal data elements and, through a practical analysis of the insurance industry, put forward meaningful product circles and customer circles.


Decision Tree Model in the Diagnosis of Breast Cancer

Published in 2017 International Conference on Computer Technology, Electronics and Communication (ICCTEC), 2017

Breast cancer is the second leading cause of cancer death in women. At the same time, it is one of the most curable cancers if it is diagnosed early. A growing number of researchers have confirmed that the decision tree model can diagnose it accurately. This paper presents a diagnostic method for breast cancer based on the decision tree model combined with feature selection. Experiments were conducted on different training/test splits of the Wisconsin Breast Cancer Data Set (WBCD), which is commonly used by researchers studying machine-learning-based breast cancer diagnosis. To reduce the complexity of the decision tree model, this paper proposes deleting some highly relevant features of … After data correlation and independence tests, tumor thickness, cell shape consistency, single epithelial cell size, and mitosis were finally chosen as the feature subset for the decision tree model. Experimental results show that the classification accuracy (94.3%) significantly outperforms the state-of-the-art method with respect to a variety of metrics.
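The abstract does not give the details of its correlation test. As a rough sketch of one possible correlation-based filter (an assumption, not the paper's procedure), the following keeps a feature only if its absolute Pearson correlation with every feature kept so far stays at or below a threshold. The feature names, toy values, and threshold are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    std_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    std_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (std_u * std_v) if std_u > 0 and std_v > 0 else 0.0


def drop_correlated(columns, threshold):
    """Return the feature names to keep, scanning columns in insertion order."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy usage with made-up values: f2 is almost a multiple of f1, so it is dropped.
toy = {
    "f1": [1, 2, 3, 4],
    "f2": [2, 4, 6, 8.1],
    "f3": [4, 1, 3, 2],
}
selected = drop_correlated(toy, threshold=0.9)
```

The surviving columns would then be passed to a decision tree learner; fewer, less redundant inputs generally yield a smaller tree.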
