Manjula G S / J.P. Morgan Services India Pvt. Ltd.
With data growing exponentially in every field, assessing and extracting information from massive data sets has become a formidable challenge in the era of Big Data. Conventional security methods cannot be applied directly to big data because of its massive volume and variety. Mining useful information from this data is nonetheless of universal interest to organizations that hold large data sets. The big data life cycle comprises three phases: data generation, data storage, and data processing. Big data processing relies on distributed systems because it requires large storage capacity and high computational power. Because many parties are involved in these systems, the risk of security violations increases. Since big data often contains individuals' personal information, privacy is the foremost security concern. The main objective of this work is to present a comprehensive overview of privacy preservation mechanisms across the big data life cycle. Current privacy-preserving methods such as generalization can effectively handle privacy attacks on a single data set, whereas protecting privacy across multiple data sets remains difficult. Therefore, to preserve the privacy of multiple data sets, it is desirable to first anonymize the data sets in their entirety and then encrypt them before storing or sharing them in the cloud. The challenges in existing mechanisms and future research directions relevant to privacy preservation in big data are discussed. Security techniques that protect data sets from access by unauthorized users are also covered.