Python is often the language of choice for developers who need to apply data analysis in their work, and especially for data scientists and data engineers whose tasks centre on deriving insight from data. One of Python's greatest assets is its extensive set of libraries.

Recently, I was working on two very popular data mining algorithms (FP-Growth and a custom A-Priori). I wanted a comprehensive analysis report on the results generated by these algorithms. As a supporting library for this kind of data science work, I'm introducing "doc-diff: generate the diff data between two files".

doc-diff supports the following features:

- Generate the following comparison reports:
    - `common_in_doc1-and-doc2-%Y-%m-%d.csv`
    - `common_key_with_diff_values-%Y-%m-%d.csv`
    - `exclusive_in_doc1-%Y-%m-%d.csv`
    - `exclusive_in_doc2-%Y-%m-%d.csv`
- Compare two files and return the following `dicts(prodCode, recommendation)`:

        common_in_doc1_and_doc2_list = dicts()
        common_key_with_diff_values_list = dicts()
        exclusive_in_doc1_list = dicts()
        exclusive_in_doc2_list = dicts()

Install

    $ pip install doc-diff

Implementation

    from doc_diff import Diff
    from doc_diff import gen_comp_report

    if __name__ == '__main__':
        # Data file location
        a_priori_csv_location = "./data/a-priori.csv"
        pfp_csv_location = "./data/pfp.csv"

        # Process a-priori.csv data file
        a_priori_diff = Diff(a_priori_csv_location)
        a_priori_diff.process_file()

        # Process pfp.csv data file
        pfp_diff = Diff(pfp_csv_location)
        pfp_diff.process_file()

        # Generate the comparison reports for the two processed files
        gen_comp_report(a_priori_diff, pfp_diff)

I'm looking forward to open-sourcing all my supporting libraries for data science and data work. Let me know what you think about 'doc-diff' in the comments below and share your thoughts. If you want to suggest new features or report issues, feel free to open an issue in the GitHub repository.
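For readers who want to see what the four comparison categories described above mean in practice, here is a minimal sketch in plain Python. It is not doc-diff's implementation: the function name `compare_dicts` and the sample product codes are made up for illustration. It assumes each file has already been loaded into a dict mapping prodCode to recommendation, and that "common" means the same key with the same value.

```python
def compare_dicts(doc1, doc2):
    """Split two {prodCode: recommendation} dicts into four categories.

    Hypothetical helper for illustration only; not part of doc-diff.
    """
    common = {}
    common_key_diff_values = {}
    # Keys present in both dicts: same value -> common, else -> differing.
    for key in doc1.keys() & doc2.keys():
        if doc1[key] == doc2[key]:
            common[key] = doc1[key]
        else:
            common_key_diff_values[key] = (doc1[key], doc2[key])
    # Keys present in only one of the two dicts.
    exclusive_in_doc1 = {k: doc1[k] for k in doc1.keys() - doc2.keys()}
    exclusive_in_doc2 = {k: doc2[k] for k in doc2.keys() - doc1.keys()}
    return common, common_key_diff_values, exclusive_in_doc1, exclusive_in_doc2


# Made-up sample data standing in for the two algorithms' results.
a_priori = {"P1": "P7", "P2": "P9", "P3": "P4"}
pfp = {"P1": "P7", "P2": "P5", "P8": "P6"}

common, diff_values, only_a_priori, only_pfp = compare_dicts(a_priori, pfp)
```

With this sample data, `P1` lands in the common group, `P2` in the group of shared keys with different values, and `P3` and `P8` are exclusive to their respective inputs.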