The world in which we live generates millions of data every day. The vast majority of this data is stored in databases on servers located across the globe. This data is used to influence and shape our daily lives, from government decisions to the products consumers purchase. Data has a great deal of promise if you can gain insights from it. It has become more important than ever to be able to find patterns in this data, draw conclusions from it and leverage it to create a competitive advantage. By using to identify patterns and uncover insights, companies are able to gain actionable knowledge from their data that they can use to inform decisions, identify new opportunities, and develop strategies. analytical techniques How can you begin to from your data in order to create data-driven decisions? Exploring your data to discover and comprehend the underlying patterns, information, and facts that can help you make a better decision is the first step. extract insights In this article, you will mainly learn the following: How to manage your data using the HarperDB database. Access your data from HarperDB using Custom Function. Automate EDA with data from the harperDB database using the Sweetviz python library. So! let’s get started 🚀. Introduction to Automated Exploratory Data Analysis Exploratory data analysis (EDA) is an important step in any data science project. It means looking at and understanding the structure, patterns, and properties of a dataset before using it to make machine learning models. EDA helps data scientists find outliers, missing values, correlations, and distributions that can affect how accurate the machine learning model is. But doing EDA manually can take a lot of time, especially for large datasets with a lot of features. Automated exploratory data analysis can give a quick and complete overview of the dataset, highlighting the most important insights and patterns. AutoEDA can make visualizations and reports that are more consistent and standardized than those made by humans. In general, AutoEDA is a powerful tool that can help data scientists speed up their work, make their models better, and learn more about their data. Why Analyze data from the databases? When compared to other data sources, there are several benefits to analyzing data from databases. First, databases store data in one place and in a structured way, which makes it easier to manage and search through large amounts of data. In today’s fast-paced business world, timely decision-making is critical and data scientists need to be able to analyze data to spot new trends or possible problems quickly. Second, databases offer robust security and access control mechanisms that ensure the privacy and integrity of the data. Data stored in databases are protected from unauthorized access and manipulation, which is critical for sensitive and confidential data. In this article, I will demonstrate how you can explore data from a database such as the HarperDB database, which provides more flexibility to manage and access the data to explore easily. What is HarperDB? HarperDB is one of the fastest and most flexible SQL/NoSQL data management platforms. It offers different services, including but not limited to:- Rapid application development Distributed computing Software as a Service (SaaS) Edge computing and others HarperB has the ability to run on various devices, from the edge to the cloud without duplicating data. What I love about HarperDB is its compatibility with different programming languages such as Python, which is most commonly used for Data analysis. Other programming languages are Javascript, Java, Go, C and NodeJS. HarperDB offers different features that you can use for different cases and projects, these features are as follows: Single endpoint API Custom Functions (Lambda-like application development platform with direct access to HarperDB’s core methods). Allows JSON and CSV file insertions. Supports SQL queries for full CRUD operations. Limited database configuration required. Steps to manage data on HarperDB database We need to put some sample data into the HarperDB database before we can automate the process of exploring the data. For this tutorial, I will use the loan dataset that is available . here The first step is to create an account if you don't have one. You need to visit this link, and then click the navigation bar to see a link called ”. Step 1: Create a HarperDB Account https://harperdb.io/ “Start Free If you already have an account, visit this link to log in with your credentials. https://studio.harperdb.io/ The second step is to create a cloud instance to store and fetch your data that will be automatically explored. You just need to click the " link to add a new instance to your account. Step 2: Create a HarperDB Cloud Instance Create New HarperDB Cloud Instance" The new page will give you all the important information you need as a guideline to create your cloud instance. Before you can add the data you want to explore from the database, you must first set up a schema and a table. You only need to load the HarperDB cloud instance you previously built from the dashboard and name the schema (such as ) to complete the process. Step 3: Configure the HarperDB Schema and Table “loans” Then add a table, such as that will contain the loan data. HarperDB will further request the hash attribute, which is like to an ID number. “customers” Import the loan data you have downloaded from this to the table you have created (for example- Customers table). Click the for bulk upload from the table interface. Step 4: Import Loan data to the table link file icon The new page will ask you to either import the URL of your CSV file or drag it to select the CSV file to import, choose the option that works best for you. In the screenshot below, you can see examples of the loan data that has been added to the database. The harperDB database can be integrated with the exploratory data analysis tool to explore and visualize the data. With a feature called HarperDB makes it easy to retrieve the data via API in a very straightforward manner. How to access the data from the HarperDB Database Custom Function, This will enable seamless integration of the data analysis process with the database to quickly and efficiently extract insights from the data. Additionally, using HarperDB's Custom Function feature will also save time and make it easier to get the data, which will make the workflow smoother. So, let’s learn more about Custom Function. What is a Custom Function? Custom functions provide a way to add your own API endpoints to the HarperDB database. Fastify is what makes this feature work. It is flexible and lets your data interact well with your database. The API request will make it possible for the data from the database to be automatically sent to the application for exploratory data analysis. Here are the steps you need to follow:- Enable Custom functions by clicking “ ” in your HarperDB Studio. This feature is not enabled by default 1. Enable Custom Functions functions The following step is to create a project by naming it. For example, . It will also generate project configuration files, including: 2. Create your Project loan-api-v1 Routes folder File to add helper functions Static folder You will focus on the routes folder. Note: Create the first route in order to retrieve loan data from the customers' table in the HarperDB Datastore. Also, you should be aware that route URLs are resolved as follows: 3. Define a Route [Instance URL]:[Custom Functions Port]/[Project Name]/[Route URL] It will include: Cloud Instance URL Custom Functions Port Project name you have created The route you have defined In the route file ( ) from the function page, you can see some template code as an example. You need to replace that code with the following code: example.js server.route({ request.body= { }; }
  }); 'use strict' ; const customValidation = require ( '../helpers/example' ); // eslint-disable-next-line no-unused-vars,require-await module .exports = async (server, { hdbCore, logger }) => { // GET, WITH NO preValidation AND USING hdbCore.requestWithoutAuthentication // BYPASSES ALL CHECKS: DO NOT USE RAW USER-SUBMITTED VALUES IN SQL STATEMENTS url : '/' , method : 'GET' , handler : ( request ) => { operation : 'sql' , sql : 'SELECT Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status FROM loan.customers' return hdbCore.requestWithoutAuthentication(request); In the code above, the GET method is used to define the route . The handler function will then send an SQL query to the database to get all of the data from the /loan-api-v1 customers' table. The SQL query will fetch information from the following columns in the customers’ table:- Gender Married Dependents Education Self_Employed ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term Credit_History Property_Area Loan_Status You can now easily access your data using the API URL. For example, my API URL is Open your web browser and paste your URL to view the data from your database. 4. Use the API URL to access the data https://functions-1-mlproject.harperdbcloud.com/loan-api-v1. As you can see, the API call makes it possible to quickly access the customer data that is stored in the HarperDB database. Steps to perform Automated EDA with Sweetviz is an open-source Python library that generates stunning visualizations that are packed with insightful information to get exploratory data analysis off the ground with just two lines of code. Sweetviz The library can be utilized to create a visual comparison of the dataset as well as the variables. The output application is a completely self-contained HTML file that can be loaded in a web browser. Install Sweetiz This library can be installed using the following command: pip install sweetviz Collect data from the API We use the API endpoint to get to the data, and Pandas library can load and integrate the data with the Sweetviz library. r = requests.get(url = URL) data = r.json() import request # api-endpoint URL = "https://functions-1-mlproject.harperdbcloud.com/loan-api-v1" # sending get request and saving the response as response object # extracting data in json format The code above sends a request to the API endpoint using a python package called and finally, the data will be extracted in JSON format. request Load data into pandas Dataframe The extracted data from the API endpoint is then loaded into a DataFrame using the Pandas package. df = pd.DataFrame(data) df.head() import pandas as pd #load data using pandas # show top 5 rows As you can see in the screenshot above, we were able to access the data via an API endpoint and load the data using a Pandas DataFrame. Now, with the help of Sweetviz, we can automatically explore and visualize the data. We use only two methods from Sweetviz as follows: This method analyzes the entire dataset and then automatically provides a detailed EDA report with visualization. analyze() method: This method renders the report to an HTML file that can be opened in the web browser. show_hmtl() method: loan_report = sv.analyze(df) # importing sweetviz import sweetviz as sv #analyzing the dataset loan_report.show_html( 'loan_analysis.html' ,open_browser= False ) Report was generated and we can open it in the web browser. loan_analysis.html Let’s open the loan_analysis.html to view the EDA report generated. As you can see, our EDA report is complete and offers an abundance of information for each attribute with visualization. It is simple to comprehend and is prepared in 3 lines of code. From the above screenshot, you can see that the largest loan amount was requested by one customer and the smallest loan amount is also requested by only one customer. This insight can be used to help loan officers to make data-driven decisions. 700 9 Complete code Below is the complete code with only 9 lines of code that, when executed, will automatically:- Access data from your database using an API call. Load the data using the Pandas library. Analyze the data and provide a detailed EDA report. r = requests.get(url = URL) data = r.json() df = pd.DataFrame(data) loan_report = sv.analyze(df) # import packages to use import request import pandas as pd import sweetviz as sv # api-endpoint URL = "https://functions-1-mlproject.harperdbcloud.com/loan-api-v1" # sending get request and saving the response as response object # extracting data in json format #load data using pandas #analyzing the dataset and render a report loan_report.show_html( 'loan_analysis.html' ,open_browser= False ) As more new data are added to your HarperDB database, you can now just execute the above code to generate a new to collect more insight from the data. beautiful EDA report Conclusion Congratulations 🎉, you have made it to the end of this article. You have learned: How to manage your data using the . HarperDB cloud instance Access your data from HarperDB using . Custom Function Automate EDA with data from the harperDB database using the python library. Sweetviz If you learned something new or enjoyed reading this article, please share it so that others can see it. Until then, see you in the next post! You can also find me on Twitter @Davis_McDavid.

5 Simple Tips to Become a Better Data Scientist

How to Extract Insights From Your Data

About Author

Comments

TOPICS

THIS ARTICLE WAS FEATURED IN

Related Stories

10 Best African Language Datasets for Data Science Projects

Building Analytics React Dashboard With Cube.js

A DIY Guide to Building an Analytics Dashboard with Node, Express, and Cube.js

Using Data Analytics for Unhindered Business Growth

Building a BNB Chain Analytics Dashboard with GetBlock

10 Resources That’ll Make You Better At Dashboard Creation

10 Best African Language Datasets for Data Science Projects

Building Analytics React Dashboard With Cube.js

A DIY Guide to Building an Analytics Dashboard with Node, Express, and Cube.js

Using Data Analytics for Unhindered Business Growth

Building a BNB Chain Analytics Dashboard with GetBlock

10 Resources That’ll Make You Better At Dashboard Creation

Light-Mode

Classic

Newspaper

Minty

Dark-Mode

Neon Noir

Minty

HN StartUps