As surprising as it may seem, data has become one of the most valuable assets in the world today, and the companies that own it wield considerable influence both inside and outside the technology industry. This is why the job of a Data Engineer has become so appealing to the new crop of engineers coming out of college.
Definition, simplified: a Data Engineer is a person who collects raw data, then transforms and maintains it in databases so that it can be used for specific tasks.
A valuable asset for anyone looking to break into the Data Engineering field is an understanding of the different types of data, which are as follows:
Structured Data → Data organized into tables with rows and columns.
Semi-structured Data → Data in the form of XML, CSV, or JSON files.
Unstructured Data → Free-form data such as emails or PDFs.
Binary Data → Data such as audio or image files.
The level of organization decreases from the top of this list to the bottom, with Structured Data being the most organized and Binary Data the least.
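To make the four categories concrete, here is a minimal sketch that holds one tiny sample of each kind in Python. The table name `users`, the sample records, and the variable names are all made up for illustration; only the standard library is used.

```python
import json
import sqlite3

# Structured: a table with fixed columns; every row has the same shape.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ben", "Leeds")],
)
structured = conn.execute("SELECT name, city FROM users ORDER BY id").fetchall()
conn.close()

# Semi-structured: records describe their own fields, and fields may differ
# from one record to the next (the second record has no "tags" key).
semi_structured = json.loads(
    '[{"id": 1, "name": "Asha", "tags": ["new"]}, {"id": 2, "name": "Ben"}]'
)
tags_per_record = [record.get("tags", []) for record in semi_structured]

# Unstructured: free text with no schema; any meaning has to be extracted.
unstructured = "Hi team, Asha from Pune signed up yesterday. Thanks, Ben"
mentions_pune = "Pune" in unstructured

# Binary: raw bytes that only make sense to a format-specific decoder.
# These eight bytes are the standard signature at the start of a PNG file.
binary = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
looks_like_png = binary[1:4] == b"PNG"
```

Notice how much work each kind needs before it answers a question: the table can be queried directly, the JSON needs a default for the missing field, the text needs a search, and the bytes tell you nothing until you know the file format.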
Data Engineers create a Data Pipeline that prepares data for the task at hand.
Creating a data pipeline is a three-step process:
The data in a Data Lake is in its raw form, while the data in a Data Warehouse has been transformed for a specific purpose.
A data scientist would typically prefer the Data Lake, since it holds more data and keeps it in raw form. A data analyst would typically prefer the Data Warehouse, as its data is already shaped for the job at hand.
ETL → “Extract, Transform, and Load” is the pipeline pattern used by Data Engineers.
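As a rough illustration of ETL, the sketch below extracts raw JSON lines (standing in for raw data in a Data Lake), transforms them by cleaning and validating each record, and loads the result into an in-memory SQLite table (standing in for a Data Warehouse). The sample events, the `payments` table, and the function names are assumptions made up for this example, not part of any particular tool.

```python
import json
import sqlite3

# Hypothetical raw events, as they might sit untouched in a data lake.
RAW_EVENTS = [
    '{"customer": " Asha ", "amount": "12.50", "currency": "usd"}',
    '{"customer": "Ben", "amount": "7", "currency": "USD"}',
    '{"customer": "", "amount": "oops", "currency": "usd"}',
]


def extract(lines):
    """Extract: parse each raw JSON line into a dictionary."""
    return [json.loads(line) for line in lines]


def transform(records):
    """Transform: trim names, cast amounts, normalise currency, drop bad rows."""
    cleaned = []
    for record in records:
        customer = str(record.get("customer", "")).strip()
        try:
            amount = float(record.get("amount", ""))
        except (TypeError, ValueError):
            continue  # skip records whose amount is not a number
        if not customer:
            continue  # skip records with no customer name
        cleaned.append((customer, amount, str(record.get("currency", "")).upper()))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(customer TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(lines, conn):
    """Run extract, transform, and load in order, then return what was stored."""
    load(transform(extract(lines)), conn)
    query = "SELECT customer, amount, currency FROM payments ORDER BY customer"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_EVENTS, warehouse))
```

Keeping the three steps as separate functions mirrors the name of the pattern and makes each step easy to test or replace on its own.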
The different types of Databases are as follows:
Relational Databases - Postgres, MySQL
NoSQL Databases - MongoDB
NewSQL Databases - VoltDB
Search Databases - Elasticsearch
Computation Databases - Apache Spark (strictly speaking, a distributed data processing engine rather than a database)
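To give a feel for the difference between the first two categories, here is a small sketch that stores the same information relationally (two tables joined by a key, using SQLite from the standard library) and as a single nested document of the kind a document store such as MongoDB holds. The tables, the document, and the function names are invented for illustration, and the document side uses a plain Python dictionary rather than a real MongoDB client.

```python
import sqlite3


def relational_lookup():
    """Relational style: data is split across tables and joined at query time."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            author_id INTEGER REFERENCES authors(id),
            title TEXT
        );
        INSERT INTO authors VALUES (1, 'Asha');
        INSERT INTO posts VALUES (10, 1, 'Intro to ETL');
        INSERT INTO posts VALUES (11, 1, 'Data lakes');
        """
    )
    rows = conn.execute(
        "SELECT a.name, p.title FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    ).fetchall()
    conn.close()
    return rows


# Document style: related data is nested inside one self-contained record.
AUTHOR_DOCUMENT = {
    "_id": 1,
    "name": "Asha",
    "posts": [{"title": "Intro to ETL"}, {"title": "Data lakes"}],
}


def document_lookup(document):
    """Read the same (author, title) pairs straight from the nested document."""
    return [(document["name"], post["title"]) for post in document["posts"]]
```

Both lookups return the same pairs; the difference is whether the relationship is rebuilt with a join at query time or stored already nested.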
Data engineering is an increasingly appealing career option. Hopefully, anyone looking to break into the field found this story helpful.