Introducing a Simple Module for Parsing CSV Files

Written by arthur.tkachenko | Published 2022/04/06
Tech Story Tags: slogging | open-source | javascript | programming | food-tech | datasets | github | codeclimate | web-monetization

TL;DR: Arthur Tkachenko has created a simple module for parsing CSV files. The module is a simple tool for playing with datasets related to food tech. The idea is simple: connect a dataset in CSV format, parse it, and export the data as you need it, even when it has to work with 10 different datasets. This makes the other modules they are building data-agnostic and more independent of any particular database, framework, or logic. Next, he plans to set up CI/CD with Travis CI and extend it so that, when a new version of this package is published, it runs against their datasets.

This Slogging thread by Arthur Tkachenko occurred in slogging's official #programming channel, and has been edited for readability.
Arthur Tkachenko, Mar 27, 2022, 11:05 PM
Today I want to introduce a simple module for parsing CSV files.
Arthur Tkachenko, Mar 27, 2022, 11:25 PM
Recently I was exploring my old repository: https://github.com/Food-Static-Data/food-datasets-csv-parser
Arthur Tkachenko, Mar 27, 2022, 11:26 PM
Inside, I have a cool set of small modules that helped me a lot. My code is tightly tied to those packages, so I have to pray for the developers who build them: thanks to them, I don't need to spend precious time reinventing those pieces.

List of modules that I'm using:
  • csv-parser
  • fs
  • lodash
  • path
  • path-exists
  • shelljs
Arthur Tkachenko, Mar 27, 2022, 11:27 PM
Why did I create this package? It's simple. During our work at Groceristar, we came across a number of databases and datasets related to "food tech". To extract that data and play with it, you need to parse CSV files.
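To show why "just parse the CSV" is less trivial than it sounds, here is a minimal, dependency-free sketch of what a CSV parser has to handle: quoted fields, commas and line breaks inside quotes, doubled quotes ("") as escapes, and CRLF line endings. It is an illustration written for this article, not the package's code and not csv-parser's API. The name `parseCsv` and the sample rows are hypothetical, and it assumes a comma delimiter and a header row.

```javascript
// Minimal CSV parsing sketch (illustrative stand-in, not csv-parser).
// Assumptions: RFC 4180-style quoting, first row is the header,
// every value is returned as a string.
function parseCsv(text, delimiter = ",") {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') {
          field += '"'; // "" inside quotes is an escaped quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += ch; // delimiters and newlines are literal inside quotes
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === delimiter) {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += ch;
    }
  }
  // Flush the last row when the file doesn't end with a newline.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  if (rows.length === 0) return [];

  // Turn each data row into an object keyed by the header names.
  const [header, ...body] = rows;
  return body
    .filter((r) => !(r.length === 1 && r[0] === "")) // skip blank lines
    .map((r) => Object.fromEntries(header.map((name, j) => [name, r[j] ?? ""])));
}

// Usage with hypothetical sample data:
const sample = 'id,name,kcal\n1,"Apple, raw",52\n2,"Cheese ""cheddar""",403\n';
const rows = parseCsv(sample);
// rows[0]      -> { id: "1", name: "Apple, raw", kcal: "52" }
// rows[1].name -> 'Cheese "cheddar"'
```

A real dataset (for example a large USDA table) is better served by a streaming parser such as csv-parser, which avoids loading the whole file into memory; the sketch only shows the rules any parser has to get right.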
Arthur Tkachenko, Mar 27, 2022, 11:44 PM
I will also post updates about building modules for static data on Indie Hackers. While it didn't help much with promotion, founders there are pretty positive people, and their support really matters. Here is an org that I created a few years ago: https://www.indiehackers.com/product/food-static-data
Arthur Tkachenko, Mar 27, 2022, 11:45 PM
As usual, experienced developers might tell me that I'm stupid and that CSV parsing is a mundane procedure. But I don't care. I realized that we were running similar code in a few separate projects, so I decided to isolate it.

I went through a few iterations before I finally found a way to make it work the way I like. And you can see how it looks right now.

It's not ideal, but it has been working fine for me. Right now I plan to revamp this package a little bit to make it work with the latest versions of rollup.js and Babel.
Arthur Tkachenko, Mar 27, 2022, 11:58 PM
The idea is simple: connect a dataset in CSV format, parse it, and export the data as you need it. But once you need to make it work with 10 different datasets, things aren't as easy as they sound in your head.
CSVs aren't only related to food tech datasets, but for me it was important to be able to use different datasets and play with them easily. This makes the other modules we are building data-agnostic and more independent of any particular database, framework, or logic. Basically, we created and optimized around 13 repositories around this idea. Recently I created a separate organization that will focus on those repositories only.
Later I plan to remove some repositories once they are replaced by other, more popular and stable tools. This module can be useful for parsing other datasets too, but making it independent of the food tech topic isn't my task at this point.
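One way to keep the parsing code data-agnostic across many datasets is to move the dataset-specific parts into configuration, so the pipeline code stays the same for every dataset. The sketch below assumes a hypothetical registry in which each dataset lists its CSV path and a column mapping into a shared shape. The dataset names, file paths, and column names are made up for illustration and are not taken from the repository.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical registry: names, paths, and columns are illustrative only.
// `columns` maps source column names -> shared field names.
const datasets = {
  datasetA: {
    file: path.join("datasets", "dataset-a", "foods.csv"),
    columns: { ID: "id", Description: "name" },
  },
  datasetB: {
    file: path.join("datasets", "dataset-b", "food_items.csv"),
    columns: { food_code: "id", food_name: "name" },
  },
};

function getConfig(name) {
  const config = datasets[name];
  if (!config) throw new Error(`Unknown dataset: ${name}`);
  return config;
}

// Select and rename columns so every dataset comes out in the same shape.
// Columns that are not listed in the mapping are dropped.
function normalize(name, rows) {
  const { columns } = getConfig(name);
  return rows.map((row) =>
    Object.fromEntries(
      Object.entries(columns).map(([source, target]) => [target, row[source]])
    )
  );
}

// Read a dataset from disk. The CSV parser is injected as a
// (text) -> rows function, so this code doesn't depend on one library.
function loadDataset(name, parse, rootDir = process.cwd()) {
  const { file } = getConfig(name);
  const text = fs.readFileSync(path.join(rootDir, file), "utf8");
  return normalize(name, parse(text));
}
```

With this layout, adding an eleventh dataset means adding one registry entry rather than another copy of the parsing code, which is also one way to avoid the kind of duplication discussed later in the thread.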
Arthur Tkachenko, Mar 27, 2022, 11:59 PM
I was also able to set up some cool and important tools, like Husky and Coveralls. I can't say that I get the most out of them, but they helped me jump into the "open source ocean", the GitHub rabbit hole that I have been exploring for so many years.
Arthur Tkachenko, Mar 28, 2022, 12:00 AM
And it's good not just to type another line of code, but also to be able to see that your codebase is solid and that nothing is breaking it behind your back.
Arthur Tkachenko, Mar 28, 2022, 12:02 AM
CodeClimate (https://codeclimate.com/) helped me explore my code and take another look at how I develop things.
Arthur Tkachenko, Mar 28, 2022, 12:07 AM
Yeah, CodeClimate shows that I have code duplicates and ~50 hours of tech debt. Looks like a lot, right? But this is a small independent package.
Imagine how much tech debt your huge monolith project has. Probably years of wasted developer time 🙂
Arthur Tkachenko, Mar 28, 2022, 12:07 AM
At some point I'll remove the duplicates, and that will reduce the number of hours on this page.
Arthur Tkachenko, Mar 28, 2022, 12:10 AM
Plus, your product owner or CTO is usually busy and can't review code or keep track of what's happening inside your codebase.
CodeClimate can do some of that for you; just check its settings. They also support the open-source movement: if your code is open and hosted on GitHub, you can use it for free.
Arthur Tkachenko, Mar 28, 2022, 12:14 AM
Stretch goals are simple:
  • I want to invest some time into CI/CD. For now I'll pick Travis CI. At some point I'll extend it so that, when a new version of this package is published, it runs against our datasets and I can see whether something breaks.
  • I also need to remove duplicated code that I moved into separate packages but that is still present here for backward compatibility.
  • It's also not cool to see JS code and all those CSV files in the same repository. I need to come up with an idea for how to organize the folders and make them easy to navigate. While it works for me, other people might find it very confusing.
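For the first stretch goal, the check that a CI job runs after each release could be as small as this sketch. It assumes the datasets have already been parsed into arrays of row objects; `checkDataset`, `runSmokeChecks`, and the required-column lists are hypothetical names, and hooking the script into Travis CI (or any other CI service) is left to the job configuration.

```javascript
// Hypothetical post-release smoke check: given already-parsed datasets,
// report anything that looks broken. Names and columns are illustrative.
function checkDataset(name, rows, requiredColumns) {
  if (!Array.isArray(rows) || rows.length === 0) {
    return [`${name}: no rows parsed`];
  }
  const problems = [];
  const firstRow = rows[0];
  for (const column of requiredColumns) {
    if (!(column in firstRow)) {
      problems.push(`${name}: missing column "${column}"`);
    }
  }
  return problems;
}

// Run every check and collect all problems instead of stopping at the first,
// so one CI run reports every broken dataset at once.
function runSmokeChecks(checks) {
  const problems = checks.flatMap(({ name, rows, requiredColumns }) =>
    checkDataset(name, rows, requiredColumns)
  );
  for (const problem of problems) console.error(problem);
  return problems;
}

// In a CI entry script, a non-zero exit code is what marks the build as failed:
//   process.exitCode = runSmokeChecks(checks).length > 0 ? 1 : 0;
```

Checking only the header of the first row keeps the job fast. A stricter version could also compare row counts against the previous release, but that needs stored baselines.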
Arthur Tkachenko, Mar 28, 2022, 12:17 AM
We even wrote a great README file explaining how to run this package.
Arthur Tkachenko, Mar 28, 2022, 12:18 AM
We collected a great number of datasets that can help a vast number of food projects. Some of their providers even sell the latest updates for money.
Right now this module has been tested with:
- FoodComposition dataset
- USDA dataset (I picked 4 major tables with data)
- FAO (Food and Agriculture Organization of the United Nations) dataset
This module is not just for parsing data; we also need to write files in JSON format with the formatted data inside.
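Writing the formatted data back out as JSON needs only Node's built-in `fs` and `path` modules. This sketch assumes that each dataset is written to its own pretty-printed `<name>.json` file and that "formatting" includes turning numeric CSV cells (which always arrive as strings) into numbers. `writeJson`, `toNumbers`, and the output layout are hypothetical, and `fs.mkdirSync(..., { recursive: true })` is used here instead of shelljs or path-exists only to keep the example dependency-free.

```javascript
const fs = require("fs");
const path = require("path");

// CSV values arrive as strings: convert the listed columns to numbers.
// Empty or non-numeric cells become null (an assumption for this sketch).
function toNumbers(rows, numericColumns) {
  return rows.map((row) => {
    const copy = { ...row };
    for (const col of numericColumns) {
      const raw = row[col];
      const text = raw === undefined || raw === null ? "" : String(raw).trim();
      const value = Number(text);
      copy[col] = text === "" || Number.isNaN(value) ? null : value;
    }
    return copy;
  });
}

// Write `data` to <outDir>/<name>.json as pretty-printed JSON (2-space indent).
function writeJson(outDir, name, data) {
  // recursive: true behaves like `mkdir -p`: creates parents, no error if present.
  fs.mkdirSync(outDir, { recursive: true });
  const file = path.join(outDir, `${name}.json`);
  fs.writeFileSync(file, JSON.stringify(data, null, 2) + "\n", "utf8");
  return file;
}
```

For example, `writeJson("output", "foods", toNumbers(rows, ["kcal"]))` would produce `output/foods.json` with `kcal` stored as a number. Very large tables may need a streaming writer instead, because `JSON.stringify` builds the whole string in memory.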
Arthur Tkachenko, Mar 28, 2022, 12:18 AM
Show some love if you want more articles like this one! Any activity will be appreciated.

Published by HackerNoon on 2022/04/06