Welcome to HackerNoon’s Writing Prompts! Would you like to take a stab at answering some of these questions? The link for the template is HERE.
My name is Maggie and I am a data scientist currently working on freelance projects and writing! I previously worked as a Data Scientist for an automotive manufacturer. I was there for 3 years and started as one of the first data scientists on the team.
When I started working on this team it was “back in the day”, although it was only a few years ago. So much has changed!
The team I joined was a small data science team within the R&D arm of the engineering department. It was a cool opportunity to be doing data science in an R&D environment because everything was very experimental and we had a lot of freedom.
We collected data from our products: big, heavy-duty, industrial trucks. These were our “end-devices” in an IoT sense.
As a new, scrappy team, we operated a lot like a start-up within the large organization and this came with its own challenges. One of the big ones was that there were no data engineers. I was hired partially because I had experience as a Linux systems administrator in a data center and I had done well in my High-Performance Computing class in my master’s degree program.
One of my classmates from that class was also on the team and reached out to me for my resume. I was excited to get a job where I could actually apply the big data skills that we had painfully learned in school, i.e., working with Hadoop, writing stateless programs that could be distributed, writing MapReduce code from scratch, and running in the cloud.
I cut my teeth on building end-to-end IoT data solutions focused on predictive maintenance for fleets of vehicles. I wrote pipelines first in PySpark and later with Kafka in NiFi; they picked up MQTT messages and passed them to HBase tables, and later to Kafka messaging queues. I wrote jobs to transform the raw landed data into a usable format. I designed the database structure, aggregated tables, and features to feed into my analyses, dashboards, and data products. I performed experiments with test benches and live vehicles to collect data to answer specific engineering questions about vehicle performance, part reliability, and oil life.
I was a mathematics major during my undergraduate degree. I was required to take a semester of programming (C++) and loved it, so I took an additional semester of Object Oriented Programming.
However, I was a single mom while in college. I worked full-time in addition to taking classes full-time. There were some lab-based programs that I was interested in, but labs are typically several hours a few times a week for only 1 credit. I would have needed to spend an additional year in school to get a lab-based degree and continue to work as much as I needed to support my son. So, I got my degree in mathematics as it required 0 lab work.
After graduating, with a degree in mathematics and no real applied skills, I had a hard time finding a good job. I couldn’t even get an internship. The only recruiter at the job fair who called me back was the National Guard recruiter, of course. So, I joined the National Guard to help pay back my student loans and get some real job training.
I became a Signal Officer in the Wisconsin National Guard and earned IT certifications that helped me get a job in IT. I worked briefly as an IT operations analyst for an insurance company before getting a dream job as a Linux system administrator in a data center for the National Guard.
This was in 2014-2015, and I had been hearing the term “Data Science” and getting interested. It appealed to me because I really loved computers and math. So within a few months of starting my job in the data center, I applied for an online Master’s degree program in Data Science. I think I was among the second group of students to join that program. It was very much the early days of data science Master’s programs.
It took me about 4 years to complete my Master’s degree--I was a single mom working full-time in the data center with a part-time job as an Officer in the National Guard. I took 1 class at a time and loved every minute of it.
When I was working as a data scientist for a corporation, my day-to-day typically involved at least one meeting, lots of chatting online with my co-workers, some impromptu phone calls with collaborators, and lots of coding.
I was always either building a pipeline, managing tables and automated jobs, or deep into Python code working on an analysis or model. I used Python primarily, but we also had licenses for JMP, which is a great tool. Eventually, we started using Dataiku and I spent most of my time in Python notebooks, Dataiku, NiFi, or Hue.
Python, of course. VS Code. Pandas, Scikit-learn, and a whole toolbox of other Python packages.
I love Dataiku as a tool as well.
I love the end-to-end process that includes thinking about a problem, figuring out what kind of data you need to solve the problem, gathering the data, analyzing it, modeling it, building some kind of product or output, and then telling the story to help others see what you have done.
I love the technical conversations and mathematical creativity that go into solving the problems, but even more so, I love pulling back from the weeds of the analysis and putting together a cohesive story about the work that was done. The analysis is useless if you can’t communicate about it, and when we communicate well, the data science work becomes much more interesting and useful to everyone.
I also love the people. I have never felt more at home than I have on a team of IT or data professionals. I’ve been lucky to work with people from all over the world, and I think it’s a beautiful opportunity to work with a group of people of different ages, with diverse skills and cultures, while at the same time geeking out over cool new topics in data science.
I think people do not know that the field has burst open so quickly that there is no single definition or skill set of a data scientist. This can be overwhelming and lead to imposter syndrome for people who are getting started. I think it’s important to show how much variety there is in the tasks, skills, applications, and points of view that go into data science.
I am excited about decentralized learning. I think decentralization is very important for data ownership and privacy. It also plays a big role in the efficiency of algorithms. Building more energy-efficient algorithms and respecting people’s data rights are two of the big challenges in the field.
I am also excited about IoT and cyber-physical systems. Through real-time data gathered from sensors, we can monitor and optimize processes to reduce waste and improve efficiency to make significant impacts on reducing carbon emissions. Additionally, IoT has applications in farming, supply chain, and weather monitoring that can potentially help us solve real human issues around the world.
I would be a researcher and writer! I am currently working on building a website at www.datalabnotes.com and I am writing a book to help data scientists organize their projects. The book is equally useful for consultants, students, and seasoned professionals because it walks through the data science project lifecycle and provides a template for asking questions and taking notes along the way. It’s largely inspired by CRISP-DM, with some added features based on recently published papers, industry best practices, and my own experience.
Identify the niche that you want to work in and do an analysis. Here’s how: If you are restricted by geography, look at the companies around you and companies that offer remote positions. If you are not restricted by geography, then do some soul-searching and figure out what types of problems you want to work on or what companies you want to work for. Identify what industries those companies are in, what kind of data they might have and what problems they are solving.
Don’t just guess: read the job descriptions and make a spreadsheet. Start accumulating data on the types of skills and technologies listed in the job advertisements. Connect with people at these companies through LinkedIn, Kaggle, HuggingFace, and Medium blogs. Pay attention to what they are talking about and engage with them to understand what tools they are using and what kinds of problems they might be solving.
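If the spreadsheet starts to get long, a few lines of Python can do the tallying. This is only a rough sketch: the skill list and the job-ad snippets below are made up for illustration, and the whole-word matching is a simplifying assumption (it would miss names like "C++" or multi-word skills).

```python
import re
from collections import Counter

# Hypothetical skill list and job-ad text, invented for this example.
SKILLS = ["python", "sql", "spark", "kafka", "tableau"]

JOB_ADS = [
    "Looking for a data scientist with Python, SQL, and Spark experience.",
    "Must know Python and Tableau. Kafka is a plus.",
    "Data engineer: SQL, Kafka, Spark, Python.",
]

def count_skills(ads, skills):
    """Count how many ads mention each skill (case-insensitive, whole word)."""
    counts = Counter()
    for ad in ads:
        text = ad.lower()
        for skill in skills:
            # \b keeps "sql" from matching inside a longer word.
            if re.search(rf"\b{re.escape(skill)}\b", text):
                counts[skill] += 1
    return counts

skill_counts = count_skills(JOB_ADS, SKILLS)
for skill, n in skill_counts.most_common():
    print(f"{skill}: {n} of {len(JOB_ADS)} ads")
```

Even a small tally like this makes it easier to see which skills keep showing up, which is the point of collecting the data rather than guessing.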
Now the hard part. Get creative. Identify a project or two where you can showcase these skills. You can either build your own dataset or use open-source data. The project should be related to the projects that you observe in the target industry or at the target companies. Of course, you will need to be creative because you can’t possibly do the same work with open-source data and limited resources.
You need to find smaller, related projects. You can read papers and write reviews of related work, or find a small proof of concept project. Or build a dashboard using industry data. Or build a pipeline that simulates data and a use case from the target industry. The possibilities are endless. Find a way to make a relevant project. If you are clear about the industry or problem you want to target, then a few projects like this can apply broadly to multiple companies.
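To give a feel for the "simulate data and a use case" idea, here is a minimal sketch of a toy fleet-monitoring project. Everything in it is an assumption made up for illustration: the truck names, the 90-degree baseline, the noise level, the drift rate, and the simple "compare the last readings to the first readings" rule. A real project would use whatever signals and thresholds make sense for the target industry.

```python
import random
import statistics

def simulate_fleet(n_vehicles=3, n_readings=50, seed=42):
    """Generate fake engine-temperature readings for a small fleet.

    All values are invented: a 90-degree baseline with Gaussian noise,
    and the last vehicle slowly heats up to mimic a developing fault.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    fleet = {}
    for v in range(n_vehicles):
        drift = 0.5 if v == n_vehicles - 1 else 0.0
        fleet[f"truck_{v}"] = [
            90.0 + drift * t + rng.gauss(0, 2.0) for t in range(n_readings)
        ]
    return fleet

def flag_drifting(fleet, window=10, threshold=5.0):
    """Flag vehicles whose recent average is well above their early average."""
    flagged = []
    for vehicle, temps in fleet.items():
        rise = statistics.mean(temps[-window:]) - statistics.mean(temps[:window])
        if rise > threshold:
            flagged.append(vehicle)
    return flagged

fleet = simulate_fleet()
print("Vehicles to inspect:", flag_drifting(fleet))
```

From here the project can grow in whatever direction matches the target jobs: write the readings to a database, put a dashboard on top, or swap the threshold rule for a model.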
The third step should result in a project that you can add to your portfolio. Consider using a Jekyll site on GitHub Pages as a very quick and easy way to have a professional online portfolio. Even if you have Kaggle notebooks or Tableau Public dashboards, create the Jekyll site as a central website to showcase all of these projects in one place. This will make it extremely easy to share and link to all of your work at once.
Mine is a work in progress but here it is: https://projects.datalabnotes.com/. I’ve seen better, but it’s better to have a work in progress than nothing at all!
So now you should have 1) a clear idea of the industry/company/domain you want to work in; 2) a professional network of people related to that industry/company/domain; and 3) a portfolio relevant to the job and industry you want. Oh by the way, make sure you post and interact with your network as you are doing step 3!
I think this is a solid recipe for standing out and getting some calls for interviews!