This is the second article in the series on why Data Engineers shouldn't write Airflow DAGs. Here, we are going to introduce a framework proposal for Apache Airflow.
This article aims to shed some light on how building a framework can help you solve some of the problems related to DAG writing.
I'll start with a short recap of the first part, so it isn't strictly necessary to read it first. Still, consider reading it if you want a more detailed explanation of the ideas I address here.
I believe Data Engineers shouldn't write Airflow DAGs. Instead, they should build frameworks that automate how those DAGs are created.
There's nothing more boring than doing the same thing over and over again, and that's exactly what you do when you write DAGs by hand. After a while, the job gets repetitive and your skills start being underutilized.
Jeff Magnusson, VP Data Platform at Stitch Fix, wrote an article a couple of years back in which he stated that engineers shouldn't write ETL. …
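To give a rough flavor of what "a framework that generates DAGs" could look like, here is a minimal, hypothetical sketch in plain Python. It deliberately does not use the real Airflow API: the dicts below stand in for DAG and task objects, and all names (`PIPELINES`, `build_dag`, the pipeline names) are made up for illustration. The point is only the pattern: engineers maintain declarative config, and a small factory turns each entry into a DAG-like spec.

```python
# Hypothetical sketch of config-driven DAG generation.
# The dicts stand in for Airflow DAG/task objects; a real framework
# would instantiate Airflow's DAG and operator classes instead.

PIPELINES = [
    {"name": "daily_sales", "schedule": "@daily",
     "tasks": ["extract", "transform", "load"]},
    {"name": "hourly_events", "schedule": "@hourly",
     "tasks": ["extract", "load"]},
]

def build_dag(config):
    """Turn one declarative config entry into a DAG-like spec."""
    tasks = config["tasks"]
    # Chain the tasks linearly: each task depends on the previous one.
    dependencies = list(zip(tasks, tasks[1:]))
    return {
        "dag_id": config["name"],
        "schedule": config["schedule"],
        "tasks": tasks,
        "dependencies": dependencies,
    }

def build_all(pipelines):
    """Build every DAG spec declared in the config list."""
    return {spec["dag_id"]: spec for spec in map(build_dag, pipelines)}

if __name__ == "__main__":
    dags = build_all(PIPELINES)
    print(dags["daily_sales"]["dependencies"])
```

Adding a new pipeline then means adding one config entry, not writing another DAG file, which is exactly the kind of repetition this series argues against.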
I’ve been writing for a while now about Data Warehousing. So far, I’ve written 7 articles. Some of them are quite basic. Others can be a little bit more complex and abstract. So, here’s a mini-guide about how I recommend reading them.
If you are completely new to Data Warehousing, I recommend starting with the next post. It covers basic information about data lakes and data warehouses, so if these terms are new to you, it should help you get acquainted with them.
If you are looking to understand a bit more about…
People starting out in the Data Science field can feel a little lost, especially when it comes to choosing which role to pursue. The explosion of roles in data science can be overwhelming.
With so many roles to pursue comes a long list of skills practitioners have to learn. The problem is that people don't know where to start.
This brings us to the question of which strategy one should use to thrive in a data science career: should one become a generalist or a specialist?
This is a difficult question to approach since it depends on…
It wasn't so long ago that I wanted to become a Data Scientist. I was so fascinated by the possibilities that I did everything I could to become one. Fortunately, I had a breakthrough that changed the way I approached my career.
I realized I was defining myself by my role, so much that I lost sight of the importance of the process. I forgot the importance of remembering why you are doing what you are doing, and of enjoying what you're doing, even when it's something you don't want to do.
You can experience several problems if you overlook these characteristics. For example, if reprocessing past data isn't easy to do, or if introducing changes to your data pipeline is a headache, you might have overlooked them.
Overlooking reproducibility and maintainability in particular is an easy mistake to make, especially if you don't know what they mean or don't plan enough for them.
In this post, I want to share with you some principles and lessons my team has learned trying to solve such problems…
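One concrete habit that tends to make reprocessing past data easier is parameterizing each pipeline step by its run date and deriving the output location from that date. The sketch below is a hypothetical illustration of that idea, not code from my team's actual pipeline; the names (`output_path`, `process_day`, the `warehouse/sales` layout) are invented for the example.

```python
# Hypothetical sketch: a date-parameterized, idempotent pipeline step.
# Because the run date is an explicit argument and the output path is
# derived from it, re-running any past date rewrites exactly one
# partition, which makes backfills reproducible.

from datetime import date

def output_path(run_date: date, base: str = "warehouse/sales") -> str:
    # Partitioned layout: one directory per run date.
    return f"{base}/dt={run_date.isoformat()}/part-0.parquet"

def process_day(run_date: date, records: list) -> dict:
    """Process one day's records; same inputs always give the same output."""
    total = sum(r["amount"] for r in records)
    return {
        "path": output_path(run_date),
        "rows": len(records),
        "total": total,
    }
```

Re-running `process_day` for any date is safe because nothing depends on "now" or on hidden state, which is the essence of a reproducible backfill.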
We are rarely aware of how we use our words, so we often use them in the wrong way. Too many times, we use them to put negativity into the world.
Words are a powerful tool that can change lives. So powerful that they can make the world a better place.
I know it might sound silly and sometimes meaningless. But it's something you should try in order to see its worth.
I recently realized I come across as a negative person. I don't think I am; I even consider myself an optimist. But that's not always how other people perceive me. …
If you are working with data and considering taking your first AWS certification, you might be wondering which one is right for you.
This was a hard choice for me, since there are 12 possible certifications one can take in AWS. After some research, I chose the AWS Developer Associate certification, but it wasn't easy.
That's why I want to share my story. I hope this post sheds some light on how the AWS Developer certification can help you gain new knowledge, especially if you are a Machine Learning Engineer.
I'm not going to lie to you: you've probably already read some of the things I'm going to say in this post. But I'm writing them anyway because they feel true to me.
Some of the things I want to share have helped me keep writing. I just want to pass those lessons along to keep you motivated in the process.
Also, I should warn you that I'm not a fancy, famous writer. In fact, I've only published 13 stories after a year of writing. So I'm only someone trying to learn through writing… And I've learned some…
In past posts, I've been talking about Data Warehouses, their basic architecture, and some principles that can help you build one. Today, I want to show you an implementation of a Data Warehouse on AWS, based on a case study performed a couple of months ago.
This implementation uses AWS S3 as the Data Lake (DL), AWS Glue as the Data Catalog, and AWS Redshift with Redshift Spectrum as the Data Warehouse (DW).
Note: This post can be confusing if you are not familiar with some of the terminology and concepts I’m using here. …
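To give a flavor of how the Redshift Spectrum piece of such an architecture connects to the S3 data lake, here is a small, hypothetical sketch that generates the DDL for a Spectrum external table over Parquet files in S3. The schema, table, column, and bucket names are all invented for illustration and are not from the actual case study; the `CREATE EXTERNAL TABLE … STORED AS PARQUET … LOCATION` shape, however, is standard Redshift Spectrum syntax.

```python
# Hypothetical sketch: building Redshift Spectrum DDL from a column
# spec. All identifiers (schema, table, bucket) are made-up examples.

def external_table_ddl(schema: str, table: str, columns: dict,
                       s3_location: str) -> str:
    """Render a CREATE EXTERNAL TABLE statement for Parquet data in S3."""
    cols = ",\n    ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        f"    {cols}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )

ddl = external_table_ddl(
    schema="spectrum",
    table="sales",
    columns={"sale_id": "bigint",
             "amount": "double precision",
             "sale_date": "date"},
    s3_location="s3://my-data-lake/sales/",
)
print(ddl)
```

With a table like this registered (the Glue Data Catalog backs the external schema), Redshift can query the data lake files in place, which is what lets S3 serve as the DL while Redshift remains the DW.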