Hi Paul. I appreciate your thoughts on the post. I completely agree with what you are saying about AWS Glue. I think it can be great to bring other tools into the stack depending on your specific needs.

Regarding Spark, that is out of the scope of the post, but I do have other posts where I talk a little bit about it.

Put simply, Spark is executed in Airflow. It extracts data from the sources and puts it in AWS S3. I've also used it for heavy processing of large unstructured datasets.

https://towardsdatascience.com/implementing-the-functional-data-engineering-paradigm-in-data-load-processes-by-using-airflow-61d3bae486b0

https://towardsdatascience.com/generalizing-data-load-processes-with-airflow-a4931788a61f

Writing to learn! | LinkedIn profile: https://www.linkedin.com/in/ajhenaor | Buy me a coffee: https://www.buymeacoffee.com/ajhenaor
