Hi Paul. I appreciate your thoughts on the post. I completely agree with what you say about AWS Glue. I think bringing other tools into the stack can be great, depending on your specific needs.
Spark is outside the scope of this post, but I do have other posts where I talk a little bit about it.
Put simply, Spark is executed in Airflow. It extracts data from the sources and puts it in AWS S3. I've also used Spark for heavy processing of large unstructured datasets.