You’ve got a point here. Your question is really interesting. Thinking about it, I would say that in that case guaranteeing task determinism is not possible.

Check what Maxime says about it:

“Similarly, if we want to change a business rule and perform a backfill all of the downstream computation, we need the guarantee that the change of logic is the only moving part. We need that same guarantee that the blocks of data used in the computation are identical to the ones used when re ran the original process, or in other words, that the sources have have not been altered.

To put it simply, immutable data along with versioned logic are key to reproducibility.”

I would say that you need to generate an immutable source. You can save your logs in some database — e.g., Elasticsearch or something similar. So, such a database becomes the source itself. However, I’m not really sure. I think dealing with retention policies can be messy sometimes.

I hope this clarifies your question.

Have a great day! :D

Writing to learn! | LinkedIn profile: | Buy me a coffee:

Writing to learn! | LinkedIn profile: | Buy me a coffee: