Tools talked about throughout the day #
Source of Truth — Dr. Thibaud Freyd #
- Slides used GPT images.
- Talked about “shifting the work left.”
- Azure AI Document Intelligence seems like a worthwhile tool. I wonder what alternatives are out there?
Roche’s Maxim of Data Transformation — Alexander Arvidsson #
- Slides used GPT images.
- Talked about “shifting the work left.”
- “You should do the transformation as far upstream as possible, and as far downstream as necessary.”
The Importance of Testing in the Development Cycle — John Martin #
- Slides used GPT images.
- I’ve got hundreds of sources and they of course have many endpoints and tables:
  - Should I be testing all of these?
  - How do I elicit from business users what the data tests should be, especially when they don’t know their own data?
- The speaker recommends this route:
  - Take mock / test data.
  - Run tests.
  - Take processed test data.
  - Match against real data that’s also been run through the tests.
- Do you test every single table / endpoint? No — work with the business to define the most important datasets, because testing costs time and money.
- Terminology:
  - Test scope: what will and won’t be tested.
  - Test cases: scenarios which will be verified.
  - Assertions: validation of expected behavior.
- Management of reference data is key — leverage source control and make it part of the development tasks.
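To make the terminology and the mock-then-real route above concrete for myself, here is a minimal Python sketch. Everything in it is my own assumption rather than something shown in the talk: the `transform` function, the two checks, and the sample rows are hypothetical stand-ins for a real pipeline step, its test cases, and reference data kept in source control.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


def transform(rows: list[Row]) -> list[Row]:
    """Hypothetical step under test: drop rows without an id, normalise country."""
    return [
        {**row, "country": str(row["country"]).strip().upper()}
        for row in rows
        if row.get("id") is not None
    ]


@dataclass
class Check:
    """One test case: a named scenario plus the assertion that verifies it."""
    name: str
    assertion: Callable[[list[Row]], bool]


def ids_are_unique(rows: list[Row]) -> bool:
    ids = [row["id"] for row in rows]
    return len(ids) == len(set(ids))


def country_is_two_letters(rows: list[Row]) -> bool:
    return all(
        len(str(row["country"])) == 2 and str(row["country"]).isalpha()
        for row in rows
    )


# Test scope: only these two checks on this one dataset; nothing else is covered.
CHECKS = [
    Check("ids_unique", ids_are_unique),
    Check("country_two_letters", country_is_two_letters),
]


def run_tests(rows: list[Row]) -> dict[str, bool]:
    """Run every check and report pass/fail per test case."""
    return {check.name: check.assertion(rows) for check in CHECKS}


# Mock / reference data (assumed to live in source control next to the code).
MOCK_ROWS: list[Row] = [
    {"id": 1, "country": " se "},
    {"id": 2, "country": "dk"},
    {"id": None, "country": "no"},
]

# A made-up sample standing in for real data pulled from a source system.
REAL_SAMPLE: list[Row] = [
    {"id": 10, "country": "SE"},
    {"id": 10, "country": "Sweden"},
]

mock_results = run_tests(transform(MOCK_ROWS))
real_results = run_tests(transform(REAL_SAMPLE))

# Match the processed test data against the real data run through the same tests.
mismatches = [name for name in mock_results if mock_results[name] != real_results[name]]
```

With these made-up rows the mock data passes both checks while the real sample fails both, so `mismatches` lists the places where reality diverges from what the reference data promised. That is where I would start the conversation with the business.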
Visualizing Data Mesh — Anurag Kale #
- Talked about “shifting the work left.”
- Why is the data world behind the software engineering world?
- Domain ownership.
- Data as a product.
- Open Data Product Specification — linting, SLA, quality, etc.
- Open data contracts.
- Self-serve data platform.
- Distributed governance.
- You must have a catalog so that you can enforce a unified access-request process, which means you’ll have to implement infrastructure as code.
- Roger Martin’s Cascade of Choices for Strategy.
- Failure: why did Anurag fail?
  - Didn’t have an access control system (RBAC) in place across the whole company.
    - Working out which user is which across systems, and getting people access to the necessary tools.
  - Underestimated the end users.
    - They found ways to use the end product differently than anticipated.
  - Didn’t have an executive-level sponsor.
    - Doing a bottom-up approach meant that there was no business buy-in.
- Recommended books:
  - Data Mesh (O’Reilly)
  - Flow Engineering
  - Team Topologies
  - The Phoenix Project
Turning Data into Influence — Roman Tesolkin #
My notes here are light. It was an interesting talk on translating data work into organizational influence.