If you think Delta Deep Clone is "just replication", here's a twist: it behaves a lot like a Git squash. It's a neat analogy I unpack in the Fun Fact of my DR for Azure Databricks article.
Worth a read if you design data platforms.
blog.dataengineerthings.org/… #DataEngineering
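One way to picture the squash analogy is a toy model. It does not reproduce Delta Lake's actual implementation. A table is a list of commits, and a deep clone copies a single snapshot of the source into the target as one new commit. Re-running the clone later brings the target up to date in one more commit, so several source commits collapse into one, much like squashing commits in Git. `ToyTable` and `deep_clone` are hypothetical names. The sketch assumes one documented property of Delta Lake: a cloned table keeps its own independent history rather than copying the source's.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple


@dataclass
class ToyTable:
    # Hypothetical model: each commit records (operation, full set of live data files).
    history: List[Tuple[str, FrozenSet[str]]] = field(default_factory=list)

    def commit(self, operation: str, files) -> None:
        self.history.append((operation, frozenset(files)))

    def snapshot(self, version: Optional[int] = None) -> FrozenSet[str]:
        # Live files at `version`; the latest version when None.
        if not self.history:
            return frozenset()
        v = len(self.history) - 1 if version is None else version
        return self.history[v][1]


def deep_clone(source: ToyTable, target: ToyTable, version: Optional[int] = None) -> None:
    # Copy one source snapshot into the target as a single new commit.
    # Intermediate source commits are not replayed, which is the "squash".
    target.commit("CLONE", source.snapshot(version))


src = ToyTable()
src.commit("WRITE", {"f1"})              # source v0
src.commit("WRITE", {"f1", "f2"})        # source v1
src.commit("WRITE", {"f1", "f2", "f3"})  # source v2
src.commit("MERGE", {"f2", "f3", "f4"})  # source v3

dr = ToyTable()
deep_clone(src, dr, version=1)  # DR v0 mirrors source v1
deep_clone(src, dr)             # DR v1 mirrors source v3; source v2 is never replayed

print(len(src.history), len(dr.history))  # 4 2
```

The source ends with four commits while the DR copy has two. The DR table holds the same latest data, but it never sees source v2 as a separate version.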
Disaster Recovery for Azure Databricks is hard. And most teams only notice it when it breaks.
I wrote a hands-on guide covering Deep Clone replication, Unity Catalog metadata sync, streaming recovery, URL switching, and clean failover/failback.
Read here: blog.dataengineerthings.org/…
Date partitioning in #Databricks is the way to go in many cases. But if done incorrectly, it can become a bottleneck. My latest article dives into common pitfalls and smarter methods you can apply.
Surrogate keys generated through hash functions bring efficiency but also come with the risk of hash collisions. In @kzdeb's article, dive into how common hash functions like MD5 and SHA-256 can lead to collisions and explore the birthday paradox.
towardsdatascience.com/colli…
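The birthday paradox can be made concrete with the standard approximation p ≈ 1 − exp(−n(n−1) / 2N). Here n is the number of keys and N is the number of possible hash values: 2^128 for MD5, 2^160 for SHA-1 and 2^256 for SHA-256. The sketch assumes hash outputs behave like uniformly random values, so it estimates accidental collisions only. Deliberate collision attacks on MD5 and SHA-1 are a separate concern. The function names and the one-billion-key figure are illustrative choices, not taken from the article.

```python
import math


def collision_probability(n_keys: int, space: float) -> float:
    """Approximate chance that at least two of n_keys uniformly random hashes collide.

    Birthday approximation: p ~ 1 - exp(-n(n-1) / (2 * space)).
    math.expm1 keeps precision when p is tiny, as it is for 128+ bit digests.
    """
    return -math.expm1(-n_keys * (n_keys - 1) / (2.0 * space))


def keys_for_probability(p: float, space: float) -> float:
    """Approximate number of keys at which the collision chance reaches p."""
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


# Sanity check on the classic birthday paradox: 23 people, 365 days, about 50%.
print(f"23 people: {collision_probability(23, 365):.3f}")

for name, bits in [("MD5", 128), ("SHA-1", 160), ("SHA-256", 256)]:
    space = 2.0 ** bits
    p = collision_probability(10**9, space)  # one billion keys (illustrative)
    n50 = keys_for_probability(0.5, space)
    print(f"{name:8} p(1e9 keys) = {p:.1e}   ~50% at {n50:.1e} keys")
```

For MD5, the 50% point lands at roughly 2.2 × 10^19 keys, about the square root of the output space. This is why the birthday bound rather than the raw 2^128 figure is the number to compare against table sizes.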
Learn about the potential risk of hash collisions when using hash functions like MD5, SHA-1, and SHA-256 to generate surrogate keys in data warehouses and lakehouses in @kzdeb's latest article.
towardsdatascience.com/colli…
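A hash-based surrogate key is typically derived by concatenating the business-key columns and hashing the result. The following is a minimal sketch using Python's `hashlib`. The helper name `hash_surrogate_key`, the `||` separator and the mapping of `None` to an empty string are all hypothetical conventions chosen for illustration. The digest length sets the collision odds: 128 bits for MD5, 160 for SHA-1 and 256 for SHA-256.

```python
import hashlib


def hash_surrogate_key(*business_key_parts, algorithm: str = "sha256", sep: str = "||") -> str:
    """Hypothetical helper: derive a surrogate key by hashing business-key columns.

    - None becomes an empty string (an assumed convention; keep whichever one you pick stable).
    - The separator keeps ("ab", "c") and ("a", "bc") from producing the same input string.
    """
    normalized = ["" if part is None else str(part) for part in business_key_parts]
    payload = sep.join(normalized).encode("utf-8")
    return hashlib.new(algorithm, payload).hexdigest()


for algo in ("md5", "sha1", "sha256"):
    key = hash_surrogate_key("CUST-001", "PL", algorithm=algo)
    # Each hex character encodes 4 bits, so digest size = len(key) * 4.
    print(f"{algo:7} {len(key) * 4:3} bits  {key}")
```

Mapping `None` to an empty string makes NULL and `''` indistinguishable. A separator that can also appear inside the data still leaves room for ambiguity. Both are input-level collisions that no digest length can fix.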
Are the reasoning capabilities of current OpenAI LLMs good enough to play a classic guessing game? Krzysztof K. Zdeb shares insights based on his experiments with playing forehead detective with several GPT models. buff.ly/4dNPoOv
What does it take to handle hierarchies in dimensional modeling? Krzysztof K. Zdeb offers detailed recommendations for different kinds of hierarchical structures in the context of data warehouses. buff.ly/3SmrbWI
Best Practices for Technical Columns in Database Design - When architecting a transactional database or a data warehouse, it's important not to forget about various types of technical columns. by Krzysztof K. Zdeb buff.ly/4byev6p
It all started with GPT having an input context window of 512 tokens. After only 5 years, the newest LLMs can handle 1M-token context inputs. Where's the limit?
Read the full article: medium.com/towards-data-scie…
It all started with GPT having an input context window of 512 tokens. After only 5 years, the newest LLMs can handle 1M-token context inputs. Where's the limit?
by Krzysztof K. Zdeb buff.ly/4bfsQEM
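The growth rate behind that question is easy to quantify from the two numbers in the post. This back-of-the-envelope sketch reads "1M" as exactly 1,000,000 tokens, which is an assumption, and takes the 5-year span at face value.

```python
import math

start_tokens = 512        # GPT context window cited in the post
end_tokens = 1_000_000    # "1M" read as one million tokens (assumption)
years = 5                 # span cited in the post

growth = end_tokens / start_tokens
doublings = math.log2(growth)
months_per_doubling = years * 12 / doublings

print(f"{growth:,.0f}x growth, {doublings:.1f} doublings, one every ~{months_per_doubling:.1f} months")
# 1,953x growth, 10.9 doublings, one every ~5.5 months
```

That is roughly one doubling of the context window every five and a half months over the period.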
"Whenever I have to assign a classword to a column name, it forces me to pause and think deeply about the data that column will hold. What does this data truly represent?" by Krzysztof K. Zdeb buff.ly/3wDVTCG
As a proud member of @PMInstitute since 2011 (and holder of PMP & ACP certifications), I'm thrilled that PMI is at the forefront of the #GenAI revolution. Let's dive in!
Join the bustling AI Community at PMI, where nearly 7k subscribers exchange knowledge & experiences. Since 2016, the community has seen a sharp increase in activity, with 350 threads last year alone (vs. 40 in all the years before). Engage with fellow PM enthusiasts at projectmanagement.com/topics…
The AI Community-led survey sheds light on AI's impact on PM. A standout finding is the eagerness of mid-career professionals (aged 45–54) to adopt AI, with 83% ready to lead AI-driven projects. Full survey report: pmi.org/-/media/pmi/document…