Episode 65 Shownotes
Welcome to another episode of the Datascape Podcast. In this episode, the hosts discuss updates from Databricks as a product and its open-source projects. Tune in to hear about new integrations, improved features, performance optimization capabilities, proprietary announcements, and more.
Don’t miss this jam-packed episode on all things Databricks.
Key Points From this Episode
- Introduction of today’s topic: recapping the 2022 Databricks Data + AI Summit.
- Luan introduces Delta Lake 2.0, Databricks’s upgrade to Spark tables, and some of its features, including change data feed for batch and streaming, Z-Ordering, and Python and Scala API support for OPTIMIZE with Z-Ordering.
- Luan introduces MLflow Pipelines and their ability to simplify entire workloads for machine learning.
- Luan describes Project Lightspeed, which provides offset management, checkpoint recovery, and stream checkpointing on Spark.
- The hosts explore Spark Structured Streaming.
- The hosts discuss Spark Connect, its security protocols to improve connections into Spark, and remote connectivity capabilities.
- The hosts cover proprietary announcements: Spark 3.3’s query execution improvements, copying capabilities, trigger jobs, and dbt integration.
- Luan discusses Delta Live Tables implementation.
- The hosts examine Unity Catalog.
- The hosts discuss the state of Databricks and where it sits in an organization’s data platform.
Links Mentioned in Today’s Episode
- Warner Chaves on LinkedIn
- Warner Chaves on Twitter
- Data Lake
- SQL Server