Databricks: Manage Data with Delta Lake
What is Delta Lake?
Delta Lake is an open-source project that enables building a data lakehouse on top of existing cloud storage.
Delta Lake Is Not…
• Proprietary technology
• Storage format
• Storage medium
• Database service or data warehouse
Delta Lake Is…
• Open source
• Builds upon standard data formats
• Optimized for cloud object storage
• Built for scalable metadata handling
Delta Lake brings ACID to object storage
Atomicity means each transaction either succeeds completely or fails completely.
Consistency guarantees relate to how a given state of the data is observed by simultaneous operations.
Isolation refers to how simultaneous operations conflict with one another. The isolation guarantees that Delta Lake provides differ from those of other systems.
Durability means that committed changes are permanent.
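The atomicity and isolation points above can be illustrated with a toy commit log. This is only a sketch of the general transaction-log idea, not Delta Lake's actual implementation: the `_log` directory, the CSV file names, and the helpers `write_data`, `commit`, and `visible_files` are all hypothetical names invented for this example. Data files become visible only once a log entry references them, and two writers cannot both claim the same version number.

```python
import json
import os
import tempfile


def write_data(table_dir, name, text):
    """Write a data file; it stays invisible to readers until committed."""
    with open(os.path.join(table_dir, name), "w") as f:
        f.write(text)


def commit(table_dir, version, added_files):
    """Publish a set of data files as a single log entry (toy model)."""
    log_dir = os.path.join(table_dir, "_log")  # hypothetical directory name
    os.makedirs(log_dir, exist_ok=True)
    entry = os.path.join(log_dir, f"{version:020d}.json")
    # Mode "x" raises FileExistsError if the entry already exists, so two
    # writers cannot both claim the same version. A real system would also
    # need the write of the entry itself to be atomic; this sketch skips that.
    with open(entry, "x") as f:
        json.dump({"add": added_files}, f)


def visible_files(table_dir):
    """Return only the data files referenced by committed log entries."""
    log_dir = os.path.join(table_dir, "_log")
    files = []
    if os.path.isdir(log_dir):
        for name in sorted(os.listdir(log_dir)):
            with open(os.path.join(log_dir, name)) as f:
                files.extend(json.load(f)["add"])
    return files


def demo():
    with tempfile.TemporaryDirectory() as table:
        # Job 1 writes a file and commits it as version 0.
        write_data(table, "part-0.csv", "1,a\n")
        commit(table, 0, ["part-0.csv"])

        # Job 2 writes a file but "crashes" before it can commit.
        write_data(table, "part-1.csv", "2,b\n")
        after_crash = visible_files(table)

        # A competing writer tries to reuse version 0 and is rejected.
        try:
            commit(table, 0, ["part-1.csv"])
            conflict = False
        except FileExistsError:
            conflict = True
        return after_crash, conflict
```

Calling `demo()` shows that the uncommitted `part-1.csv` never appears to readers and that the second attempt to write version 0 is refused, which is the all-or-nothing behavior the atomicity guarantee describes.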
Problems solved by ACID
• Appending data is hard
• Modifying existing data is difficult
• Jobs fail midway
• Real-time operations are hard
• Keeping historical data versions is costly
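One way to see why a transaction log makes historical versions cheaper is to replay it. The sketch below is a simplified in-memory model, assuming each commit only records which files it adds and removes; the `snapshot` helper, the `log` list, and the file names are hypothetical and do not reflect the real log schema. An older version is reconstructed by replaying fewer commits rather than by keeping a full copy of the table.

```python
def snapshot(log, version=None):
    """Replay commits 0..version and return the sorted live file names."""
    if version is None:
        version = len(log) - 1  # default to the latest version
    live = set()
    for entry in log[: version + 1]:
        live -= set(entry.get("remove", []))
        live |= set(entry.get("add", []))
    return sorted(live)


# Hypothetical commit history for a small table.
log = [
    {"add": ["part-0.parquet"]},                                # v0: initial load
    {"add": ["part-1.parquet"]},                                # v1: append
    {"add": ["part-2.parquet"], "remove": ["part-0.parquet"]},  # v2: rewrite of part-0
]

v0 = snapshot(log, 0)
v1 = snapshot(log, 1)
latest = snapshot(log)
```

Here the update in version 2 writes a new file and marks the old one as removed instead of editing it in place, so versions 0 and 1 remain readable as long as the old file is retained.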