Introduction

Open sourced in April 2019, Delta Lake is a Databricks project that brings reliability, performance, and lifecycle management to data lakes. In October 2019, the Linux Foundation began hosting the Delta Lake project, with the aim of making it the open standard for data lakes. This article has two parts. In the first, we explore the challenges data engineers face when building data pipelines. In the second, we introduce the Delta Lake project, its key concepts and features, and how it offers an elegant way to build reliable, performant data pipelines.

A data engineer’s sad story

Bob is an enthusiast…


Rise of the streaming use cases

In 2019, a study by Lightbend and The New Stack [1] revealed that the use of stream processing for AI/ML applications had increased fourfold in two years, and the study expects it to keep increasing in the coming years. This rise in streaming use cases reflects the tremendous growth of real-time data over the past years. Think about all those connected devices and people generating unbounded data 24/7. This huge amount of real-time data needs to be analyzed as soon as possible because its value diminishes over time. Organizations need to…

DataBeans

Simplify your data pipelines through simple reusable components [databeans.fr]
