Joined August 2015
Recently, I faced a challenge that I am sure many people have run into while building their data pipelines in @databricks. 📷 Challenge: Multiple tables were built for the same purpose in the same schema. As such, I needed to correct this by deleting all the
@databricks and wrote a simple script to do so, which does the job cleanly and quickly. Below is an example of the script I wrote to deal with the challenge effectively. Ensure you are connected to a compute resource; serverless is okay as well. Keep learning,
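The script itself is not included in the post, so here is a minimal sketch of what such a cleanup could look like. It assumes the redundant tables share a name prefix; the function names, the `prefix` rule, and the `dry_run` flag are hypothetical and not taken from the original script.

```python
def build_drop_statements(schema, table_names, prefix):
    """Return one DROP TABLE statement per table whose name starts with `prefix`."""
    return [
        f"DROP TABLE IF EXISTS `{schema}`.`{name}`"
        for name in sorted(table_names)
        if name.startswith(prefix)
    ]


def drop_matching_tables(spark, schema, prefix, dry_run=True):
    """List the tables in `schema` and drop those matching `prefix`.

    `spark` is the active SparkSession (predefined in a Databricks notebook).
    Temporary views are skipped; permanent views are not handled here (assumption).
    """
    names = [t.name for t in spark.catalog.listTables(schema) if not t.isTemporary]
    statements = build_drop_statements(schema, names, prefix)
    for statement in statements:
        if dry_run:
            print(statement)      # review the statements before running them
        else:
            spark.sql(statement)  # actually drop the table
    return statements
```

Running it once with `dry_run=True` prints the statements so you can check the list before anything is deleted.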
This is a quick and easy way to code a simple UUID in your data pipeline. How to create a simple UUID in SQL, PySpark, and Spark SQL. This can be used to ensure the easy generation of unique data identifiers. Watch full video -> youtu.be/-6i_hKnJyN0 #SQL #pyspark #databricks
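As a rough sketch of the idea (the video has the full walkthrough), a UUID can be generated in plain Python with the standard `uuid` module, or per row in PySpark through Spark SQL's built-in `uuid()` function. The helper names and the `row_id` column name below are my own assumptions.

```python
import uuid


def new_record_id():
    """Return a random (version 4) UUID as a 36-character string."""
    return str(uuid.uuid4())


def with_uuid_column(df, column_name="row_id"):
    """Add a column holding a fresh UUID for every row of a Spark DataFrame.

    Uses Spark SQL's uuid() function through expr(). The equivalent in
    SQL / Spark SQL would be: SELECT uuid() AS row_id, * FROM some_table
    """
    from pyspark.sql import functions as F  # imported lazily; needs PySpark

    return df.withColumn(column_name, F.expr("uuid()"))
```

Note that Spark's `uuid()` is non-deterministic, so the values change if the DataFrame is recomputed; write the result out if the identifiers need to stay stable.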
Using expr() in PySpark | SQL vs PySpark vs Spark SQL One of the most powerful features in PySpark is the ability to use SQL-style expressions directly inside DataFrame transformations in Databricks. Instead of chaining many column operations, expr() and selectExpr() allow you
expr() and selectExpr() allow data engineers to bridge the gap between SQL thinking and scalable Spark transformations. If you’re already comfortable with SQL, these functions can significantly speed up your transition into PySpark development. Which do you prefer in your
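A small sketch of both functions, assuming a DataFrame with `first_name`, `last_name`, and `salary` columns (these column names, the expressions, and the helper names are made up for illustration):

```python
# SQL fragments keyed by the output column name (hypothetical examples).
DERIVED = {
    "full_name": "concat(first_name, ' ', last_name)",
    "salary_band": "CASE WHEN salary >= 100000 THEN 'high' ELSE 'standard' END",
}


def to_select_exprs(columns):
    """Turn {alias: sql} into ['sql AS alias', ...] for selectExpr()."""
    return [f"{sql} AS {alias}" for alias, sql in columns.items()]


def add_salary_band(df):
    """expr(): drop one SQL expression into a DataFrame transformation."""
    from pyspark.sql import functions as F  # imported lazily; needs PySpark

    return df.withColumn("salary_band", F.expr(DERIVED["salary_band"]))


def project_with_sql(df):
    """selectExpr(): write the whole projection as SQL strings."""
    return df.selectExpr("*", *to_select_exprs(DERIVED))
```

The Spark SQL version of `project_with_sql` would be a plain `SELECT *, concat(...) AS full_name, CASE ... END AS salary_band FROM some_table`.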
Filtering Data with WHERE and FILTER | SQL vs PySpark vs Spark SQL Filtering data is one of the most fundamental operations in data engineering. Whether you're cleaning data, applying business rules, or reducing the amount of data processed downstream, filtering allows you to
One important thing to remember is that in PySpark, .where() and .filter() are functionally equivalent. The choice often comes down to readability and team preferences. The best data engineers don't just know the syntax. They understand the filtering logic that powers data
transformations regardless of the tool being used. Which do you use more often in PySpark: .filter() or .where()? #DataEngineering #SQL #PySpark #SparkSQL #ApacheSpark #Databricks #BigData #DataTransformation #DataAnalytics #ETL #ELT #DataEngineer #LearnDataEngineering
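To make the equivalence concrete, here is a minimal sketch assuming a DataFrame with an `age` column (the column, the threshold, and the function names are hypothetical). Both methods accept the same SQL-style condition string, and in PySpark `.where()` is an alias for `.filter()`.

```python
ADULT_CONDITION = "age >= 18"  # hypothetical business rule


def adults_with_filter(df):
    """Keep rows matching the condition using .filter()."""
    return df.filter(ADULT_CONDITION)


def adults_with_where(df):
    """Same result using .where()."""
    return df.where(ADULT_CONDITION)


# The SQL / Spark SQL counterpart would be:
#   SELECT * FROM people WHERE age >= 18
```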