Joined August 2019
roi teveth retweeted
18 Mar 2020
(1/2) As promised, @RTeveth and I just published the second #blog post in our series about @ApacheSpark Dynamic Partition Inserts, based on our production experience at @nielsen! In part 2, we deep dive into how Dynamic Partition Inserts works, the different S3 connectors used...
roi teveth retweeted
(1/2) @RTeveth and I just published a post (the first part in a series) about @ApacheSpark Dynamic Partition Inserts, and we think you'll find it interesting: medium.com/nmc-techblog/spar… Our #bigdata group at @nielsen uses #ApacheSpark to process 10s of TBs of raw data...
roi teveth retweeted
19 Feb 2020
(3/3) Check out the issue (issues.apache.org/jira/brows…) and the matching pull request (github.com/apache/airflow/pu…), and keep your fingers crossed ;) #bigdata #ApacheSpark #dataengineering #GCP #Spark #K8s #Nielsen

roi teveth retweeted
19 Feb 2020
(2/3) Luckily, my colleague @RTeveth (@nielsen) is working on contributing his work to integrate the aforementioned operator into #ApacheAirflow, and with the generous help of some of its committers (e.g. @CzerwonyElmo and @kaxil), we’re hoping to get it merged before #Airflow 2.0!
roi teveth retweeted
19 Feb 2020
(1/3) If you want to run @ApacheSpark on @kubernetesio, you have a few alternatives, e.g. Spark-on-K8s-operator by @GCPcloud (github.com/GoogleCloudPlatfo…). But what if you want to schedule your jobs using @ApacheAirflow? That narrows down your options.