Published in Inside League
Streaming Data to BigQuery with Dataflow and Updating the Schema in Real-Time
In our previous story, we saw how to stream data to BigQuery and also add new columns when needed. This solution though is not really…
Dec 26, 2021

Authenticated calls to cloud functions with Python
The past few weeks I developed and deployed a cloud function that is supposed to get called only by authorized users/service accounts and…
Jun 4, 2021

Published in Inside League
Loading complex JSON files in RealTime to BigQuery from PubSub using Dataflow and updating the…
In my previous post, I explained how to stream data from Salesforce to PubSub in real-time. The next logical step would be to store the…
Jan 17, 2021

Published in The Startup
Pyspark: How to Modify a Nested Struct Field
In our adventures trying to build a data lake, we are using a dynamically generated Spark cluster to ingest some data from MongoDB, our…
Aug 29, 2020

Schedule Dataflow Templates with Airflow
Ok, so, we’ve written our Dataflow Template with Python, now what? We want to schedule it to run daily and we’re going to use Airflow for…
Jul 4, 2020

Published in Analytics Vidhya
Transform JSON to CSV from Google bucket using a Dataflow Python pipeline
In this article, we will try to transform a JSON file into a CSV file using Dataflow and Python.
May 31, 2020