Streaming Data to BigQuery with Dataflow and Updating the Schema in Real-Time (published in Inside League, Dec 26, 2021): In our previous story, we saw how to stream data to BigQuery and also add new columns when needed. This solution though is not really…
Authenticated calls to cloud functions with Python (Jun 4, 2021): The past few weeks I developed and deployed a cloud function that is supposed to get called only by authorized users/service accounts and…
Loading complex JSON files in RealTime to BigQuery from PubSub using Dataflow and updating the… (published in Inside League, Jan 17, 2021): In my previous post, I explained how to stream data from Salesforce to PubSub in real-time. The next logical step would be to store the…
Real-Time Streaming Salesforce Updates to PubSub (published in Inside League, Dec 12, 2020)
Pyspark: How to Modify a Nested Struct Field (published in The Startup, Aug 29, 2020): In our adventures trying to build a data lake, we are using a dynamically generated Spark cluster to ingest some data from MongoDB, our…
Schedule Dataflow Templates with Airflow (Jul 4, 2020): Ok, so, we’ve written our Dataflow Template with Python, now what? We want to schedule it to run daily and we’re going to use Airflow for…
Transform JSON to CSV from Google bucket using a Dataflow Python pipeline (published in Analytics Vidhya, May 31, 2020): In this article, we will try to transform a JSON file into a CSV file using Dataflow and Python.