Alex Fragotsis

131 Followers


Published in

Inside League

·Dec 26, 2021

Streaming Data to BigQuery with Dataflow and Updating the Schema in Real-Time

In our previous story, we saw how to stream data to BigQuery and add new columns when needed. That solution, though, is not really real-time, and I think we can do better. Another approach I’ve seen discussed online, but for which I haven’t found any code samples, is this: we enable streaming…

Dataflow

3 min read

Jun 4, 2021

Authenticated calls to cloud functions with Python

Over the past few weeks I developed and deployed a cloud function that should only be callable by authorized users and service accounts, and the truth is that the documentation I found wasn’t really helpful. First I created a service account and gave it the roles/cloudfunctions.invoker role. If you’re dealing with an extracted service account, the code is pretty simple:

from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession

Cloud Functions

3 min read
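
The teaser above stops at the two imports, so here is a minimal sketch of how they are commonly combined, not necessarily the article's own code. It assumes the google-auth and requests packages are installed and that the function accepts an ID token whose audience is its own URL. The names call_function, key_path and function_url are hypothetical.

```python
def call_function(key_path, function_url):
    """Call an authenticated HTTP cloud function using a service account key file.

    key_path and function_url are hypothetical parameters for this sketch.
    """
    # Imported inside the function so the sketch can be loaded even where
    # google-auth is not installed; move these to module level in real code.
    from google.oauth2 import service_account
    from google.auth.transport.requests import AuthorizedSession

    # Build ID-token credentials; the audience is assumed to be the function URL.
    credentials = service_account.IDTokenCredentials.from_service_account_file(
        key_path, target_audience=function_url
    )

    # AuthorizedSession attaches (and refreshes) the token on each request.
    session = AuthorizedSession(credentials)
    response = session.get(function_url)
    response.raise_for_status()
    return response.text
```

The service account behind the key file still needs the roles/cloudfunctions.invoker role on the function, as described above.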

Published in

Inside League

·Jan 17, 2021

Loading complex JSON files in RealTime to BigQuery from PubSub using Dataflow and updating the schema

In my previous post, I explained how to stream data from Salesforce to PubSub in real-time. The next logical step would be to store the data somewhere, right? One option could be, for example, to batch the data and write it to files in GCS. That’s a good start and…

Pub Sub

6 min read

Published in

Inside League

·Dec 12, 2020

Real-Time Streaming Salesforce Updates to PubSub

TL;DR: The working implementation is under Third Attempt.

Intro

A couple of months ago, when we started building our data lake, one of the requirements was to try to get real-time data in. …

Engineering

11 min read

Published in

The Startup

·Aug 29, 2020

Pyspark: How to Modify a Nested Struct Field

In our adventures trying to build a data lake, we are using a dynamically generated Spark cluster to ingest data from MongoDB, our production database, into BigQuery. …

Pyspark

3 min read

Jul 4, 2020

Schedule Dataflow Templates with Airflow

OK, so we’ve written our Dataflow template with Python. Now what? We want to schedule it to run daily, and we’re going to use Airflow for that. The first thing we want, for security reasons, is to keep service accounts separate. In the previous post, we created a service account…

Airflow

3 min read

Published in

Analytics Vidhya

·May 31, 2020

Transform JSON to CSV from Google bucket using a Dataflow Python pipeline

In this article, we will transform a JSON file into a CSV file using Dataflow and Python. First, we’ll need a service account with the “Dataflow Worker” role; don’t forget to export its key as a JSON file at the end so we can use it later.

Dataflow

4 min read
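
To make the JSON-to-CSV idea above concrete, here is a minimal sketch of the conversion step alone, written with the Python standard library. It is not the article's pipeline: the Dataflow (Apache Beam) wiring and the bucket I/O are left out, the function name json_records_to_csv is hypothetical, and the input is assumed to be a JSON array of flat objects.

```python
import csv
import io
import json


def json_records_to_csv(json_text, fieldnames=None):
    """Convert a JSON array of flat objects (or one object) into CSV text."""
    records = json.loads(json_text)
    if isinstance(records, dict):
        # Treat a single object as a one-row file.
        records = [records]

    if fieldnames is None:
        # Collect column names in order of first appearance across all records.
        fieldnames = []
        for record in records:
            for key in record:
                if key not in fieldnames:
                    fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Missing keys become empty cells; keys outside fieldnames are dropped.
        writer.writerow({name: record.get(name, "") for name in fieldnames})
    return buffer.getvalue()


if __name__ == "__main__":
    sample = '[{"id": 1, "name": "a"}, {"id": 2, "city": "x"}]'
    print(json_records_to_csv(sample))
```

In a pipeline, a function like this would typically be applied per file or per batch of records, with reading from and writing to the bucket handled by the pipeline's own I/O steps.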


Data Engineer @ League Inc.
