Applications like foo.com generate real-time data (e.g., clickstream events) and batch data (e.g., logs). This data needs to be ingested, processed, and made available to downstream systems in a data warehouse, where data analysts, data scientists, and ML engineers can analyze it further to gain insights and make predictions. You can ingest batch data from Cloud Storage or BigQuery, and ingest real-time data from the application with Pub/Sub, which scales to millions of events per second. Dataflow, based on the open source Apache Beam SDK, can then process and enrich both the batch and streaming data; a sketch of such a pipeline follows below. If you are in the Hadoop ecosystem, you can use Dataproc instead; it is a managed Hadoop and Spark platform that lets you focus on analysis rather than on standing up and managing a Hadoop cluster.
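As a minimal sketch of this ingestion path, the Beam pipeline below reads events from a Pub/Sub topic and streams them into a BigQuery table. The project, topic, table, and field names are hypothetical placeholders, not part of the original architecture:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

# Run in streaming mode so the pipeline consumes Pub/Sub continuously.
options = PipelineOptions()
options.view_as(StandardOptions).streaming = True

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        # Hypothetical topic receiving clickstream events from the app.
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/clickstream")
        # Decode each message from JSON into a dict matching the table schema.
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        # Append the enriched rows to a hypothetical BigQuery table.
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            table="my-project:analytics.clickstream_events",
            schema="user_id:STRING,page:STRING,event_time:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED)
    )
```

The same pipeline can run locally for testing or on Dataflow by setting the runner option; Beam's model is the same for batch and streaming sources.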
To store the processed data, you need a data warehouse. BigQuery is a serverless data warehouse that supports SQL queries and scales to petabytes of storage. Together with Cloud Storage, it can also serve as long-term storage and a data lake. You can visualize data from BigQuery in dashboards built with Looker and Data Studio, and with BigQuery ML you can create ML models and make predictions using standard SQL queries.
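To illustrate the BigQuery ML step, here is a minimal sketch that trains a logistic regression model and runs predictions entirely in SQL via the Python client. The dataset, table, model, and column names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Train a logistic regression model with standard SQL (BigQuery ML).
# `analytics.sessions` and the `purchased` label column are assumed examples.
client.query("""
    CREATE OR REPLACE MODEL `analytics.purchase_model`
    OPTIONS (model_type = 'logistic_reg', input_label_cols = ['purchased']) AS
    SELECT page_views, session_seconds, purchased
    FROM `analytics.sessions`
""").result()

# Make predictions with ML.PREDICT; BigQuery ML exposes the prediction
# as predicted_<label_column>, here predicted_purchased.
rows = client.query("""
    SELECT predicted_purchased, page_views
    FROM ML.PREDICT(
        MODEL `analytics.purchase_model`,
        (SELECT page_views, session_seconds FROM `analytics.new_sessions`))
""").result()

for row in rows:
    print(row.predicted_purchased, row.page_views)
```

Because both training and prediction are plain SQL statements, analysts can build and use models without moving data out of the warehouse.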