
Data Engineering on Google Cloud Course


Google Professional Data Engineer Certification

The Data Engineering on Google Cloud course gives participants the skills they need to design and build data processing systems, analyze data and implement machine learning on Google Cloud. The course covers structured, unstructured and streaming data. It requires basic knowledge of SQL, data modeling, ETL tasks and a programming language such as Python. It is aimed at developers who work on data processing, analytics and machine learning. Topics include the role of a data engineer, BigQuery, data lakes, data warehouses and collaboration with other data teams. The course also helps participants prepare for the Google Professional Data Engineer certification exam.

Contact us now for full details, or to ask, with no obligation, to speak directly with one of our instructors,
or call our toll-free number now (800-177596).

Calling from abroad? Reach us at +39 02 87168254.

Course Objectives

Below is a summary of the main objectives of the Data Engineering on Google Cloud course:

  • Learn to use Google Cloud to design and build data engineering solutions.
  • Explore advanced BigQuery features for analyzing large datasets.
  • Apply data modeling techniques and ETL processes within the Google Cloud ecosystem.
  • Develop machine learning skills with Google Cloud tools.
  • Integrate data lake and data warehouse services to store and analyze data on Google Cloud.

Course Certification

Google Cloud Certified Professional Data Engineer exam. The exam measures the ability to design, build and manage secure, scalable data processing solutions on Google Cloud Platform. It tests specialist knowledge of services such as BigQuery for analyzing large datasets, Cloud Dataflow for building data pipelines and Cloud Dataproc for running Hadoop/Spark workloads. The exam also covers managing machine learning models and choosing the most suitable data storage and management strategies. Candidates must demonstrate effective use of Python and SQL to manipulate and analyze data within the GCP ecosystem.

Course Content

Data Engineering on Google Cloud Course Program

Introduction to Data Engineering on Google Cloud

  • Role and responsibilities of a Data Engineer
  • Data sources and data sinks
  • Structured, unstructured and streaming data
  • Common data formats: Avro, Parquet and JSON
  • Storage solution options on Google Cloud
  • Metadata management options on Google Cloud
  • Dataset sharing with Analytics Hub
  • Loading data into BigQuery using Google Cloud Console and gcloud CLI

Data Replication and Migration

  • Data replication and migration architecture on Google Cloud
  • Use cases for the gcloud command-line tool
  • Dataset movement strategies
  • Storage Transfer Service
  • Transfer Appliance
  • Datastream features and deployment scenarios
  • Data migration patterns for analytics environments

Extract and Load Data Pipeline Pattern

  • Extract and load architecture
  • Use of the bq command-line tool
  • BigQuery Data Transfer Service
  • Data ingestion into BigQuery
  • BigLake as a non-extract-load pattern
  • Integration between storage layers and analytical processing
  • Practical data loading scenarios on Google Cloud

Extract, Load and Transform Data Pipeline Pattern

  • ELT architecture on Google Cloud
  • Common ELT pipeline design patterns
  • SQL scripting with BigQuery
  • Scheduling capabilities in BigQuery
  • Workflow creation with Dataform
  • SQL-based transformation workflows
  • Data transformation directly inside the analytics platform

Extract, Transform and Load Data Pipeline Pattern

  • ETL architecture on Google Cloud
  • GUI tools for ETL data pipelines
  • Batch data processing with Dataproc
  • Dataproc Serverless for Apache Spark
  • Streaming data processing options
  • Role of Bigtable in data pipelines
  • ETL pipeline design for scalable data processing

Pipeline Automation Techniques

  • Automation patterns for data pipelines
  • Pipeline scheduling and orchestration
  • Cloud Scheduler
  • Workflows
  • Cloud Composer
  • Cloud Run Functions
  • Eventarc
  • Event-driven automation use cases for data processing
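
Orchestrators such as Cloud Composer (a managed Apache Airflow service) run a pipeline as a graph of tasks in which each task waits for its dependencies. The following is a minimal plain-Python sketch of that idea, not Composer or Airflow code: the task names are hypothetical, and the standard-library graphlib module (Python 3.9+) stands in for the orchestrator's scheduler.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_orders": set(),
    "extract_customers": set(),
    "load_staging": {"extract_orders", "extract_customers"},
    "transform_reporting": {"load_staging"},
    "refresh_dashboard": {"transform_reporting"},
}


def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return an order in which every task runs after all of its dependencies.

    Raises CycleError if the dependencies form a cycle, which an
    orchestrator would also reject as an invalid pipeline definition.
    """
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for task in execution_order(PIPELINE):
        print("running", task)
```

The two extract tasks have no mutual dependency, so their relative order is not fixed; a real orchestrator could run them in parallel.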

Modern Data Engineering on Google Cloud

  • Traditional data lakes and data warehouses
  • Modern data lakehouse architecture
  • Choosing the right data architecture
  • Comparison between data lake, data warehouse and lakehouse
  • Benefits of the lakehouse approach
  • Data architecture decision criteria for modern analytics platforms

Building a Data Lakehouse with Cloud Storage, Open Formats and BigQuery

  • Data lake foundation with Cloud Storage
  • Open table formats and Apache Iceberg
  • BigQuery as central processing engine
  • Operational data management with AlloyDB
  • Federated queries between operational and analytical data
  • Integration of Cloud Storage, BigQuery and AlloyDB
  • Real-world lakehouse implementation scenarios

Modernizing Data Warehouses with BigQuery and BigLake

  • BigQuery fundamentals
  • Scalable cloud data warehousing on Google Cloud
  • Partitioning and clustering in BigQuery
  • BigLake and external tables
  • Unified lakehouse architecture with BigLake and BigQuery
  • Querying external data
  • Native interaction with Apache Iceberg tables through BigLake

Advanced Lakehouse Patterns and Data Governance

  • Data governance in a unified data platform
  • Data security and sensitive data protection
  • Metadata management
  • Data Loss Prevention
  • Analytics on lakehouse data
  • Machine Learning on lakehouse data
  • Lakehouse migration strategies
  • Real-world lakehouse architecture patterns

Labs and Best Practices for Google Cloud Data Platform

  • Review of Google Cloud data platform core principles
  • Best practices for data engineering on Google Cloud
  • BigQuery ML
  • Vector Search with BigQuery
  • Analytics and Machine Learning integration
  • Practical reinforcement of data platform concepts

When to Choose Batch Data Pipelines

  • Batch data pipeline use cases
  • Role of the Data Engineer in batch pipeline development
  • Batch pipeline lifecycle from ingestion to downstream consumption
  • Data volume, data quality and processing complexity
  • Reliability challenges in batch processing
  • Google Cloud services for batch data pipelines
  • Batch processing architecture patterns

Design and Build Scalable Batch Data Pipelines

  • Batch pipeline design principles
  • High-volume data ingestion
  • Large-scale data transformations
  • Dataflow for batch processing
  • Serverless for Apache Spark
  • Data connections and orchestration
  • Apache Spark pipeline execution
  • Batch pipeline performance optimization
  • Throughput and cost-efficiency tuning

Control Data Quality in Batch Data Pipelines

  • Batch data validation
  • Data cleansing logic
  • Error logging and analysis
  • Schema evolution in batch pipelines
  • Data integrity management
  • Duplicate data handling
  • Deduplication with Serverless for Apache Spark
  • Deduplication with Dataflow
  • Data quality rules for large datasets
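
The validation, error logging and deduplication steps listed above can be illustrated with a small plain-Python sketch. It is not Dataflow or Spark code: the record fields (order_id, amount, updated_at) and the two rules are hypothetical, and keeping the most recent version of each key is only one of several possible deduplication policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: str
    amount: float
    updated_at: int  # hypothetical: epoch seconds of the last update


def validate(order: Order) -> list[str]:
    """Return the rule violations for one record (an empty list means valid)."""
    errors = []
    if not order.order_id:
        errors.append("missing order_id")
    if order.amount < 0:
        errors.append("negative amount")
    return errors


def clean_batch(records):
    """Split a batch into deduplicated valid records and rejected records.

    Invalid records are kept with their error reasons so they can be logged
    and analyzed. Valid records are deduplicated by order_id, keeping the
    version with the latest updated_at.
    """
    latest = {}
    rejected = []
    for rec in records:
        errors = validate(rec)
        if errors:
            rejected.append((rec, errors))
            continue
        current = latest.get(rec.order_id)
        if current is None or rec.updated_at > current.updated_at:
            latest[rec.order_id] = rec
    return list(latest.values()), rejected
```

In a distributed pipeline the same logic is usually expressed as a group-by-key followed by a per-key reduction, so that all versions of a record meet on the same worker.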

Orchestrate and Monitor Batch Data Pipelines

  • Batch pipeline orchestration
  • Workflow scheduling
  • Cloud Composer
  • Pipeline lineage tracking
  • Unified observability
  • Alerts and troubleshooting
  • Error handling strategies
  • Visual pipeline management with Cloud Data Fusion
  • Monitoring and operational control of batch pipelines
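
A common error-handling strategy for transient failures, such as a briefly unavailable service, is to retry with exponential backoff. The sketch below shows the technique in plain Python. The function name and its parameters are hypothetical and are not part of any Google Cloud or Airflow API; orchestrators typically provide their own retry settings.

```python
import time


def run_with_retries(task, max_attempts=4, base_delay=1.0,
                     retryable=(Exception,), sleep=time.sleep):
    """Call task() until it succeeds, waiting base_delay * 2**n between attempts.

    Only exceptions listed in `retryable` trigger a retry; the last one is
    re-raised once max_attempts is exhausted. `sleep` is injectable so the
    backoff schedule can be tested without actually waiting.
    """
    for attempt in range(max_attempts):
        try:
            return task()
        except retryable:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With the defaults, a task that fails twice and then succeeds waits 1 second and then 2 seconds before its third attempt. In practice `retryable` should be narrowed to errors known to be transient, so that bad input fails fast instead of being retried.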

Streaming Data Pipelines on Google Cloud

  • Streaming data pipeline concepts
  • Challenges of streaming data processing
  • Role of streaming pipelines in data engineering
  • Real-time data ingestion
  • Streaming data processing scenarios
  • Hands-on learning scenario for streaming pipeline design
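
A core idea in streaming processing is grouping an unbounded stream into windows by event time. The following plain-Python sketch shows fixed (tumbling) windows. It is an illustration of the concept, not Dataflow or Pub/Sub code, and it ignores late data and watermarks, which a real streaming engine must handle. The event shape, (event_time_seconds, key), is an assumption.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds=60):
    """Count events per key in fixed, non-overlapping event-time windows.

    events: iterable of (event_time_seconds, key) pairs. Arrival order does
    not matter, because each event is assigned to a window by its own
    timestamp. Returns {(window_start, key): count}.
    """
    counts = Counter()
    for ts, key in events:
        window_start = ts - ts % window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)
```

For example, with 60-second windows, events at seconds 5 and 59 fall into the window starting at 0, while an event at second 61 falls into the window starting at 60.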

Streaming Use Cases and Reference Architectures

  • Introduction to streaming data pipelines on Google Cloud
  • Streaming ETL
  • Streaming AI/ML
  • Streaming applications
  • Reverse ETL
  • Streaming reference architectures
  • Use cases for real-time analytics and operational applications
  • Architecture patterns for event-driven data processing

Product Deep Dives for Streaming Pipelines

  • Messaging concepts for streaming architectures
  • Pub/Sub
  • Managed Service for Apache Kafka
  • Architectural considerations for Pub/Sub and Apache Kafka
  • Dataflow as streaming processing engine
  • Building and deploying streaming pipelines
  • BigQuery as analytical engine
  • BigQuery continuous queries
  • BigQuery ETL and Reverse ETL
  • Pub/Sub to BigQuery streaming configuration
  • Bigtable for operational data
  • Data movement from Dataflow to Bigtable
  • Trend analysis with BigQuery on Bigtable data
  • Synchronization of analytics results into user-facing applications

Key Takeaways and Next Steps

  • Review of the main Data Engineering concepts covered
  • Consolidation of Google Cloud data platform services
  • Review of batch and streaming pipeline patterns
  • Review of data lakehouse and data warehouse modernization concepts
  • Review of orchestration, monitoring and governance topics
  • Next steps for applying Data Engineering skills on Google Cloud

Course Type

Instructor-led training course

Instructors

The instructors are accredited Google Cloud trainers who are also certified in other IT technologies, with years of hands-on industry and training experience.

Lab Infrastructure

For all delivery formats, participants can use the equipment and systems in our labs, or connect remotely to the data centers of the vendor or its authorized providers. Each participant has individual access for implementing the various configurations, which gives immediate, hands-on feedback on the theory covered.


Course Details

Prerequisites

We recommend first attending the Google Cloud Big Data and Machine Learning Fundamentals course.

Course Duration

  • Intensive format: 4 days.

Attendance

Both extended and intensive attendance formats are available.

Course Dates

  • Data Engineering on Google Cloud Course (intensive format) – On request – 09:00 – 17:00

Enrollment

Places are limited to guarantee an excellent service for all participants.
To enroll, request to be contacted through the link on our website, call our toll-free number 800-177596, or send a request by email to [email protected].