ChuquEmeka/README.md

Hi, I'm Edeh Emeka N.

Data and Platform Engineer focused on building reliable and scalable data platforms on AWS and GCP. I care a lot about automation and infrastructure as code.

I enjoy designing systems end to end: real-time streaming, batch analytics, data transformation, and orchestration. Before moving into data engineering, I spent six years in real estate, which helps me bring real business understanding to technical problems.


Enterprise Data Platform

Right now I design and run a complete production-grade data platform on AWS. All the main pieces live together in my dedicated GitHub organization:

enterprise-data-platform-emeka

Main Repositories

These projects work together as one system:

View the full organization

Architecture

flowchart TD
    subgraph Source ["Source Layer"]
        direction TB
        Postgres[PostgreSQL RDS\nWAL Log] --> DMS[AWS DMS CDC]
        DMS --> S3Raw[S3 Bronze\nRaw CDC Parquet]
    end

    subgraph Processing ["Processing Layer"]
        direction TB
        Glue[Glue PySpark\nBronze to Silver] --> Silver[S3 Silver\nCleaned Parquet]
        Silver --> DBT[dbt + Athena\nSilver to Gold]
        DBT --> Gold[S3 Gold\nAggregated Parquet]
    end

    subgraph Serving ["Serving Layer"]
        direction TB
        Redshift[Redshift Serverless] --> BI[BI Dashboards]
    end

    subgraph Analytics ["Natural Language Analytics Agent"]
        direction TB
        NLQ[User NL Question] --> Agent[Analytics Agent\nECS Fargate + Claude API]
        Agent --> SchemaRes[Schema Resolver\nGlue Catalog + dbt artifacts]
        SchemaRes --> SQLGen[SQL Generator\nPartition-aware Athena SQL]
        SQLGen --> Guardrails[Guardrails\nSELECT-only, cost check]
        Guardrails --> Exec[Athena Execution]
        Exec --> Validate[Result Validator\nSanity checks]
        Validate --> Output[Chart + Insight + SQL\nAssumptions flagged]
    end

    S3Raw --> Glue
    MWAA[MWAA Airflow\nOrchestration] -->|triggers| Glue
    MWAA -->|triggers dbt| DBT
    Glue -.->|invalid records| Quarantine[S3 Quarantine]
    Gold --> Redshift
    Gold --> SchemaRes
    Exec -->|queries| Gold
    DBT -.->|uploads dbt artifacts| SchemaRes

    classDef layer fill:#f0f4f8,stroke:#333,stroke-width:2px;
    class Source,Processing,Serving,Analytics layer;
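
As an illustration of the Guardrails step in the diagram (SELECT-only, cost check), here is a minimal Python sketch. It is not the platform's actual code: the function name `check_query`, the `MAX_SCAN_BYTES` limit, and the keyword list are all assumptions made for this example, and how the scan estimate is obtained is left outside the sketch.

```python
import re

# Hypothetical limit; the platform's real threshold is not stated in this README.
MAX_SCAN_BYTES = 10 * 1024**3  # 10 GiB

# Keywords that indicate a write or DDL statement (illustrative, not exhaustive).
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|merge|drop|alter|create|truncate|grant|revoke)\b",
    re.IGNORECASE,
)


def check_query(sql: str, estimated_scan_bytes: int) -> list[str]:
    """Return guardrail violations; an empty list means the query may run."""
    problems = []
    # Ignore surrounding whitespace and a single trailing semicolon.
    statement = sql.strip().rstrip(";").strip()

    # Any remaining semicolon suggests more than one statement.
    if ";" in statement:
        problems.append("multiple statements are not allowed")
    # The statement must start with SELECT or a WITH clause (CTE).
    if not re.match(r"(?is)^(select|with)\b", statement):
        problems.append("only SELECT queries are allowed")
    # Conservative keyword scan: this also rejects these words inside string literals.
    if FORBIDDEN.search(statement):
        problems.append("query contains a write or DDL keyword")
    # Cost check against the caller-supplied scan estimate.
    if estimated_scan_bytes > MAX_SCAN_BYTES:
        problems.append("estimated scan exceeds the cost limit")
    return problems


if __name__ == "__main__":
    print(check_query("SELECT * FROM gold.sales WHERE dt = '2024-01-01'", 1024))
    print(check_query("DROP TABLE gold.sales", 0))
```

The keyword scan is deliberately strict. It uses word boundaries, so column names such as `updated_at` or `created_at` pass, but a literal like `'delete'` would be rejected. A production guardrail would more likely parse the SQL than pattern-match it.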


Featured Public Projects

I also have several public repositories that show my work across different tools and domains:

  • Databricks Asset Bundles + Real Estate Pipeline: End-to-end ELT on GCP with Delta Live Tables and medallion architecture
  • Real Estate Valuation Pipeline: Built with dbt Fusion, Snowflake, and AWS S3
  • Airflow + dbt + BigQuery Healthcare Pipeline: Full orchestration and transformation on Google Cloud
  • AWS Terraform Data Platform: Infrastructure as code for S3 data lake, Glue, Athena, and CI/CD
  • Fraud Detection and Sales Analytics Pipelines: Using dbt, Snowflake, and Tableau

These projects complement my main enterprise platform and show how I apply modern data engineering in practice.


Skills and Tools

  • Pipelines and Processing: dbt, Apache Kafka, Databricks, Glue, Spark
  • Cloud and Infrastructure: AWS (S3, Glue, Athena, Redshift, IAM), Terraform, GCP
  • Orchestration: Apache Airflow (MWAA), GitHub Actions
  • Languages: Python, SQL
  • Visualization: Power BI, Tableau, Looker, QuickSight

Expertise

  • Building layered data platforms (raw, curated, and analytics layers)
  • Streaming and batch ELT workflows
  • Infrastructure as code with proper CI/CD
  • Automation and reliability at scale
  • Using domain knowledge to solve real business problems

Visit my YouTube channel to see project demos: @Data_Pipeline_Lab

YouTube LinkedIn

Pinned

  1. Databricks_Asset_Bundles_Real_Estate_Data_Pipeline_Youtube Public

    Real Estate ELT pipeline using Databricks Asset Bundles on GCP. Ingests, transforms, and analyzes property data via Delta Live Tables. Follows medallion architecture (Bronze/Silver/Gold), modular P…

    Python 2 1

  2. real_estate_valuation_dbt_fusion_snowflake_aws_pipeline Public

    This project implements an end-to-end real estate valuation data pipeline leveraging Snowflake as the data warehouse, AWS S3 as the data lake storage, and dbt Fusion (version 2.0.0-beta.13) for tra…

    Python 1

  3. DBT-Fraud-Detection-Data-Pipeline Public

    This Fraud Detection Data Pipeline project processes transaction data from AWS S3 to Snowflake, transforming it with dbt and automating deployment with GitHub Actions. It includes a Power BI dashbo…

    Python 3 2

  4. Airflow-dbt-bigquery-gcs-healthcare-data-pipeline Public

    This project demonstrates an end-to-end healthcare data pipeline using Apache Airflow for orchestration, dbt for transformations, and Google BigQuery/GCS for data storage and querying. It automates…

    Python 6

  5. End-to-End-Data-Pipeline-Snowflake-dbt-Tableau Public

    End-to-End Data Pipeline for Sales Analysis: This project showcases a data pipeline using Snowflake, dbt, and Tableau to transform raw sales data into structured insights. It employs incremental da…

    6 1

  6. aws-data-infra-terraform Public

    Terraform-managed AWS data platform with S3 data lake, Glue ETL, Athena, and environment-aware CI/CD

    HCL