I'm Ashok

Data Engineer with 7+ years of experience

Current Role

Best Buy India Apr 2025 - Present

Software Engineer - II (Data)

Primarily focused on analyzing SAP reports and tracing data lineage from the reporting layer back through the medallion architecture (gold, silver, and bronze layers) to the source RDBMS. The existing architecture is built on Teradata, with ETL handled by Informatica. This analysis is used to reconstruct the data flow in Google Cloud using BigQuery and to build new reports in Looker Studio Pro.

Major Contributions:

  • Conducted in-depth lineage analysis of SAP reports across a Teradata-based medallion architecture
  • Mapped data flow from gold layer to source systems to support migration to Google Cloud
  • Rebuilt data pipelines and models in BigQuery based on legacy architecture insights
  • Designed and developed reports in Looker Studio Pro for enhanced visualization and accessibility
SQL, GCP, Teradata, Looker Studio Pro, Medallion Architecture

Previous Experience

Carelon Global Solutions Dec 2022 - Mar 2025

Senior Software Engineer - Data Engineering

Focused on developing scalable data models and pipelines using Python and SQL across AWS and GCP environments to process high-volume healthcare claims data, while ensuring compliance with privacy and security standards.

  • Built ETL pipelines using Python and SQL on GCP (Cloud Functions & GKE) to monitor and process $250M in finalized claims daily (MBM Commercial model), orchestrated with Apache Airflow
  • Developed and maintained cloud-based workflows using AWS Lambda, Glue, EMR, Step Functions, and GCP services
  • Designed data models and contributed to architecture reviews with a focus on PHI/PII risk mitigation
  • Presented
  • Specialized in query optimization, code reviews, and lifecycle management to enhance data quality and security
AWS, GCP, SQL, Airflow, Snowflake

Legato Health Technologies Jan 2021 - Nov 2022

Software Engineer - Data Engineering

Contributed to enterprise-scale data pipeline development and deployment across cloud and big data platforms, with a focus on automation, data quality, and team mentorship.

  • Developed an HBase audit tool to detect missing loads between Hadoop and Snowflake for 1000+ tables
  • Built an AWS Glue job for incremental and bulk data transfers between S3 buckets, used daily for 1000+ tables
  • Led complex production deployments with strong quality control using Bitbucket and deployment trackers
  • Designed and optimized complex Snowflake SQL scripts for layered data warehousing
  • Mentored new associates and conducted training on Big Data, Hadoop, Bitbucket, and AEDL architecture
Python, SQL, Snowflake, AWS, Hadoop

Accenture Inc. Jun 2019 - Dec 2020

Software Engineer - Data Engineering

Led performance optimization and framework enhancement efforts for big data ingestion tools, supporting cross-team collaboration and large-scale ETL operations in a Hadoop ecosystem.

  • Improved a Hive partition deletion tool, reducing runtime from 30 minutes to under 1 minute per table; awarded "Best of the Month" at Accenture BDF
  • Built a PySpark-based automation tool for converting StreamSets ETL pipelines to Spark jobs for 1800+ Hive tables
  • Enhanced ingestion frameworks using Shell Script, PySpark, Sqoop, Hive, and HBase to support historical and incremental data loads
  • Served as the primary support contact for 8+ teams using the "Streamsets to Accelerator Converter" framework, ensuring reliable maintenance and issue resolution
Spark, Hadoop, Shell Script, Python, Hive, HBase, UNIX

Accenture Inc. Jun 2018 - May 2019

Associate Software Engineer - Data Engineering

Worked on large-scale data ingestion and automation projects using Hadoop ecosystem tools, with a focus on streamlining data workflows and ensuring data quality.

  • Automated deletion of invalid partitions across 1800+ tables using shell scripting
  • Built 100+ StreamSets pipelines for efficient RDBMS-to-Hadoop data integration
  • Developed automation frameworks using test-driven development in an Agile environment
  • Gained deep expertise in Apache HBase and Hive data transformations
  • Created an auditing framework for 1600+ pipelines, reducing manual validation time by two days
StreamSets, ETL, SQL, Hadoop, Hive, HBase, Shell Scripting, UNIX

Accenture Inc. Sep 2017 - Jun 2018

Associate Software Engineer - Data Engineering

Completed comprehensive training in data engineering and business intelligence, focusing on data integration, processing, and visualization tools.

  • Trained in Hadoop ecosystem for big data processing
  • Gained proficiency in SQL and ETL ingestion techniques
  • Learned project management using Agile and Waterfall methodologies
  • Developed workflows using Informatica
  • Built interactive dashboards and reports in Power BI
SQL, ETL, Informatica, Power BI, Hadoop

Skills

Programming Languages

Python, SQL, SparkSQL, Shell Script

Cloud Platforms

AWS, GCP, Snowflake

Big Data Technologies

Hadoop, Spark, Hive, HBase, Kafka, Airflow, NiFi

BI & Visualization

Power BI, Tableau, Looker Studio Pro

DevOps & Tools

Terraform, Git, GitHub, Bitbucket, Bamboo, CI/CD, Jira, Confluence, Control-M

Database & Operating Systems

Oracle, MySQL, Teradata, Mainframe, Informatica, Linux/Unix

Core Competencies

Data Engineering & Architecture

  • Data Modeling & Warehousing
  • ETL/ELT Pipeline Development
  • Data Quality & Governance
  • Security & Compliance

Cloud & Pipeline Engineering

  • Distributed Systems
  • Pipeline Orchestration
  • CI/CD Automation
  • Scalable Solutions

Data Operations & Management

  • Agile Methodologies
  • Project Management
  • Technical Documentation
  • Technical Training

Personal Projects

Data Lake Architecture

Designed and implemented a modern data lake architecture using AWS services. Built an end-to-end solution for data ingestion, processing, and analytics with automated governance and security controls.

AWS S3, AWS Glue, Athena, Python, Terraform

Real-time Analytics Platform

Built a real-time analytics platform processing millions of events per day. Implemented stream processing, real-time aggregations, and interactive dashboards for monitoring key metrics.

Kafka, Spark Streaming, Redis, Grafana, Docker

ML Feature Store

Developed a centralized feature store for machine learning projects. Implemented feature computation, storage, and serving layers with support for both batch and real-time feature serving.

Python, FastAPI, Redis, PostgreSQL, MLflow

Data Quality Framework

Created an automated data quality framework with customizable rules engine. Implemented data profiling, validation, and monitoring with alerts and detailed reporting capabilities.

Great Expectations, Airflow, dbt, Snowflake, Slack API

IoT Data Pipeline

Built a scalable IoT data pipeline handling sensor data from thousands of devices. Implemented real-time processing, anomaly detection, and predictive maintenance capabilities.

AWS IoT, Kinesis, Lambda, Timestream, Python

Education

Anna University, Panimalar Engineering College, Chennai Aug 2013 - Aug 2017

Bachelor of Engineering in Electrical and Electronics Engineering

Completed undergraduate studies with First Class, maintaining consistent academic performance throughout.

Final Year Project

DC-DC Converter for BLDC Motor Using Ultracapacitor & Battery

Designed and implemented a hybrid power supply system to improve efficiency and performance in Brushless DC motor applications.

Beyond Code

Hiking

Hiking and trekking in the Himalayas. Love exploring new trails and challenging myself.

Basketball

Occasionally play basketball and enjoy staying active through the sport.

Geopolitics

Listen to a lot of geopolitical news and analysis.

Beyond my professional life, I'm an avid hiker, basketball player, and a keen follower of global affairs. I believe in maintaining a healthy work-life balance and continuously expanding my horizons through new experiences and challenges.

Let's Connect