Hello, I'm

Rafid Rahman

Data Engineer

LinkedIn · GitHub

Get To Know More

About Me

Snapshot

  • B.S. Computer Science @ The City College of New York (May 2026)
  • S Jay Levy Fellow
  • Databricks Fundamentals (Certification)

Hi, I’m Rafid! I’m a Computer Science student at The City College of New York graduating in May 2026, and I’m especially interested in data. My primary focus is data engineering because I enjoy building the pipelines, infrastructure, and systems that make data usable in the first place. I like the challenge of turning messy, unstructured information into something clean, reliable, and ready for analysis.

Alongside that, I’m interested in data science work because I enjoy uncovering insights and understanding how data can drive better decisions. I also explore software engineering, since building reliable tools and applications ties everything together and lets those insights actually make an impact.

Outside of tech, I’m usually planning my next trip or taking photos of wherever I end up, and I stay active through sports. And food? Yeah… I’m a full-on foodie 😅. I love trying different cuisines and dragging friends along for the adventure.

Browse My

Tech Stack

Python
SQL
JavaScript
C
C++
AWS
Docker
GitHub
React
Apache Spark
PySpark
Apache Airflow
Databricks
dbt
Snowflake
DuckDB
TensorFlow
PyTorch
scikit-learn
pandas
NumPy
Power BI
Looker Studio
PostgreSQL
SQLite
Apache Parquet
Jupyter Notebook

Explore My

Experience

Data Engineer Intern

Muslim American Society · New York, NY

Jul 2025 – Present

  • Automated ingestion of 1M+ property records with Python ETL, reducing manual prep from hours to minutes.
  • Improved data consistency across 50+ programs using SQL validation rules and quality checks.
  • Optimized relational schemas and query patterns to accelerate analytics for operational decision-making.

Data Engineer Intern

HapticNav · New York, NY

May 2024 – Aug 2024

  • Modeled telemetry and feedback data into analytical tables to support UX experimentation and feature evaluation.
  • Designed survey instrumentation and workflows for weekly product evaluation feedback.
  • Built dashboards and analytic summaries informing 5+ feature improvements.

Data Analyst Intern

ENSPIRE Magazine · New York, NY

Jan 2022 – Aug 2022

  • Built Python pipelines to ingest, enrich, and standardize publicist contact data for editorial operations.
  • Refactored 5+ years of content performance logs into structured datasets for exploratory analysis.
  • Modeled operational feedback data to analyze equipment usage and inform resource planning.

Browse My Recent

Projects

Airline Analytics Lakehouse

Built a medallion-style ETL pipeline on Databricks and AWS for 120M+ FAA flight records to support analytics and ML.

  • Built bronze/silver/gold pipelines with PySpark, Databricks, and Delta Lake for 120M+ records.
  • Engineered an AWS S3 lakehouse with IAM governance and Unity Catalog for 1+ TB of data.
  • Modeled fact and dimension tables for delays, performance, and seasonality.

PySpark · Databricks · Delta Lake · AWS

BMW Market Pulse

Automated an Airflow + Docker ELT pipeline to refresh 10,000+ marketplace records with validation and quarantine.

  • Built Airflow + Docker ELT to ingest and refresh 10,000+ marketplace records.
  • Implemented validation and quarantine rules, yielding 100% clean datasets and 0 downstream failures.
  • Modeled Snowflake tables with dbt and Parquet, producing 650+ aggregates across 24 models and 25 model-years.

Airflow · Docker · Snowflake · dbt

NYC Traffic Collision Severity Prediction

Built ML-ready datasets from 2.19M NYC collisions with feature engineering to support severity prediction.

  • Cleaned and transformed 2.19M records with temporal and categorical feature engineering.
  • Addressed missing data, class imbalance, and leakage risks for reliable evaluation.
  • Achieved an F1 score of 55% on the severe-outcome class.

Python · pandas · scikit-learn

MTA Ridership Analysis

Analyzed 107,974 ridership rows and built dashboards comparing MetroCard and OMNY usage.

  • Cleaned and summarized 107,974 rows into time-series and payment-method tables.
  • Built dashboards over 2.17M ride entries analyzing MetroCard (60%) vs OMNY (40%).

Python · BI

Get in Touch

Contact Me