Rohan Kumar Dubey

I am a Senior Big Data Engineer at PokerStars, where I provide in-house solutions and services to support real-time data needs and activities in the online gaming ecosystem. I use Java, Python, Apache Flink, Apache Spark, AWS, and other big data technologies to build and maintain scalable and reliable data pipelines, architectures, and platforms. With over six years of experience in software development and data engineering, I have worked on various SaaS products and data-driven projects for leading companies such as Experian, Model N, Carelon Global Solutions, and Legato Health Technologies. I have a Bachelor of Technology degree in Electrical and Electronics Engineering from Mahatma Gandhi Institute of Technology. I am passionate about solving the problems that arise in integrating large-scale distributed systems, concurrent programming, and designing low-latency systems. My goal is to deliver innovative and impactful solutions that enhance data quality, performance, and insights. I am an open-source enthusiast and contributor.


Skills

Programming Languages & Tools & Libraries
Core Skills
  • Big Data Frameworks: Apache Spark, Apache Hive, Apache Kafka, Apache Hadoop, Apache NiFi, NiFi Registry, AWS, Delta Lake, Apache HBase, Apache Pig, Apache Sqoop, Apache Oozie, Apache Flume, Docker, Kubernetes, Dremio, Teradata
  • Programming Languages: Java, Python, Scala, Go, SQL, Shell Script, C++
  • Pipeline development and deployment (CI/CD)
  • ETL pipeline design using Informatica
  • Object-Oriented Programming (OOP)
  • Open source contribution
  • Business Intelligence Reporting
  • Web scraping and web automation

Experience

Senior Big Data Engineer

PokerStars
www.pokerstars.com

Member of the RAPID team, which provides in-house solutions and services to other departments in support of their real-time data needs. The team gathers information by processing and monitoring real-time activity in the Stars Group poker game ecosystem, using real-time big data technology for Business Events architecture, Customer Profiles, EventGens, Personalised Poker Recommenders, and Enhanced Data Science capabilities.

TECH STACK: Java, Python, Apache Flink, Apache Spark, Amazon Web Services, AWS EC2, AWS DynamoDB, Amazon Kinesis Data Analytics, Amazon Kinesis, Apache Kafka, Apache Hive, Hadoop, Hadoop YARN, Delta Lake, Apache Iceberg, Big Data, ETL, REST APIs, PostgreSQL, Jenkins, Machine Learning, AWS Athena, Kubernetes, Prometheus, Spring Boot, Grafana, Docker, AWS Lambda, Data Warehousing, Git, Maven.

October 2023 - Present

Senior Software Developer

Experian PLC
www.experian.com

Worked as a backend and data engineer on a large-scale SaaS product in the Ascend Intelligence Services (AIS) team. AIS is a global analytics product line intended to support clients' end-to-end modeling needs, from loan originations to collections. The AIS web application is an online interactive platform powered by Spring Boot, Apache Spark, AWS EC2, Kubernetes, Python, Java, Scala, the Akka framework, Jupyter Notebook (instances deployed on Kubernetes), and Spark SQL. Clients can use it to develop a range of machine learning models, produce SQL reports and tables, and share them through an online application where they log in and view the dashboards in the UI. AIS uses big data and AI/ML to help clients make the most informed and profitable decisions.

TECH STACK: Java, Scala, Python, Apache Spark, Amazon Web Services, AWS EC2, Apache Kafka, Apache Hive, Apache Flink, Hadoop, Apache Spark Streaming, Delta Lake, Apache Iceberg, Airbyte, Big Data, ETL, REST APIs, PostgreSQL, Jenkins, Machine Learning, AWS Athena, Kubernetes, Prometheus, Grafana, Docker, AWS Lambda, Data Warehousing, Git, Maven.

April 2023 - September 2023

Member of Technical Staff

Model N Software Pvt. Ltd.
www.modeln.com

Worked on the SaaS product development team (Integration Service), which enables communication between different entities within the ecosystem using REST API calls (Spring Boot). The service moves data across platforms using an event streaming platform (Apache Kafka), processes it with a data processing framework (Apache Spark) that enables ACID transactions on Delta Lake tables (Delta.io), and supports live, interactive queries on the processed data using a data lake engine (Dremio).

TECH STACK: Java 11, Scala, Python, Apache Spark, Apache Kafka, Apache Hadoop, Apache Hive, Apache NiFi, Spring Boot, NiFi Registry, NiFi Toolkit, AWS S3, AWS EMR, AWS Glue, AWS Athena, AWS Lambda, AWS Secrets Manager, Amazon Redshift, Amazon RDS, Prometheus, Delta Lake, PostgreSQL, Dremio, Jenkins, ETL, Temporal, Docker, Kubernetes, Flyway, Maven, Fabric8, Git, GitHub.

July 2021 - March 2023

Software Engineer

Carelon Global Solutions LLP
www.carelonglobal.com

Worked on a Medicare/Medicaid initiative that aggregates clinical intervention data, core administrative data (membership, claims, provider data), and CMS-originated data from different data marts and derives insights from it, with the aim of maximizing Medicare revenue and reducing Medicare administrative costs.

TECH STACK: Scala, Java 8, Python, Shell Script, SQL, Apache Spark, Apache Hadoop, Apache Hive, Apache HBase, Apache Ranger, Apache NiFi, Amazon Web Services (AWS), AWS S3, AWS EMR, AWS Glue, AWS Secrets Manager, AWS Athena, AWS Lambda, Amazon Redshift, Amazon RDS, Informatica, Teradata, Control-M, Bitbucket.

July 2020 - June 2021

Associate Software Engineer

Legato Health Technologies LLP
www.legatohealth.com

Designed an end-to-end data processing framework in which raw data is processed into the desired format and the formatted data is published to dashboards using the reporting tool Tableau. Performed ETL (Extract, Transform, Load) using Teradata and the ETL tool Informatica to reduce Medicare administration costs.

TECH STACK: Amazon Web Services (AWS), Big Data, Python (Programming Language), Apache Spark, Apache Hadoop, Apache Hive, Apache Oozie, AWS S3, AWS EMR, AWS Glue, AWS Secrets Manager, AWS Athena, AWS Lambda, Amazon Redshift, Amazon RDS, PostgreSQL, Apache Pig, Apache NiFi, Apache Sqoop, Teradata, Shell Scripting, SQL, Informatica, Control-M, Bitbucket.

September 2018 - June 2020

Education

MAHATMA GANDHI INSTITUTE OF TECHNOLOGY, Hyderabad

Bachelor of Technology (Electrical and Electronics Engineering)

2014 - 2018

NARAYANA JUNIOR COLLEGE, Hyderabad

INTERMEDIATE (MATHS, PHYSICS, CHEMISTRY)

2012 - 2014

SRI KRISHNAVENI TALENT SCHOOL, Hyderabad

Secondary School Certificate

2011 - 2012


Projects

ATM Software

ATM software that handles transactions using a file-based storage system.

C++ File Handling Exe VS Code Optimised

Xlsx To Csv Converter

Java code that converts a raw XLSX file to a CSV file.

Java Xlsx Csv Conversion

PASSWORD MANAGER

Manages passwords in a built-in SQLite database protected by a single master password.

Password Python Master SQLite Dictionary

SNAKE GAME USING AI

A math-based snake game that extends the classic snake game.

AI JavaScript search algorithm training math

RETRO PING PONG GAME

Retro Ping Pong game built with Python's turtle module.

turtle Python Module 2D

Automation

Automates the task of fetching script names from an XLSX file.

Server Python validation Paramiko Teradata

BITCOIN MINING SOFTWARE

Bitcoin mining using the previous hash value and SHA-256.

BitCoin Python Hash sha256 GPU
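The idea of mining from a previous hash value can be sketched roughly as follows. This is a simplified toy, not the project's code: the names `block_hash` and `mine`, the string-concatenation layout, and the leading-zeros difficulty rule are all assumptions made for illustration. Real Bitcoin mining applies SHA-256 twice to a binary block header and compares the result with a numeric target.

```python
import hashlib
from typing import Tuple


def block_hash(prev_hash: str, payload: str, nonce: int) -> str:
    """Hash the previous hash, a payload, and a nonce with SHA-256 (toy layout)."""
    message = f"{prev_hash}{payload}{nonce}".encode("utf-8")
    return hashlib.sha256(message).hexdigest()


def mine(prev_hash: str, payload: str, difficulty: int = 3) -> Tuple[int, str]:
    """Try nonces in order until the hex digest starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = block_hash(prev_hash, payload, nonce)
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to make the sketch runnable.
    nonce, digest = mine("previous-block-hash", "example transactions")
    print("nonce:", nonce)
    print("hash: ", digest)
```

Each extra leading zero multiplies the expected number of attempts by 16, which is why the sketch keeps the default difficulty small.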

CONVERT HTM TO TXT

Extracts an HTML page and converts it to text format.

pandas Python os SQLA path

GITHUB AUTOFOLLOW BOT

GitHub auto-follow bot that increases or decreases the number of accounts followed.

selenium Python getpass time sys

IMAGE RESIZER

Resizes an image to the desired size using a slider.

tkinter Python PIL shutil ImageTk

PDF TO TEXT CONVERTER

Converts the content of a PDF to a TXT file.

PyPDF2 Python PdfFileReader

SNAKE GAME (PYTHON)

Snake game built with the Tkinter and pygame modules.

pygame Python tkinter messagebox

PASSWORD SAVER

Saves the password in hashed format and unhashes it.

tkinter Python functools operator Pmw

FLASK REST API

Creates a REST API using Flask.

Flask Python route jsonify

COMPRESSION ALGORITHM

An effective way to compress data and decompress it when required.

zlib Python base64 os
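As a rough illustration of how the listed modules can fit together, the sketch below compresses text with zlib and wraps the result in Base64 so it can be stored as plain text. The helper names `compress_text` and `decompress_text` are hypothetical and not taken from the project itself.

```python
import base64
import zlib


def compress_text(text: str, level: int = 9) -> str:
    """Compress text with zlib and return it as a Base64 string (hypothetical helper)."""
    packed = zlib.compress(text.encode("utf-8"), level)
    return base64.b64encode(packed).decode("ascii")


def decompress_text(token: str) -> str:
    """Reverse compress_text: decode Base64, then decompress with zlib."""
    packed = base64.b64decode(token.encode("ascii"))
    return zlib.decompress(packed).decode("utf-8")


if __name__ == "__main__":
    sample = "big data " * 200
    token = compress_text(sample)
    # The round trip must return the original text unchanged.
    assert decompress_text(token) == sample
    print(len(sample), "characters ->", len(token), "Base64 characters")
```

Base64 adds roughly one third to the compressed size, so this layout only pays off when the input is repetitive enough for zlib to shrink it substantially.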

AMAZON PRODUCT MONITOR

Monitors a specific product and, when its price falls below a threshold, sends an email prompting you to buy it.

requests Python smtplib BeautifulSoup re


ACHIEVEMENTS & ACCOLADES

  • Received an award from my manager for automation that reduced manual hours.
  • Received appreciation from onsite counterparts for the best deliverables, delivered on time.
  • Received a Spot Award.
  • Received recognition for creating file validation scripts that check for issues in files being processed.
  • Developed a POC for migrating data from Teradata to Snowflake using an AWS Lambda function.
  • Developed database object validation software that validates table structures across different environments and sends emails to the respective teams if any differences exist.
  • Wrote auto-fetcher software in Python that extracts scripts from the servers, compares the higher (PROD) and lower (DEV) environments for issues, and sends an email automatically if any issue exists.
  • Wrote a database workbook validation script to check for errors in the workbook.
  • Created a reusable Informatica workflow using Visio.

Built by Rohan Kumar, under the MIT License.