Shivam Dixit
Data Engineer experienced in scraping websites, building APIs, and designing CI/CD
pipelines.
👉 shivam.r.dixit@gmail.com
Tech stack
Python
GenAI - OpenAI, Anthropic
AI Agents - CrewAI
n8n
AWS
Scrapy, Selenium & Playwright
PostgreSQL
Work History
🚧 Data Engineer | Forage.ai
10/2022 - Present
Worked on projects including collecting data from various sources using the
Scrapy framework, building an LLM-based automation tool to generate entities, and
developing APIs using our in-house Flask framework.
- Designed a scalable solution using scrapyd to run and control the spiders hosted on AWS EC2.
- Built an ETL pipeline to extract data from the source, transform it, and load it into the database.
- Designed a robust CI pipeline from scratch that has run without failure for the past year.
- Built an LLM-powered application using the OpenAI (GPT-4/GPT-3.5) and Claude APIs to extract domain-relevant
entities from web pages and generate industry-agnostic schemas.
🚧 Data Scientist | HolidayME
10/2021 - 10/2022
Collected data from different travel websites to provide better deals to customers.
- Developed custom web scraping scripts using tools such as Scrapy and Selenium to meet specific
project requirements.
- Conducted data cleaning and preprocessing to ensure accuracy and consistency of scraped data.
- Implemented proxy and user-agent rotation to prevent IP blocking and improve data collection
quality.
Accomplishments & Awards
🏆 Reduced CI pipeline infrastructure cost from $300 to $0, achieving significant
cost savings for the organization.
🏆 Transformed the organization's ticket system by automating Notion, doubling
efficiency and streamlining workflows.