Shivam Dixit

Data Engineer experienced in web scraping, building APIs, and designing CI/CD pipelines.

👉 shivam.r.dixit@gmail.com

Skills & Qualifications

Tech stack

Python
Generative AI - OpenAI, Anthropic
AI Agents - CrewAI
n8n
AWS
Scrapy, Selenium & Playwright
PostgreSQL

Work History

🚧 Data Engineer | Forage.ai

10/2022 - Present

Worked on projects including collecting data from various sources using the Scrapy framework, building an LLM-based automation tool to generate entities, and developing APIs using our in-house Flask framework.

  • Designed a scalable solution using scrapyd to run and control spiders hosted on AWS EC2.
  • Built an ETL pipeline to extract data from the source, transform it, and load it into the database.
  • Designed a robust CI pipeline from scratch that has run without failure for the past year.
  • Built an LLM-powered application using the OpenAI (GPT-4/GPT-3.5) and Claude APIs to extract domain-relevant entities from web pages and generate industry-agnostic schemas.

🚧 Data Scientist | HolidayME

10/2021 - 10/2022

Collected data from various travel websites to provide better deals to customers.

  • Developed custom web scraping scripts using tools such as Scrapy and Selenium to meet specific project requirements.
  • Conducted data cleaning and preprocessing to ensure the accuracy and consistency of scraped data.
  • Implemented proxy and user-agent rotation to prevent IP blocking and improve data collection quality.

Accomplishments & Awards

🏆 Reduced CI pipeline infrastructure cost from $300 to $0.

🏆 Transformed the organization's ticketing system by automating Notion, doubling efficiency and streamlining workflows.