Our client is a Berlin-based, remote-first scale-up providing cutting-edge market intelligence and software solutions to the automotive industry. As the company enters an exciting new phase of growth, they are looking for an experienced Scraping Infrastructure Engineer to strengthen their international, high-impact team.
If you thrive on architecting and maintaining highly resilient, large-scale scraping systems capable of handling sophisticated anti-bot and blocking mechanisms, this role is for you. You will be responsible for the entire lifecycle of the company's high-volume scraping pipelines, focusing on the infrastructure, tooling, and strategic defenses that ensure accurate, consistent, and fast data collection at scale.
Responsibilities
- Infrastructure Strategy & Architecture: Architect, build, and maintain the core infrastructure for a large-scale, asynchronous data extraction system.
- Advanced Resilience Engineering: Design, implement, and continuously optimize sophisticated anti-blocking strategies, IP rotation, fingerprint management, and anti-bot bypass techniques to ensure high reliability and consistent uptime in the face of modern blocking mechanisms.
- Operational Excellence & Monitoring: Implement robust monitoring, alerting, and logging systems to proactively debug, troubleshoot, and continuously improve scraper performance, reliability, and data quality across the platform.
- Core Development: Develop, test, and deploy highly robust and fault-tolerant web scraping components using advanced Python tools (Scrapy, Playwright, Selenium, Requests, etc.).
- Integration & Pipelines: Manage and automate high-volume data ingestion pipelines and seamless integrations with internal and external REST APIs.
- DevOps & Automation: Drive DevOps best practices, including managing infrastructure with Docker and maintaining CI/CD pipelines; knowledge of Nomad is a plus.
- Collaboration & Mentorship: Partner with other engineers to set standards, enhance core infrastructure tooling, and mentor junior team members.
Requirements
- Core Experience: Proven, hands-on professional experience in high-volume web scraping and data extraction using Python.
- Anti-Blocking Expertise: Deep, practical knowledge of anti-bot solutions, including CAPTCHA solving, browser fingerprinting, and effective proxy/IP management strategies.
- Technical Depth: Solid understanding of HTML parsing, browser automation techniques, and asynchronous programming.
- Frameworks: Proficiency with leading web scraping frameworks (e.g., Playwright, Scrapy, or Selenium).
- Web Knowledge: Strong knowledge of REST APIs, HTTP protocols, and effective proxy management.
- Database Skills: Familiarity with both SQL and NoSQL databases for efficient data storage and processing.
- Infrastructure: Experience with Docker, Linux environments, and version control (Git).
- Communication: Fluent in English (written and spoken).
- Mindset: Self-driven, pragmatic, and capable of taking full ownership of critical, high-impact infrastructure projects.
Nice to Haves (Bonus Points)
- Experience with advanced async libraries (e.g., asyncio).
- Understanding of data quality validation and pipeline monitoring tools.
What they offer
- Impact & Ownership: A high degree of freedom and the opportunity to have a meaningful, measurable impact on a growing scale-up business.
- Flexibility: A high degree of flexibility; our client is a remote-first company and actively supports remote work.
- Growth: A competitive compensation package and dedicated support for your personal & professional development (ongoing training & coaching).
- Team & Atmosphere: A great work atmosphere within a small, talented, and international team.
- Office (Optional): A modern office located on the campus of Wildau Tech University, easily accessible by public transport (just outside Berlin).