It’s a great position with a super smart bunch of guys. We’re centrally located right in SOMA (2nd and Howard) and we have an AWESOME office (it’s 102 years old!)
* Maintain our current crawler.
* Monitor and implement statistics behind the current crawler to detect anomalies.
* Implement new features for customers.
* Work on backend architecture to improve performance and stability.
* Implement custom protocol extensions for enhanced metadata and site-specific social media support.
* Work on new products and features using large datasets.
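To give a flavor of the statistics work above, here is a minimal sketch of one common way to flag crawl-rate anomalies: a z-score test against a window of recent samples. The window contents and the 3-sigma threshold are illustrative assumptions, not a description of our actual monitoring.

```java
import java.util.*;

public class RateMonitor {
    // Flag a sample that falls more than three standard deviations
    // from the mean of a recent window of samples.
    static boolean isAnomaly(double[] window, double sample) {
        double mean = 0;
        for (double v : window) mean += v;
        mean /= window.length;
        double var = 0;
        for (double v : window) var += (v - mean) * (v - mean);
        double stddev = Math.sqrt(var / window.length);
        return Math.abs(sample - mean) > 3 * stddev;
    }

    public static void main(String[] args) {
        // Hypothetical per-minute fetch counts.
        double[] recent = {100, 102, 98, 101, 99};
        System.out.println(isAnomaly(recent, 100)); // steady rate: false
        System.out.println(isAnomaly(recent, 10));  // rate collapsed: true
    }
}
```

In production you would track many such series (fetch rate, error rate, parse failures) and use something more robust than a fixed window, but the shape of the problem is the same.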
Requirements and Experience:
* Deep understanding of Java (Threads, IO, tuning, etc)
* Internet standards (HTTP, HTML, RSS, DNS, etc)
* Basic understanding of distributed systems (load balancers, job control, batch processing, TCP, etc).
* Version control (preferably hg or git)
* Comfortable in a UNIX environment (ssh, bash, file manipulation, etc)
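As a rough sketch of the threaded Java work involved, here is the basic thread-pool pattern a crawler backend uses: a fixed pool of workers draining a queue of fetch tasks. The `fetch` method is a placeholder for real HTTP IO, and all names here are illustrative, not our actual codebase.

```java
import java.util.*;
import java.util.concurrent.*;

public class CrawlPool {
    // Stand-in for a real HTTP fetch; just tags the URL so the
    // example stays self-contained and deterministic.
    static String fetch(String url) {
        return "fetched:" + url;
    }

    static List<String> crawlAll(List<String> urls, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<String>> futures = new ArrayList<>();
        for (String url : urls) {
            futures.add(pool.submit(() -> fetch(url)));
        }
        List<String> results = new ArrayList<>();
        for (Future<String> f : futures) {
            results.add(f.get()); // block until each fetch completes
        }
        pool.shutdown();
        return results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(crawlAll(Arrays.asList("a", "b", "c"), 2));
    }
}
```

Tuning the pool size against socket limits, DNS latency, and per-host politeness is where the "Threads, IO, tuning" experience comes in.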