Data Fetching Abstraction Layer for Facebook's GraphQL

Efficient data-fetching, caching and authorization framework for serving a GraphQL API, implemented in Node.js with Facebook's Flow type annotations. Uses Facebook's DataLoader library and is designed to work with the GraphQL.js reference implementation.

Supports declarative model definitions that auto-generate fetcher methods and authorization checks. Fetchers are backed by DataLoader caches and generate datastore queries from the parameters passed into the declarative configuration, falling back to sensible defaults for anything left unspecified.
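A minimal sketch of the idea, written in Python for brevity rather than the framework's own Node.js. Everything named here (`ModelDef`, `Fetcher`, `load_many`, `run_query`) is hypothetical and is not the framework's API; it only illustrates a declarative definition driving a batched, cached lookup with a per-viewer authorization check.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ModelDef:
    """Hypothetical declarative model definition (illustrative names only)."""
    table: str
    key: str = "id"  # default: look rows up by primary key
    can_read: Callable[[dict, dict], bool] = lambda viewer, row: True


class Fetcher:
    """Batches key lookups into one datastore query and caches the rows."""

    def __init__(self, model: ModelDef, run_query: Callable[[str, list], List[dict]]):
        self.model = model
        self.run_query = run_query  # stand-in for a real datastore client
        self.cache: Dict[Any, Optional[dict]] = {}

    def load_many(self, viewer: dict, keys: list) -> List[Optional[dict]]:
        # De-duplicate while preserving order, then skip keys already cached.
        missing = [k for k in dict.fromkeys(keys) if k not in self.cache]
        if missing:
            # One generated query covers every uncached key in the batch.
            placeholders = ", ".join(["%s"] * len(missing))
            sql = (f"SELECT * FROM {self.model.table} "
                   f"WHERE {self.model.key} IN ({placeholders})")
            found = {row[self.model.key]: row for row in self.run_query(sql, missing)}
            for k in missing:
                self.cache[k] = found.get(k)
        # Authorization runs per viewer on the way out; its result is not cached.
        out = []
        for k in keys:
            row = self.cache[k]
            allowed = row is not None and self.model.can_read(viewer, row)
            out.append(row if allowed else None)
        return out
```

Rows are cached independently of the viewer, so a second viewer asking for the same keys triggers no new query but still passes through its own authorization check.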

Being open-sourced

Core Data Pipeline for Waldo Photos

Distributed, high-throughput, low-latency pipeline for ingesting and processing digital images and performing facial recognition and comparison on them.

Uses libvips for image processing and Apache Kafka for moving data between service processes. Several CNNs and RNNs, running inside the Caffe framework and trained on open-source datasets, handle inference and extraction of face descriptors. A lightweight stream processing framework for Kafka, written in Clojure, manages data streams from different topics and combines them to transform and enrich data and to make business decisions.

Closed-source, proprietary.

Lightweight Stream Processor Manager for Apache Kafka

Implemented using Clojure core.async, the Franzy library and the Apache Kafka Java API. Automates the registration of topic processors and supports integration with the Confluent Schema Registry.

Automatically decouples consumption workers from processing workers, which communicate over success/failure channels to manage job state asynchronously and efficiently. Supports full topic replay on demand, as well as partial topic replay over a fixed window (across partitions). Also supports a fast-forward mode that safely resets offset commit state, so that reprocessed messages on a long-retention topic do not cause side effects.
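A rough sketch of the consume/process split with success and failure channels, in Python with standard-library queues standing in for core.async channels. The function name `run_pipeline` and the commit rule shown are assumptions for illustration, not the tool's actual behavior.

```python
import queue
import threading


def run_pipeline(messages, process, workers=2):
    """messages: list of (offset, payload) for one partition, in offset order.

    Returns (last_committable_offset, failed_offsets).
    """
    jobs, success, failure = queue.Queue(), queue.Queue(), queue.Queue()

    def worker():
        while True:
            item = jobs.get()
            if item is None:  # sentinel: no more work
                return
            offset, payload = item
            try:
                process(payload)
                success.put(offset)  # report on the success channel
            except Exception:
                failure.put(offset)  # report on the failure channel

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    # Consumption side: hand messages off without waiting on processing.
    for message in messages:
        jobs.put(message)
    for _ in threads:
        jobs.put(None)
    for t in threads:
        t.join()

    done = set()
    while not success.empty():
        done.add(success.get())
    failed = []
    while not failure.empty():
        failed.append(failure.get())
    failed.sort()

    # Only advance past offsets with no unfinished work before them.
    # (A real Kafka client would commit this offset + 1, the next one to read.)
    commit = None
    for offset, _ in messages:
        if offset not in done:
            break
        commit = offset
    return commit, failed
```

Because workers finish out of order, the committable offset is the end of the contiguous run of successes, not simply the highest offset that succeeded.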

Being open-sourced

Postgres -> Kafka Change Data Capture Tool

Hijacks Postgres LISTEN/NOTIFY to provide a reasonably (though not unlimitedly) scalable solution for durable, fault-tolerant replication of Postgres table mutations onto Kafka topics for change data capture.

Implemented using the psycopg2 driver for Postgres communication and librdkafka as the Apache Kafka interface. A SQLAlchemy-based Python CLI tool installs generic DDL instrumentation into the datastore as a one-time setup step.
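As an illustration of what generic DDL instrumentation for LISTEN/NOTIFY could look like, the sketch below builds a trigger that publishes each row change with `pg_notify`. The function `notify_trigger_ddl`, the channel name `table_changes` and the payload shape are all assumptions; the tool's real DDL is not shown here, and this covers only the notification side, not whatever provides durability.

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def notify_trigger_ddl(table: str, channel: str = "table_changes") -> str:
    """Return DDL that emits a NOTIFY for every row mutation on `table`."""
    # Names are interpolated into SQL, so accept plain identifiers only.
    for name in (table, channel):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"not a simple identifier: {name!r}")
    fn = f"{table}_notify_change"
    return f"""
CREATE OR REPLACE FUNCTION {fn}() RETURNS trigger AS $$
DECLARE
  rec RECORD;
BEGIN
  -- DELETE has no NEW row, so fall back to OLD.
  IF TG_OP = 'DELETE' THEN rec := OLD; ELSE rec := NEW; END IF;
  PERFORM pg_notify('{channel}', json_build_object(
    'table', TG_TABLE_NAME, 'op', TG_OP, 'row', row_to_json(rec))::text);
  RETURN rec;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER {fn}_trg
AFTER INSERT OR UPDATE OR DELETE ON {table}
FOR EACH ROW EXECUTE PROCEDURE {fn}();
""".strip()
```

NOTIFY payloads are limited to just under 8000 bytes by default, so a design along these lines may need to send only a key and have the listener read the full row back.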

Being open-sourced

AWS SQS -> Kafka Piping & Transformation

Generic tool implemented in Golang for subscribing to an arbitrary set of AWS SQS queues and replicating data from the queues onto an arbitrary set of Kafka topics. Optionally, you can specify a transformation function that converts the SQS payload to conform to whatever schema is used for the Kafka topic.

Makes good use of goroutine concurrency patterns.
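The queue-to-topic mapping with an optional transform might be sketched as follows. This is Python for brevity (the tool itself is written in Go), and `Route`, `replicate`, `receive` and `produce` are hypothetical names, with `receive` and `produce` standing in for real SQS and Kafka clients.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Route:
    """Hypothetical mapping from one SQS queue to one or more Kafka topics."""
    queue: str
    topics: List[str]
    transform: Optional[Callable[[str], str]] = None  # None passes the body through


def replicate(routes: List[Route],
              receive: Callable[[str], Iterable[str]],
              produce: Callable[[str, str], None]) -> int:
    """Copy every message body from each queue to its topics; return the send count."""
    sent = 0
    for route in routes:
        for body in receive(route.queue):
            # Reshape the SQS payload to the topic's schema when a transform is set.
            payload = route.transform(body) if route.transform else body
            for topic in route.topics:
                produce(topic, payload)
                sent += 1
    return sent


def lowercase_key(body: str) -> str:
    """Example transform: rename the `Key` field to `key`."""
    return json.dumps({"key": json.loads(body)["Key"]})
```

A route for an uploads queue could then be declared as `Route("uploads", ["photos", "audit"], transform=lowercase_key)`, while a route without a transform forwards bodies unchanged.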

Being open-sourced