Lead Infrastructure Engineer
- Interviewed 15+ DevOps candidates for technical and soft skills and recommended 3 for hire. Individually trained 8 team members on the inner workings of Reuters products, helping them become effective members of the DevOps team.
- Led weekly stand-ups and provided daily guidance. As a result, the DevOps team took significant strides, accelerated delivery velocity, and closed previous gaps in infrastructure support.
- Used Terraform to describe all of the Reuters TV and data infrastructure. This allows us to track our AWS resources in Git and document changes. The Terraform manifests can be used to update existing infrastructure and clone it to other AWS regions. Additionally, this approach allows us to maintain consistency across all of our environments.
- Used Puppet to describe everything deployed on each server after its initialization from instance data. Hiera is used to manage secrets and configuration differences between environments. This created a very simple flow of changes as they are promoted through the various environments. Differences between infrastructures can be visualized simply by comparing the Terraform and Puppet manifests, and differences in configuration can be visualized by comparing the Hiera data.
- Developed an AMI Bakery that builds out instances from their respective Puppet manifests, validates that they are functional, de-personalizes them, snapshots them and promotes them in their auto-scaling groups. The baked images are fully functional when started, even when Puppet is unavailable. This service significantly speeds up auto-scaling, improves resilience and allows easy adoption of spot instances.
- Created a Jenkins Pipeline Library to standardize builds and deploys of over 30 Java (Maven and Gradle), NodeJS, Python, Golang and Docker projects. The builds execute within Docker containers, allowing us to maintain separate sets of build dependencies and improve build isolation. Build artifacts are sent to a multi-region redundant artifact repository. Our internal PKI infrastructure is used to facilitate client authentication, authorization, auditing and MFA. Over 400 Jenkins jobs are maintained across all projects.
- Created a private APT repository using Reprepro and S3. Critical, custom and backported packages are built, signed and prioritized for deployment via APT. The repository is automatically replicated across regional boundaries.
- Assisted the data science team in creating a content recommendation engine. The web-based API was built on the Pyramid framework (Python) with MySQL and Redis backends. The data is consumed from Vimond, Reuters, Segment and Chartbeat. The service processes 600 requests per second with a 90th percentile response time of 15ms and a 5ms average.
- Created a serverless system to ingest Segment event data into a Parquet-based data lake on S3. The system is able to automatically scale and ingest data at a predictable cost. The data can then be queried using tools like Redshift Spectrum, AWS Athena and Hive. After creating the system, I worked closely with the data team to migrate 9 trillion existing events and 20,000 lines of data pipeline code. The new system is projected to save us $300k in the first year.
- Utilized Route53 to create a public, private and reverse DNS system. This system is used for automatic server and service registration and discovery. Split horizon is used to maintain consistent naming between internal and external endpoints. This approach significantly speeds up development, system administration and debugging.
- Developed the ELB/ALB Cluster Keeper which adds and removes servers from Elastic and Application Load Balancers while ensuring cluster health, minimum capacity and proper connection draining. This allows us to perform rolling deploys without impacting the end user experience when services are restarted.
- Developed the Compound Health Check, which is able to perform multiple health checks simultaneously and report the server status to the Elastic Load Balancers. Failures are reported to the development and operations teams via Slack. Repeated faults cause the application servers to be removed from the load balancer's pool. This tool allowed us to pack microservices onto EC2 instances and maintain service quality.
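The aggregation idea behind such a compound check can be sketched as follows. This is a minimal illustration only, not the actual tool: the check names, timeout and thread-pool approach are assumptions.

```python
import concurrent.futures

def compound_health_check(checks, timeout=5):
    """Run every health check concurrently; healthy only if all pass.

    `checks` maps a check name to a zero-argument callable returning True/False.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        futures = {name: pool.submit(fn) for name, fn in checks.items()}
        results = {}
        for name, fut in futures.items():
            try:
                results[name] = bool(fut.result(timeout=timeout))
            except Exception:
                results[name] = False  # a crashed or hung check counts as failed
    # A False result here is what would make the ELB health endpoint report
    # unhealthy; Slack alerting on repeated faults is not shown.
    return all(results.values()), results

# Example: one service on the instance is up, another is down.
ok, detail = compound_health_check({
    "api": lambda: True,
    "worker": lambda: False,
})
```

Running the checks in parallel keeps the compound endpoint's latency close to that of the slowest single check, which matters when a load balancer polls it on a short interval.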
- Developed the DataDog Log Parser, which analyzes application logs in real time and extracts various metrics. This allows us to keep a close eye on our median, mean and 90th percentile API response times and tag them based on various criteria. Unlike probe-style monitors, this approach aggregates metrics from every request that we process, greatly improving data depth and accuracy.
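The metric-extraction step can be sketched like this. It is a simplified stand-in for the actual parser, and the access-log line format is hypothetical:

```python
import re
import statistics

# Hypothetical access-log line: "GET /v1/items 200 12.5ms"
LINE_RE = re.compile(r'(?P<method>\S+) (?P<path>\S+) (?P<status>\d{3}) (?P<ms>[\d.]+)ms')

def extract_response_times(lines):
    """Pull per-request response times (in ms) out of raw log lines."""
    times = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            times.append(float(m.group("ms")))
    return times

def summarize(times):
    """Median, mean and 90th percentile, aggregated over every request seen."""
    ordered = sorted(times)
    p90 = ordered[max(0, int(round(0.9 * len(ordered))) - 1)]
    return {
        "median": statistics.median(ordered),
        "mean": statistics.mean(ordered),
        "p90": p90,
    }
```

Because every request contributes a sample, the percentiles reflect real traffic rather than the handful of synthetic requests a probe-style monitor would generate.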
- Created extensive load testing scripts for Reuters TV APIs that simulate access from various client applications. The load tests were used to improve and maintain our API performance and resilience. This initiative alongside auto-scaling helped us ride out traffic spikes that were driven by breaking news and push notifications.
- Created a Varnish based caching layer between our various services and external endpoints. This cushions the impact of outages by allowing us to utilize stale data when fresh data is unavailable. Additionally, this significantly improves performance in regions which do not have local deployments of their upstream dependencies.
- Implemented request tracing through our services. A unique ID is generated on the load balancer and is then passed through our microservices. This allows us to correlate logs across multiple services to a single end-user request.
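The propagation pattern can be sketched as below. This is an illustrative sketch only: the header dicts are stand-ins for real HTTP requests, and the `X-Request-ID` header name is an assumption.

```python
import uuid

TRACE_HEADER = "X-Request-ID"  # assumed header name; the real one may differ

def ensure_trace_id(headers):
    """Attach a unique request ID if the load balancer has not already set one."""
    if not headers.get(TRACE_HEADER):
        headers[TRACE_HEADER] = uuid.uuid4().hex
    return headers[TRACE_HEADER]

def call_downstream(headers, downstream_headers=None):
    """Copy the trace ID onto an outgoing call to the next microservice."""
    downstream_headers = dict(downstream_headers or {})
    downstream_headers[TRACE_HEADER] = headers[TRACE_HEADER]
    return downstream_headers

def log_line(headers, message):
    """Prefix every log line with the trace ID so logs across services correlate."""
    return f"[{headers[TRACE_HEADER]}] {message}"
```

Grepping all services' logs for one ID then yields the full path of a single end-user request.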
- Used Slack and DataDog heavily to aggregate important system metrics, notifications and alerts. This not only allows our operations team to monitor our infrastructure from anywhere, but also makes it possible to diagnose most problems without needing a computer. Deployment, rollback, scaling and configuration tasks can also be performed away from a computer using a mobile phone.
- Implemented bi-directional incremental backups between on-prem and cloud infrastructures.
Sr. Enterprise Systems Architect
- Architected the re-launch of Sesame Street GO, adding the capability to connect multiple platforms such as iOS, Android, Web and Roku to a single back-end (API) and support for cross-platform entitlement (subscribe once, play anywhere). The paying subscriber base grew from 700 to 10,000 within eight months of the re-launch.
- Coordinated with multiple internal stakeholders to create a technically detailed RFP. Vetted vendors for various components of the project, such as the API server, content management system, payment platforms, and mobile app design.
- Evaluated the current content distribution process and created tools to batch ingest video, metadata, thumbnails and captions into Kaltura. This allowed us to more than double the amount of content available for the Sesame Street GO re-launch.
- Created a CMS for managing and deploying HTML5 and Flash games. The CMS was created with a serverless design utilizing AWS API gateway, Lambda, DynamoDB, S3 and CloudFront for serving. Atlassian Stash and Jenkins were used for versioning and deploying.
- Performed load testing on the API and evaluated the mobile app’s conformity with the API specifications prior to launch.
- Automated the configuration and deployment of the API server using Puppet, giving us the ability to have consistent development, test and production environments.
- Rewrote and improved the iOS, Android and Roku payment system integration code, improving reliability, decreasing the number of new support cases, and driving an overall improvement in App Store reviews.
- Evaluated cloud providers in the context of our requirements, price and performance, then created a detailed dynamic spreadsheet to estimate the cost of fulfilling our requirements at different service levels, e.g. standard vs. reserved instances and different managed service vendors.
- Built Puppet manifests and modules required to automate provisioning of eight types of server instances, and built Ant scripts to facilitate one-click builds.
- Fully migrated our entire web infrastructure to RackSpace under an aggressive deadline of thirty days. The new infrastructure design saves our company over $200k annually.
- Created a system that triggers event handlers to restart applications and servers during error conditions and then sends out detailed status reports for each event, allowing hands-off operation.
- Set up a highly available internal DNS system using NicTool for API and web administration, overcoming the lack of such an offering from RackSpace and Amazon Web Services.
- Created a custom Varnish configuration that speeds up delivery of assets and overcomes a major application deficiency. This allowed us to start deploying media-heavy HTML5 games.
- Created scripts to perform one-click synchronization of content from our cloud based production environment to our in-house test and development environments.
- Deployed a companywide Git server using Atlassian Stash, and then migrated all of our SVN repositories and deployment scripts.
- Developed a C# application to facilitate synchronization of HR data between PDS and ADP.
- Performed detailed benchmarks of different storage and network configurations and managed to triple the throughput of SMB transfers to 500MB/s paving the way for 4K video processing.
- Administered the Sesame Street website, which serves over 100,000 visitors per day
- Converted SesameStreet.org and microsites from Akamai to Amazon CloudFront, saving over $300,000 annually in video streaming costs
- Implemented a clustered Citrix XenServer environment with hands-free provisioning of Ubuntu VMs
- Created Puppet scripts to automate provisioning and configuration of servers
- Scripted deployment of four microsites, virtually eliminating all deployment process errors
- Created and tested a caching configuration that reduced data center requests by 99%, allowing the Sesame Street website to handle the 500% traffic spike resulting from the site being centrally featured on the Google homepage
- Developed dynamic load testing scenarios in NeoLoad for AJAX calls, CMS downloads and user account management
- Applied over 600 hours of load testing results to recommend and develop site optimizations
- Planned, deployed and supported the department toolbox server consisting of JIRA for bug tracking, SVN for version control, Hudson for automated code deployment, MediaWiki for centralized documentation, Google Urchin for deep analytics and Crowd for centralized user management and single sign-on capability
- Automated 20 processes using SQL, Perl and Bash, virtually eliminating human errors and saving 150 man hours per year
- Developed a C# GUI and Excel VBA controls to streamline the import of assets into the company content management system
Co-Founder and Technical Director
- Co-founded a technology startup which specializes in transaction processing via text messaging and web applications
- Spearheaded IT functions, working with founders to generate and refine ideas, then developing software to implement them
- Developed a database-driven website with over 40,000 lines of ASP.NET and C# code and 5,000 lines of SQL code
- Wrote software that constructs GIS objects from Census Bureau ZCTA and TIGER/Line data and optimizes their lookup in SQL
- Pitched ideas to clients during formal and informal business meetings
- Developed a virtual instrument for performing spectroscopy experiments in LabVIEW 8. This instrument synchronized four external and internal hardware components for signal generation and acquisition and provided central control
- Wrote a driver for the Highland Technologies P400 Timing Module. The quality of the source code and documentation prompted Highland to adopt the driver, subsequently rewarding the lab with discounts on their hardware
- Designed and implemented a department-wide LAN using a high speed interface and shielded cabling, reducing interference and improving the overall experiment accuracy
- Developed a web-based technical support and ticketing system using PHP
Technical Support Specialist
- Provided one-on-one technical and computer support to over 100 faculty and staff
- Maintained 3 departmental Windows NT servers and one HP-UX server
- Developed Java and shell script tools to track down network problems and improve the office efficiency
- Assisted in installation of a 16-machine Linux computational cluster
Bachelor of Science
JIRA Workflows Training
JIRA Reporting and JQL Training
JIRA Administrators Training
JIRA Fundamentals Training
Advanced Google Search Appliance Training
Google Search Appliance Training
Liferay Core Product Training
A10 Load Balancer Training
- Developed the official Blondie and Dagwood website that was in use from 2006 to 2009
- Created graphics, games and dynamic content using Macromedia Flash, ActionScript, and Adobe Photoshop
- Developed a definitive site design from loose ideas presented by the client
- Developed a multi-threaded text answering machine robot for AOL Instant Messenger, prior to others creating such a service
- Developed an API (using Java, MySQL and OOP) that forwards instant messages to users’ mobile phones via SMS
- Researched the AOL Instant Messenger protocol and developed a proprietary Java library that communicated via the protocol without use of preexisting commercial components
- Started a secure co-located Linux shell, MUD, web, and DNS hosting service for over 50 customers
- Developed a business website and a customer front-end using Java, JSP, MySQL, Perl and shell script that allowed users to change their mail settings, subdomains, processor/memory quotas, and sub-accounts. The control panel was the first of its type on the market and was later sold to a business partner
- Created a website templating system using XML and XSLT, allowing web designers to modify the look and feel of the control panel without any prior knowledge of Java
- Installed and configured Red Hat and Slackware Linux, Apache, Exim, Java Servlet Engine, PHP, MySQL and BIND
- Obtained a merchant account and developed a billing system to automate customer credit card billing
- Located and enrolled clients using various online marketing techniques