Truxten Cook

Data Engineering @ Tatari.

Hi, I'm Truxten Cook, a Software Engineer based in San Francisco. I hold an MS and a BS in Computer Science from Arizona State University, both with a 4.0 GPA. Currently, I work at Tatari on the Data Platform Team, where I design and implement solutions to optimize data infrastructure and machine learning operations. My work includes developing PySpark applications, managing and working with Airflow, and building an MLOps platform. Previously, I interned at Sandia National Laboratories, developing automated ML pipelines with PyTorch and MLflow. I specialize in Python, Terraform, Databricks, Spark, MLflow, and Airflow. Outside of work, I enjoy climbing, being outdoors, and strategy games.

Resume

Basics

Name Truxten Cook
Email [email protected]
Phone 602-810-8711

Education

  • Aug 2018 - Dec 2022

    Tempe, AZ

    MS Computer Science (Big Data Systems), BS Computer Science
    Arizona State University

Work

  • May 2022 - Present

    San Francisco, CA

    Software Engineer, Data Platform Team
    Tatari

    • Designed and authored a framework to automatically cut and deploy infrastructure for MLflow machine learning models on Databricks. Collaborated with multiple Engineering and Data Science teams to ensure stakeholders' needs were met and MLOps best practices were followed. The final pipeline included automatic Terraform plans and applies through CI, a bespoke infrastructure-as-code solution to dynamically swap the deployed model to best suit business needs, and a Python framework to accelerate deployment of machine learning models (see the registration sketch after this list).
    • Spearheaded a large-scale optimization of S3 storage, reducing monthly costs by over $30,000 across hundreds of buckets and a multi-petabyte Data Lake. Designed and implemented data lifecycle policies with Terraform, cutting the cost of some heavily used buckets by over 60% (an illustrative lifecycle rule appears after this list).
    • Ported multiple terabyte-scale, business-critical ETL workflows from Redshift to Databricks using PySpark (the ETL sketch after this list shows the general shape). Led the development of new ETLs, managed the backfilling of historical data, and ensured data integrity during the transition from the old Redshift pipeline to the new Databricks platform. This overhaul resulted in a 40% speedup for essential data ingestion jobs and yielded significant cost savings by reducing Redshift compute expenses.
    • Architected and implemented a bespoke local Airflow deployment to streamline end-to-end (E2E) testing, significantly accelerating new feature development cycles (see the test DAG sketch after this list). This solution let non-technical stakeholders run basic job tests autonomously while giving technical users the tools to manage a full Kubernetes (K8s) and Databricks development environment with ease.
    • Served on-call for the Data Platform team, rapidly responding to incidents and helping stakeholder developers with questions about Databricks, PySpark, Redshift, and more.
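A minimal sketch of the model-registration step such a framework might wrap, using the public MLflow client API (aliases are an MLflow 2.x feature; the function and model names here are illustrative assumptions, and the Terraform and CI wiring are not shown):

```python
import mlflow
from mlflow.tracking import MlflowClient

# Hypothetical helper illustrating the registration/promotion step of a
# deployment framework like the one described above; names are assumptions.
def register_and_promote(run_id: str, model_name: str, alias: str = "champion") -> str:
    """Register the model logged under run_id and point an alias at it."""
    client = MlflowClient()
    version = mlflow.register_model(model_uri=f"runs:/{run_id}/model", name=model_name)
    # An alias lets downstream infrastructure (e.g. Terraform-managed serving
    # endpoints) resolve the current production model dynamically.
    client.set_registered_model_alias(model_name, alias, version.version)
    return f"models:/{model_name}@{alias}"
```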
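The lifecycle policies themselves were written in Terraform; a rough Python/boto3 equivalent of the kind of rule involved looks like this (bucket name, prefix, and day counts are assumptions, not the actual policy):

```python
import boto3

s3 = boto3.client("s3")

# Illustrative lifecycle rule: transition objects to Infrequent Access after
# 30 days and expire them after a year. The real policies were defined in
# Terraform; the bucket name and day counts here are placeholders.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-and-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```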
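A toy sketch of the read-transform-write shape of one of these PySpark ETLs on Databricks; the paths, table, and column names are placeholders rather than the production pipeline:

```python
from pyspark.sql import SparkSession, functions as F

# Minimal shape of a ported ETL: read raw events from the Data Lake,
# aggregate, and write a Delta table on Databricks.
spark = SparkSession.builder.appName("example-etl").getOrCreate()

events = spark.read.parquet("s3://example-bucket/raw/events/")

daily = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "campaign_id")
    .agg(F.count("*").alias("impressions"))
)

daily.write.format("delta").mode("overwrite").saveAsTable("analytics.daily_impressions")
```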
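A minimal example of the kind of in-process DAG check the local deployment made cheap, assuming Airflow 2.5+ (which provides dag.test()); the real environment's Kubernetes and Databricks wiring is not shown:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Toy DAG standing in for a real pipeline; the task body is a placeholder.
with DAG(
    dag_id="example_e2e_check",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    PythonOperator(
        task_id="smoke_test",
        python_callable=lambda: print("pipeline wiring OK"),
    )

if __name__ == "__main__":
    # Airflow 2.5+ can run a single DAG in-process, with no scheduler or
    # webserver, which keeps the E2E feedback loop fast.
    dag.test()
```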
  • May 2021 - Aug 2021

    Gilbert, AZ

    Machine Learning R&D Intern
    Sandia National Laboratories
    • Applied MLOps best practices in PyTorch to create an end-to-end automated machine learning pipeline for tasks such as real-time object detection and machine translation, letting developers stand up a proof of concept for new projects in hours rather than days.
    • Implemented automatic logging and visualization of relevant hyperparameters and metrics using MLflow (a minimal logging sketch follows this list).
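A minimal sketch of that logging pattern: hyperparameters and per-epoch metrics recorded to MLflow from a toy PyTorch loop (the model, data, and values are placeholders):

```python
import mlflow
import torch
from torch import nn, optim

# Placeholder model and data standing in for a real training job.
model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
x, y = torch.randn(64, 10), torch.randn(64, 1)

with mlflow.start_run():
    mlflow.log_params({"lr": 0.01, "epochs": 5})
    for epoch in range(5):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        # Per-epoch metrics show up as curves in the MLflow UI.
        mlflow.log_metric("train_loss", loss.item(), step=epoch)
```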

Skills

Programming Languages
Python
Terraform
C/C++
SQL
JavaScript

Technologies
Databricks
Spark
MLflow
Airflow
Kubernetes
PostgreSQL
PyTorch
S3, ECR, EC2

Developer Tools
JIRA
Git
Confluence

Projects

  • Distributed Database Hotspot Analysis using Apache Sedona
  • Resource Description Framework (RDF) Database

Awards

  • December 2022

    Moore Award
    Arizona State University
    Given for graduating with a 4.0 GPA in under 8 semesters