Machine Learning Infrastructure Engineer
About the Company
At Torc, we have always believed that autonomous vehicle technology will transform how we travel, move freight, and do business.
A leader in autonomous driving since 2007, Torc has spent over a decade commercializing our solutions with experienced partners. Now a part of the Daimler family, we are focused solely on developing software for automated trucks to transform how the world moves freight.
Join us and catapult your career with the company that helped pioneer autonomous technology, and the first AV software company with the vision to partner directly with a truck manufacturer.
What you’ll do:
We are seeking an experienced ML Infrastructure Engineer who is highly skilled in designing, implementing, and optimizing the end-to-end machine learning lifecycle in a cloud-based environment, from model development and training to deployment and monitoring. In this role, you will work closely with engineers to drive the technical roadmap of AWS-native ML pipelines that power autonomy across the enterprise.
- Collaborate with Autonomy Engineers, Data Scientists, and Software Engineers to design and implement robust and efficient ML pipelines, ensuring smooth model training, evaluation, and deployment processes
- Develop scalable, highly available, automated systems to facilitate:
  - Data search, selection, and preparation
  - Model training orchestration, experiment tracking, and performance evaluation
  - Model deployment into production systems (including inference as a service)
- Promote and protect the integrity of data and models through validation, versioning, and provenance
- Govern model and data access throughout the data lake at table, column, and row levels
- Assist in the architecture and development of cloud-based solutions for all aspects of software build, test, and deployment processes
- Collaborate with teams specializing in perception, planning, control, mapping, and vehicle testing to develop solutions that support their development efforts
- Support the implementation of emerging cloud-based capabilities that can extend our technology stack and improve our ability to build, deploy, and test safety-critical software for self-driving vehicles
Here’s a list of some of the technologies we use to make all of the above happen:
- Managed services powered by AWS (SageMaker, Lambda, SFN, EventBridge, Athena, Glue)
- PyTorch, Comet ML, MLflow
- On-call tooling (PagerDuty, Datadog, AWS CloudWatch)
Meet the Team:
We work in a distributed team and collaborate frequently in many different forms, including daily stand-ups, planning meetings, and ad-hoc discussion, brainstorming, troubleshooting, and pairing sessions over Zoom or Slack. Our team is operationally responsible for the services we own, so we have an on-call rotation in which each member takes a turn serving as the front line for incidents affecting our services. Rotations last a week each, during business hours. We value maintaining a healthy work/life balance and prefer sustainable development over heroic efforts.
What you’ll need to succeed:
- BS/MS Degree in Computer Engineering, Computer Science, or related field
- 4+ years of experience building and maintaining workloads in public cloud environments
- Solid understanding of data storage and database architectures, including but not limited to relational and NoSQL databases, data warehousing, and clustered, distributed data stores
- Solid understanding of cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes)
- A strong commitment to test-driven development patterns, continuous integration and delivery, and infrastructure as code
- Practical experience with Python libraries for applied data science (Pandas, Plotly, Matplotlib, Dask) and machine learning (TensorFlow, Keras, Caffe, Theano, etc.)
- Strong organizational, time management, and communication skills, with a team-oriented, collaborative working style
- Deep knowledge of AWS serverless architectures (Lambda, Batch, ECS Fargate, Glue, Athena)
- Experience with ML lifecycle management, data storage, and acquisition patterns for robotics and advanced driver assistance systems
Perks of Being a Full-time Torc’r
Torc cares about our team members, and we strive to provide benefits and resources to support their health, work/life balance, and future. Our culture is collaborative, energetic, and team-focused. Torc offers:
- A competitive compensation package that includes a bonus component and stock options
- 100% paid medical, dental, and vision premiums for full-time employees
- 401(k) plan with a 6% employer match
- Flexibility in schedule and generous paid vacation (available immediately after start date)
- Company-wide holiday office closures
- AD&D and life insurance
At Torc, we’re committed to building a diverse and inclusive workplace. We celebrate the uniqueness of our Torc’rs and do not discriminate based on race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, veteran status, or disability.
Even if you don’t meet 100% of the qualifications listed for this opportunity, we encourage you to apply. We’re always looking for people who are hungry, humble, and people smart, and your unique experience may be a great fit for this role or others.