Flyte

Flyte is a workflow automation platform for complex, mission-critical data and ML processes at scale


💥
Introduction

Flyte is a structured programming and distributed processing platform that enables highly concurrent, scalable and maintainable workflows for Machine Learning and Data Processing. It is a fabric that connects disparate computation backends using a type safe data dependency graph. It records all changes to a pipeline, making it possible to rewind time. It also stores a history of all executions and provides an intuitive UI, CLI and REST/gRPC API to interact with the computation.

Flyte is more than a workflow engine -- it uses a workflow as a core concept and a task (a single unit of execution) as a top-level concept. Multiple tasks arranged in a data producer-consumer order create a workflow.
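To make the producer-consumer idea concrete, here is a minimal sketch in plain Python. It does not use flytekit or any Flyte API; the task functions, the `WORKFLOW` mapping and `run_workflow` are hypothetical names chosen only for illustration. Each task is a single unit of execution, and the workflow is the graph that says which task consumes which task's output.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each one is a plain function, a single unit of execution.
def load_numbers() -> list:
    return [1, 2, 3]

def square(values: list) -> list:
    return [v * v for v in values]

def total(values: list) -> int:
    return sum(values)

# The workflow maps a task name to (function, names of the tasks it consumes from).
WORKFLOW = {
    "load": (load_numbers, []),
    "square": (square, ["load"]),
    "total": (total, ["square"]),
}

def run_workflow(workflow: dict) -> dict:
    """Run every task after its producers and return all outputs by task name."""
    graph = {name: set(deps) for name, (_, deps) in workflow.items()}
    outputs = {}
    # static_order() yields producers before their consumers.
    for name in TopologicalSorter(graph).static_order():
        func, deps = workflow[name]
        outputs[name] = func(*(outputs[d] for d in deps))
    return outputs

results = run_workflow(WORKFLOW)
print(results["total"])  # 14
```

In this sketch the execution order falls out of the data dependencies rather than being written by hand, which is the property the paragraph above describes.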

Workflows and Tasks can be written in any language, with out-of-the-box support for Python, Java and Scala.

⏳
Five Reasons to Use Flyte

Kubernetes-Native Workflow Automation Platform

Ergonomic SDKs in Python, Java & Scala

Versioned & Auditable

Reproducible Pipelines

Strong Data Typing

🚀
Quick Start

With Docker and Flytectl installed, run the following command:

  flytectl sandbox start

This creates a local Flyte sandbox. Once the sandbox is ready, you should see the following message: Flyte is ready! Flyte UI is available at http://localhost:30081/console.

Visit http://localhost:30081/console to view the Flyte dashboard.

Here's a quick visual tour of the console.

To dig deeper into Flyte, refer to the Documentation.

⭐️
Current Deployments & Contributors

Freenome

Gojek

Intel

Lyft Rideshare, Mapping

Level 5 Global Autonomous (Woven Planet)

RunX.dev

Spotify

Striveworks

Union.ai

USU Group

Wolt

🔥
Features

Used at scale in production by 500+ users at Lyft, with more than 1 million executions and 40+ million container executions per month

A data aware platform

Enables collaboration across your organization by:
- Executing distributed data pipelines/workflows
- Reusing tasks across projects, users, and workflows
- Making it easy to stitch together workflows from different teams and domain experts
- Backtracing to a specified workflow
- Comparing results of training workflows over time and across pipelines
- Sharing workflows and tasks across your teams
- Simplifying the complexity of multi-step, multi-owner workflows

Quick registration -- start locally and scale to the cloud instantly

Centralized Inventory constituting Tasks, Workflows and Executions

gRPC / REST interface to define and execute tasks and workflows

Type safe construction of pipelines -- each task has an interface which is characterized by its input and output, so illegal construction of pipelines fails during declaration rather than at runtime
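As a rough illustration of failing at declaration time, the sketch below compares one function's declared return type with the declared type of the parameter it feeds, before any task body runs. It uses only the Python standard library; `check_connection` and the three example tasks are hypothetical, and Flyte's own interface checking is more involved than this exact-match comparison.

```python
from typing import Callable, get_type_hints

def check_connection(producer: Callable, consumer: Callable, param: str) -> None:
    """Raise TypeError if producer's return type differs from consumer's `param` type."""
    produced = get_type_hints(producer).get("return")
    expected = get_type_hints(consumer).get(param)
    if produced != expected:
        raise TypeError(
            f"{producer.__name__} returns {produced}, "
            f"but {consumer.__name__}({param}) expects {expected}"
        )

# Hypothetical tasks; only their annotations matter for the check.
def count_rows(path: str) -> int:
    return 0

def describe(n: int) -> str:
    return f"{n} rows"

def upload(text: bytes) -> None:
    pass

# int output wired to an int input: accepted.
check_connection(count_rows, describe, "n")

# str output wired to a bytes input: rejected before anything executes.
try:
    check_connection(describe, upload, "text")
except TypeError as err:
    print(err)
```

None of the task bodies are called here, so the mismatch surfaces while the pipeline is being wired together rather than partway through a run.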

Supports multiple data types for machine learning and data processing pipelines, such as Blobs (images, arbitrary files), Directories, Schema (columnar structured data), collections, maps, etc.

Memoization and Lineage tracking
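The general idea behind memoization can be sketched as follows: a result is stored under a key derived from the task, a version string and the inputs, and it is reused when the same key comes up again. This is a plain-Python illustration, not Flyte's implementation; `cache_key`, `run_memoized` and the in-memory `_cache` are assumptions made for the example.

```python
import hashlib
import json

_cache = {}            # in-memory stand-in for a persistent result store
calls = {"count": 0}   # counts how often the task body really runs

def cache_key(task_name: str, version: str, inputs: dict) -> str:
    """Derive a stable key from the task name, its version and its inputs."""
    payload = json.dumps([task_name, version, inputs], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def run_memoized(func, version: str, **inputs):
    """Return a stored result when the key matches, otherwise run and store."""
    key = cache_key(func.__name__, version, inputs)
    if key not in _cache:
        _cache[key] = func(**inputs)
    return _cache[key]

def slow_square(x: int) -> int:
    calls["count"] += 1
    return x * x

run_memoized(slow_square, "v1", x=4)  # computed
run_memoized(slow_square, "v1", x=4)  # served from the cache
run_memoized(slow_square, "v2", x=4)  # new version, computed again
print(calls["count"])  # 2
```

Including the version in the key means a code change can invalidate old results deliberately, instead of silently reusing them.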

Provides logging and observability

Workflow features:
- Start with one task, convert to a pipeline, attach multiple schedules, trigger using a programmatic API, or on-demand
- Parallel step execution
- Extensible backend for adding custom plugins (with a simplified user experience)
- Branching
- Inline subworkflows (a workflow can be embedded within one node of the top level workflow)
- Distributed remote child workflows (a remote workflow can be triggered and statically verified at compile time)
- Array Tasks (map a function over a large dataset -- ensures controlled execution of thousands of containers)
- Dynamic workflow creation and execution with runtime type safety
- Container side plugins with first class support in Python
- PreAlpha: Arbitrary flytekit-less containers supported (RawContainer)
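The "Array Tasks" item in the list above describes mapping one function over many inputs while keeping execution under control. A minimal local analogy, assuming threads stand in for containers and using a hypothetical `map_task` helper (not a Flyte API), looks like this:

```python
from concurrent.futures import ThreadPoolExecutor

def map_task(func, items, concurrency: int):
    """Apply func to every item with at most `concurrency` running at once."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map returns results in input order, whatever the finish order.
        return list(pool.map(func, items))

# Hypothetical per-item task.
def halve(width: int) -> int:
    return width // 2

print(map_task(halve, range(0, 10, 2), concurrency=2))  # [0, 1, 2, 3, 4]
```

The `concurrency` cap is the part that corresponds to controlled execution: the input may be large, but only a bounded number of units run at any moment.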

Guaranteed reproducibility of pipelines via:
- Versioned data, code and models
- Automatically tracked executions
- Declarative pipelines

Multi cloud support (AWS, GCP and others)

Extensible, modularized core with deep observability

No single point of failure; resilient by design

Automated notifications to Slack, Email, and PagerDuty

Multi K8s cluster support

Out of the box support to run Spark jobs on K8s, Hive queries, etc.

Snappy Console

Python CLI and Golang CLI (flytectl)

Written in Golang and optimized for the performance of large running jobs

Grafana templates (user/system observability)

In Progress

Demos: Distributed PyTorch, feature engineering, etc.

Integrations: Great Expectations, Feast

Least-privilege Minimal Helm Chart

Relaunch execution in recover mode

Documentation as code

🔌
Available Plugins

Containers

K8s Pods

AWS Batch Arrays

K8s Pod Arrays

K8s Spark (native PySpark and Java/Scala)

AWS Athena

Qubole Hive

Presto Queries

Distributed PyTorch (K8s Native) -- PyTorch Operator

SageMaker (builtin algorithms & custom models)

Distributed TensorFlow (K8s Native) -- TFOperator

Papermill notebook execution (Python and Spark)

Type-safe data checking for pandas DataFrames using Pandera

Versioned datastores using DoltHub and Dolt

Use SQLAlchemy to query any relational database

Build your own plugins that use library containers

📦
Component Repos

Repo	Language	Purpose	Status
flyte	Kustomize,RST	deployment, documentation, issues	Production-grade
flyteidl	Protobuf	interface definitions	Production-grade
flytepropeller	Go	execution engine	Production-grade
flyteadmin	Go	control plane	Production-grade
flytekit	Python	Python SDK and tools	Production-grade
flyteconsole	TypeScript	admin console	Production-grade
datacatalog	Go	manage input & output artifacts	Production-grade
flyteplugins	Go	flyte plugins	Production-grade
flytestdlib	Go	standard library	Production-grade
flytesnacks	Python	examples, tips, and tricks	Incubating
flytekit-java	Java/Scala	Java & Scala SDK for authoring Flyte workflows	Incubating
flytectl	Go	A standalone Flyte CLI	Incomplete

🔩
Production K8s Operators

Repo	Language	Purpose
Spark	Go	Apache Spark batch
Flink	Go	Apache Flink streaming

🤝
Community & Resources

Here are some resources to help you learn more about Flyte.

Communication Channels

Slack

Email list

Twitter

LinkedIn Discussion Group

GitHub Discussions

Biweekly Community Sync

📣
Flyte OSS Community Sync: every other Tuesday, 9am-10am PDT. Check out the calendar and register to stay up-to-date with our meeting times, or simply join us on Zoom.