flyte

Co-created · 2023-09-27 08:41


Flyte


Flyte is a workflow automation platform for complex, mission-critical data and ML processes at scale




Home Page · Quick Start · Documentation · Features · Community & Resources · Changelogs · Components



💥
Introduction


Flyte is a structured programming and distributed processing platform that enables highly concurrent, scalable and maintainable workflows for Machine Learning and Data Processing. It is a fabric that connects disparate computation backends using a type safe data dependency graph. It records all changes to a pipeline, making it possible to rewind time. It also stores a history of all executions and provides an intuitive UI, CLI and REST/gRPC API to interact with the computation.


Flyte is more than a workflow engine -- it uses a workflow as a core concept and a task (a single unit of execution) as a top level concept. Multiple tasks arranged in a data producer-consumer order create a workflow.
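As a rough illustration of "tasks arranged in a data producer-consumer order", the sketch below runs three plain Python functions in dependency order using only the standard library. It does not use Flyte's SDK; the task names (`load`, `clean`, `summarize`) and the `run_workflow` helper are hypothetical and exist only for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks: each is a plain function, i.e. a single unit of execution.
def load() -> list[int]:
    return [3, 1, 2]

def clean(rows: list[int]) -> list[int]:
    return sorted(rows)

def summarize(rows: list[int]) -> int:
    return sum(rows)

# Each task name maps to the upstream tasks whose outputs it consumes.
DEPENDENCIES = {"load": [], "clean": ["load"], "summarize": ["clean"]}
TASKS = {"load": load, "clean": clean, "summarize": summarize}

def run_workflow() -> dict[str, object]:
    """Run every task after its producers and return all outputs by task name."""
    outputs: dict[str, object] = {}
    for name in TopologicalSorter(DEPENDENCIES).static_order():
        inputs = [outputs[dep] for dep in DEPENDENCIES[name]]
        outputs[name] = TASKS[name](*inputs)
    return outputs

results = run_workflow()
print(results["summarize"])  # prints 6
```

The workflow here is nothing more than the dependency mapping: the order of execution falls out of which task consumes which output.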


Workflows and Tasks can be written in any language, with out-of-the-box support for Python, Java and Scala.




Five Reasons to Use Flyte



  • Kubernetes-Native Workflow Automation Platform

  • Ergonomic SDKs in Python, Java & Scala

  • Versioned & Auditable

  • Reproducible Pipelines

  • Strong Data Typing



🚀
Quick Start


With Docker and flytectl installed, run the following command:



  flytectl sandbox start


This creates a local Flyte sandbox. Once the sandbox is ready, you should see the following message: Flyte is ready! Flyte UI is available at http://localhost:30081/console.


Visit http://localhost:30081/console to view the Flyte dashboard.
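If you want to script the wait for the sandbox, one option is to poll the console URL until it answers. The sketch below is a minimal example using only the Python standard library; the helper names (`http_ok`, `wait_for_console`) and the retry defaults are assumptions made for illustration and are not part of flytectl.

```python
import time
import urllib.error
import urllib.request

CONSOLE_URL = "http://localhost:30081/console"  # URL from the sandbox message above

def http_ok(url: str) -> bool:
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False

def wait_for_console(url: str = CONSOLE_URL, attempts: int = 30,
                     delay: float = 2.0, probe=http_ok) -> bool:
    """Probe the URL up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_console()` after `flytectl sandbox start` returns `True` once the console responds, or `False` if it never does within the retry budget. The `probe` parameter is injectable so the loop can be exercised without a running sandbox.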


Here's a quick visual tour of the console.



To dig deeper into Flyte, refer to the Documentation.



⭐️
Current Deployments & Contributors




🔥
Features



  • Used at scale in production by 500+ users at Lyft, with more than 1 million executions and 40+ million container executions per month

  • A data aware platform

  • Enables collaboration across your organization by:

    • Executing distributed data pipelines/workflows

    • Reusing tasks across projects, users, and workflows

    • Making it easy to stitch together workflows from different teams and domain experts

    • Backtracing to a specified workflow

    • Comparing results of training workflows over time and across pipelines

    • Sharing workflows and tasks across your teams

    • Simplifying the complexity of multi-step, multi-owner workflows


  • Quick registration -- start locally and scale to the cloud instantly

  • Centralized Inventory constituting Tasks, Workflows and Executions

  • gRPC / REST interface to define and execute tasks and workflows

  • Type safe construction of pipelines -- each task has an interface which is characterized by its input and output, so illegal construction of pipelines fails during declaration rather than at runtime

  • Supports multiple data types for machine learning and data processing pipelines, such as Blobs (images, arbitrary files), Directories, Schema (columnar structured data), collections, maps, etc.

  • Memoization and Lineage tracking

  • Provides logging and observability

  • Workflow features:

    • Start with one task, convert to a pipeline, attach multiple schedules, trigger using a programmatic API, or on-demand

    • Parallel step execution

    • Extensible backend to add customized plugin experience (with simplified user experience)

    • Branching

    • Inline subworkflows (a workflow can be embedded within one node of the top level workflow)

    • Distributed remote child workflows (a remote workflow can be triggered and statically verified at compile time)

    • Array Tasks (map a function over a large dataset -- ensures controlled execution of thousands of containers)

    • Dynamic workflow creation and execution with runtime type safety

    • Container side plugins with first class support in Python

    • PreAlpha: Arbitrary flytekit-less containers supported (RawContainer)


  • Guaranteed reproducibility of pipelines via:

    • Versioned data, code and models

    • Automatically tracked executions

    • Declarative pipelines


  • Multi cloud support (AWS, GCP and others)

  • Extensible core, modularized, and deep observability

  • No single point of failure; resilient by design

  • Automated notifications to Slack, Email, and PagerDuty

  • Multi K8s cluster support

  • Out of the box support to run Spark jobs on K8s, Hive queries, etc.

  • Snappy Console

  • Python CLI and Golang CLI (flytectl)

  • Written in Golang and optimized for the performance of large running jobs

  • Grafana templates (user/system observability)
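The "type safe construction of pipelines" item above says that an illegal connection fails at declaration rather than at runtime. A minimal sketch of that idea, assuming task interfaces are expressed as ordinary Python type hints and compared by simple equality, could look like the following. The `connect` helper and the three example tasks are hypothetical; this is not how Flyte itself implements the check.

```python
from typing import Callable, get_type_hints

def connect(producer: Callable, consumer: Callable, parameter: str) -> None:
    """Check, before anything runs, that producer's output fits consumer's input."""
    produced = get_type_hints(producer).get("return")
    expected = get_type_hints(consumer).get(parameter)
    if produced is None or expected is None:
        raise TypeError("both sides of an edge need type annotations")
    if produced != expected:  # assumption: exact match, no subtyping
        raise TypeError(
            f"{producer.__name__} returns {produced}, "
            f"but {consumer.__name__}.{parameter} expects {expected}"
        )

def count_rows(path: str) -> int:
    return 42  # placeholder body; never called in this example

def report(total: int) -> str:
    return f"{total} rows"

def resize(image: bytes) -> bytes:
    return image

connect(count_rows, report, "total")      # accepted: int feeds int

try:
    connect(count_rows, resize, "image")  # rejected: int cannot feed bytes
except TypeError as error:
    print("rejected at declaration:", error)
```

No task body executes during either `connect` call, which is the point: the mismatch is caught while the pipeline is being wired together.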

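Memoization, listed among the features above, can be pictured as a lookup keyed on what was run and with which inputs. The sketch below assumes a key built from the task name, a version string, and the JSON-serialized inputs; that key layout, the in-memory `_cache`, and the `run_cached` helper are illustrative assumptions, not Flyte's actual cache format.

```python
import hashlib
import json

_cache: dict[str, object] = {}
call_count = 0  # counts real executions, to show when the cache is used

def cache_key(task_name: str, version: str, inputs: dict) -> str:
    """Derive a stable key from the task name, its version, and its inputs."""
    payload = json.dumps(
        {"task": task_name, "version": version, "inputs": inputs}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def run_cached(task, version: str, **inputs):
    """Run the task only if no result is stored for this exact key."""
    key = cache_key(task.__name__, version, inputs)
    if key not in _cache:
        _cache[key] = task(**inputs)
    return _cache[key]

def square(x: int) -> int:
    global call_count
    call_count += 1
    return x * x

first = run_cached(square, "v1", x=4)
second = run_cached(square, "v1", x=4)   # same key, served from the cache
third = run_cached(square, "v2", x=4)    # new version, so recomputed
print(first, second, third, call_count)  # prints 16 16 16 2
```

Including the version in the key means that changing a task's code (and bumping its version) invalidates old results without any manual cache clearing.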

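The Array Tasks item above describes mapping a function over a dataset with controlled execution. As a loose, single-machine analogy, the sketch below maps a function over inputs with a cap on how many calls run at once. The `map_task` helper and its `concurrency` parameter are hypothetical names for this example; Flyte's array tasks run containers, not threads.

```python
from concurrent.futures import ThreadPoolExecutor

def map_task(function, items, concurrency: int = 4) -> list:
    """Apply function to every item, running at most `concurrency` calls at once."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(function, items))  # results keep the input order

def normalize(value: int) -> float:
    return value / 10

print(map_task(normalize, range(5), concurrency=2))  # prints [0.0, 0.1, 0.2, 0.3, 0.4]
```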
In Progress



  • Demos: Distributed PyTorch, feature engineering, etc.

  • Integrations: Great Expectations, Feast

  • Least-privilege Minimal Helm Chart

  • Relaunch execution in recover mode

  • Documentation as code



🔌
Available Plugins




📦
Component Repos

| Repo          | Language       | Purpose                                        | Status           |
|---------------|----------------|------------------------------------------------|------------------|
| flyte         | Kustomize, RST | deployment, documentation, issues              | Production-grade |
| flyteidl      | Protobuf       | interface definitions                          | Production-grade |
| flytepropeller| Go             | execution engine                               | Production-grade |
| flyteadmin    | Go             | control plane                                  | Production-grade |
| flytekit      | Python         | Python SDK and tools                           | Production-grade |
| flyteconsole  | Typescript     | admin console                                  | Production-grade |
| datacatalog   | Go             | manage input & output artifacts                | Production-grade |
| flyteplugins  | Go             | flyte plugins                                  | Production-grade |
| flytestdlib   | Go             | standard library                               | Production-grade |
| flytesnacks   | Python         | examples, tips, and tricks                     | Incubating       |
| flytekit-java | Java/Scala     | Java & Scala SDK for authoring Flyte workflows | Incubating       |
| flytectl      | Go             | A standalone Flyte CLI                         | Incomplete       |


🔩
Production K8s Operators

| Repo  | Language | Purpose                |
|-------|----------|------------------------|
| Spark | Go       | Apache Spark batch     |
| Flink | Go       | Apache Flink streaming |


🤝
Community & Resources


Here are some resources to help you learn more about Flyte.


Communication Channels



Biweekly Community Sync




  • 📣 Flyte OSS Community Sync: every other Tuesday, 9am-10am PDT. Check out the calendar and register to stay up-to-date with our meeting times, or simply join us on Zoom.

  • Upcoming meeting agenda, previous meeting notes and a backlog of topics are captured in this document.

  • If you'd like to revisit any previous community sync meetings, you can access the video recordings on Flyte's YouTube channel.


Conference Talks



  • KubeCon 2019 - Flyte: Cloud Native Machine Learning and Data Processing Platform video | deck

  • KubeCon 2019 - Running LargeScale Stateful workloads on Kubernetes at Lyft video

  • re:Invent 2019 - Implementing ML workflows with Kubernetes and Amazon SageMaker video

  • Cloud-native machine learning at Lyft with AWS Batch and Amazon EKS video

  • OSS + ELC NA 2020 splash

  • Datacouncil video | splash

  • FB AI@Scale Making MLOps & DataOps a reality

  • GAIC 2020

  • OSPOCon 2021 Catch a variety of Flyte talks - final schedule and topics to be released soon.


Blog Posts



Podcasts




💖
Top Contributors


A big thank you to the community for making Flyte possible!

