Scale your data pipelines with Orchest and Kubernetes

We are pleased to announce a new major release of Orchest powered by Kubernetes, bringing a change that we have been working on for quite some time. But how does this change affect our users, and our roadmap in the short and medium term?

Kubernetes (also known as k8s) is "an open source system for automating deployment, scaling, and management of containerized applications", and it works with a range of container runtimes (Docker, containerd, CRI-O). It provides useful features such as automatic horizontal scaling, self-healing, and load balancing, and it has become the go-to container orchestration platform: in its most recent survey, the Cloud Native Computing Foundation found that 96% of respondents are using or evaluating it. These features make Kubernetes a strong fit for orchestrating data pipelines.

However, the price of this level of automation is daunting complexity: there are numerous moving parts, configuration files are difficult to write, and operating a Kubernetes cluster requires expertise. This, in turn, makes Kubernetes less accessible to users who lack a background in system administration, networking, and distributed systems.
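
To give a feel for the configuration involved, here is a minimal sketch of a Kubernetes `batch/v1` Job manifest built in Python. It is an illustration only, not how Orchest generates its workloads: the helper `build_job_manifest`, the job name `train-step`, the image `python:3.10-slim`, and the command are all hypothetical.

```python
import json


def build_job_manifest(name, image, command, parallelism=1):
    """Return a Kubernetes batch/v1 Job manifest as a plain dict.

    Hypothetical helper for illustration; not part of Orchest.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Number of pods allowed to run at the same time.
            "parallelism": parallelism,
            # Number of successful pods required to finish the Job.
            "completions": parallelism,
            # Retries before the Job is marked as failed.
            "backoffLimit": 2,
            "template": {
                "spec": {
                    # Jobs require "Never" or "OnFailure" here.
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                }
            },
        },
    }


# Assumed example values: three parallel copies of a placeholder step.
manifest = build_job_manifest(
    name="train-step",
    image="python:3.10-slim",
    command=["python", "-c", "print('training')"],
    parallelism=3,
)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Saved to a file, the printed JSON could be submitted with `kubectl apply -f`, since Kubernetes accepts JSON as well as YAML manifests. Even this small example touches several nested fields, and it leaves out resource limits, volumes, and networking, which is the kind of detail Orchest aims to keep away from the user.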

Orchest's mission has always been to empower data scientists by building a platform that abstracts away the tasks that get in their way. For this reason, Orchest spawns and manages containers on behalf of users so that they can focus on model training and deployment. Before this release, a bespoke backend performed all these tasks; since Orchest v2022.03.7, all orchestration is powered by Kubernetes, allowing us to better execute on our goals. In the words of Rick Lamers, our CEO:

“The move to Kubernetes was long in the making, it truly unleashes the architecture of Orchest to operate at cluster scale. With containers as first class primitives throughout the application we're seeing linear speedups across a number of features.
This change makes it possible to work with a large team on a single beefy cluster. That opens up many interesting opportunities for collaboration.”

For our users, this means that they can potentially run hundreds of jobs in parallel and easily deploy Orchest in their own existing cluster. At the other end of the scale, solutions like minikube offer a convenient way to run Orchest on a single-node cluster on a laptop.

It goes without saying that all of this is still a work in progress: we continue to work hard on simplifying multi-node ingress, improving the local installation process, and deploying these changes to our own Orchest Cloud offering. Still, we are beyond excited about the new workflows that Orchest can unlock for our users, and we look forward to further improving the accessibility of data pipelines.

Download Orchest to your computer now and start creating data pipelines, the easy way!