Orchest at Spike

  • Spikelab.xyz uses Orchest to help deliver a managed data science project for a mid-sized mining company.
  • Orchest helps Spike's team standardize how they create and deliver projects for customers who lack access to data engineering resources, whether because of company size or tight lead times: "We're increasingly seeing teams from large companies come to us needing insights faster than it would take to build out engineering infrastructure to deliver those insights."
  • Being able to ship ready-to-use notebooks, complete with pre-packaged jobs, environments, and pipelines, sets up the data science teams at Spike's clients for success and is important to Spike's business operations.
  • Orchest's GUI improves collaboration both within Spike's teams and with the data science staff at the client.

Spike's use cases for Orchest include:

  • Model development
  • Model retraining
  • ETL with custom transform requirements
  • Automated and on-demand back-end engineering for teams lacking access to data engineering resources
  • Easy delivery of projects with short lead times

About Spike (Spikelab.xyz)

Spike is a leading machine learning and data innovation company based in Chile, serving a broad array of clients throughout South America. With a deep bench of machine learning experts, they're changing how companies uncover insight from their data. Spike delivers scalable solutions, built with modern data science tooling and custom approaches, to problems that demand new insight in challenging environments. As of September 2021, Spike counts the largest and most innovative companies in healthcare, airlines, mining, telco, and energy among its clients. (P.S. If you want to see more of their work, check out their Medium blog.)

Why Orchest?

Spike's challenge was to equip their client's future data science hires with ready-to-use models and infrastructure, while also running the models in production as a managed service until the client was fully staffed. When evaluating the landscape, Spike considered alternatives such as AWS Lambda, KNIME, Kubeflow, and Airflow combined with Papermill, but ultimately went with Orchest because it struck the right balance between capabilities and ease of use. The team liked Orchest's excellent GUI, the fact that it is open source, and how easy it is to deploy on their customer's infrastructure. Installation on the client's VM was straightforward, and scaling vertically has worked well: a small team within Spike is able to stand up the client's production pipelines. While delivering the project, Spike also found additional benefits in deploying with Orchest:

  • Native Jupyter Notebook workflows
  • A visual environment with a GUI-based pipeline building interface
  • Significant time saved by not having to set up the client's entire data stack
  • The ability to write code within the environment
  • The delivery of a ready-to-use data science project

The Pipelines

'OMG! This is what Data Scientists need.'

"Before Orchest, the challenge for us has always been 'How do we orchestrate various deliverables (model training/retraining, sending emails with output updates, putting the corresponding data in our customer's database, or transforming it and putting it back)' and set up the entire data stack for clients, mostly because it is a lot of work and we are a lean team. Our Data Scientists want to use notebooks, not tools meant for Data Engineers or software developers," shares Matias Aravena, Spike's lead Machine Learning Engineer. "Our Data Scientists' expertise is more related to the experimental iteration of model development, not really production. MLOps is not necessarily in their toolkit. When we first tried Orchest we said 'OMG! This is what Data Scientists need.' Having JupyterLab there is amazing: for Data Scientists it is familiar, so it doesn't change their workflow at all, and they can be really productive in Orchest.

"It's also great that they can see a visualization of the pipeline in the editor, because it helps collaboration both internally within our teams and with our client's teams. Our client's pipeline involves collecting sensor data, moving it into Google BigQuery, making the data available to a VM, and then, from notebooks, including that data in reports delivered via email. Orchest was very easy to deploy on our customer's VM. We offer a managed service to our client for this data science project. We think Orchest brings us a lot of value, not just now but in the future. The client plans on staffing up their own Data Scientists, and those hires will be able to start building models from day one because the entire project is ready to go in Orchest."

Additional Insight

A recent upgrade to the latest version of Orchest has let Spike and their client use TensorFlow and Streamlit directly within Orchest. Spike is also finding that teams within larger organizations may benefit from Orchest when they lack access to the data engineering resources needed to stand up infrastructure for their data science projects: "Sometimes a company's policy is heavily bureaucratic and the teams we work with can't get their job done. Maybe the infra they use is too slow, maybe we can't get access because the data we need is behind a VPN, maybe the Data Scientists at the client are required to code in Java; getting to production is hard because of all the rules. In these scenarios, Orchest could help teams who are, after all, just after insight to move more quickly."

Final Say

"I am really in love with Orchest! Getting to production is hard, but it doesn't need to be, especially for projects that are more insight-driven than about delivering an application. It is everything you need in one tool. Moving forward, we plan on using Orchest for teams that are constrained for various reasons." - Matias Aravena, Machine Learning Engineer, Spike