
yep, check out Spotify's Luigi project. Probably the most widely adopted OSS one https://github.com/spotify/luigi


Are there people with more experience comparing workflow managers who can quickly lay out the pros and cons of Pinball vs. Luigi? Perhaps someone at Pinterest who tried out other systems, as was mentioned in the post? (Though maybe Luigi wasn't available to the public when that comparison happened.)


Luigi was not publicly available when Pinball started, so I'm not sure about the pros and cons between Pinball and Luigi.

When we built Pinball, we aimed to build a scalable and flexible workflow manager satisfying the following requirements (I'll just name a few here):

1. Easy system upgrades - when we fix bugs or add new features, there should be no interruption to currently running workflows and jobs.

2. Easy workflow addition/testing - end users can easily add new jobs and workflows to the Pinball system without affecting other running jobs and workflows.

3. Extensibility - a workflow manager should be easy to extend. As the company and business grow, many new requirements and features will be needed. (And we'd love your contributions as well.)

4. Flexible workflow scheduling policy and easy failure handling.

5. A rich UI to easily manage your workflows: failed jobs can be auto-retried; from the UI you can retry a failed job, skip jobs, or select a subset of a workflow's jobs to run; you can easily access the full run history of a job, along with its stderr/stdout logs; and you can explore the topology of your workflow, with easy search.

6. Pinball is very generic and supports different kinds of platforms - you can use different Hadoop clusters (e.g., a Qubole cluster, an EMR cluster) and write different kinds of jobs (e.g., Hadoop streaming, Cascading, Hive, Pig, Spark, Python, ...).
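To make points 1-5 concrete, here is what the core of any such manager boils down to: run jobs in dependency order and retry failures. This is an illustrative stdlib sketch, not Pinball's actual API; the job/dep names are made up for the example.

```python
# Illustrative sketch only -- NOT Pinball's real API. It shows the core
# loop a workflow manager implements: topological ordering plus retries.
from graphlib import TopologicalSorter

def run_workflow(jobs, deps, max_retries=2):
    """jobs: name -> callable; deps: name -> set of prerequisite names."""
    order = TopologicalSorter(deps).static_order()  # prerequisites first
    results = {}
    for name in order:
        for attempt in range(max_retries + 1):
            try:
                results[name] = jobs[name]()
                break
            except Exception:
                if attempt == max_retries:
                    raise  # surface the failure after exhausting retries
    return results

# Toy workflow: extract -> transform -> load (hypothetical job names)
log = []
jobs = {
    "extract":   lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load":      lambda: log.append("load"),
}
deps = {"transform": {"extract"}, "load": {"transform"}}
run_workflow(jobs, deps)
print(log)  # jobs ran in dependency order
```

A real system layers persistence, a UI, and distributed workers on top of this loop, but the scheduling core is the same shape.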

There are a lot of interesting things built into Pinball, and you should give it a try!


We are heavy users of Luigi at my company. Its central scheduler process also serves the UI, and the UI sometimes gets stuck for us.

Luigi does have a lot of pipeline building blocks - it provides APIs to access HDFS and S3, read from and write to them, etc. They are very useful, but they execute in the same Python process as the rest of the job, which heavily loads the machine where the job runs (in our case, the same server where the luigid scheduler runs).
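One generic way to relieve that pressure (a hedged sketch, not a Luigi feature): have the job body shell out the heavy work to a separate worker process, so the scheduler machine's own Python process stays light. Moving it to another machine entirely would need a remote executor (ssh, YARN, etc.); the snippet below only demonstrates the process-isolation idea, and the "heavy job" is a made-up stand-in.

```python
# Generic sketch (not a Luigi API): push heavy work out of the
# coordinating Python process by spawning a worker subprocess.
import subprocess
import sys

WORKER_SNIPPET = """
# Hypothetical heavy job body: runs in its own interpreter, so its
# memory and CPU use stay out of the coordinating process.
total = sum(range(1_000_000))
print(total)
"""

result = subprocess.run(
    [sys.executable, "-c", WORKER_SNIPPET],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # value computed by the worker process
```

The trade-off is serialization overhead at the process boundary, which is usually negligible next to the job's own I/O.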

I'm excited about Pinball's architecture. I'd like to try using Pinball as the scheduler to execute existing Luigi task class instances on multiple servers.


I've ported several reasonably complex jobs (files delivered to FTP at arbitrary times to be run through several Hadoop jobs) to luigi and it's been very good. Much more resilient than trying to use something that can only schedule jobs at specific times of the day.
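The resilience win here is event-driven kickoff rather than a fixed cron slot. A hedged stdlib sketch of the idea (illustrative only, not Luigi's actual trigger mechanism): poll a drop directory and launch the pipeline only when a file has actually arrived, whatever time that is.

```python
# Illustrative sketch, not Luigi's API: kick off a pipeline when files
# arrive, instead of assuming they exist at a fixed time of day.
import os
import tempfile

def poll_once(drop_dir, seen, launch):
    """Launch the pipeline for any file not yet processed."""
    for name in sorted(os.listdir(drop_dir)):
        if name not in seen:
            seen.add(name)
            launch(os.path.join(drop_dir, name))

# Demo: two files "delivered at arbitrary times" (hypothetical names)
launched = []
seen = set()
with tempfile.TemporaryDirectory() as drop:
    open(os.path.join(drop, "feed1.csv"), "w").close()
    poll_once(drop, seen, launched.append)   # picks up feed1.csv
    open(os.path.join(drop, "feed2.csv"), "w").close()
    poll_once(drop, seen, launched.append)   # picks up only the new file
print([os.path.basename(p) for p in launched])
```

A cron-style scheduler fires whether or not the data is there; this shape simply waits for the precondition, which is why late FTP deliveries stop being failures.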

It also has few dependencies and is lightweight (i.e. it's all python, so no JVM tying up resources).


fwiw, it is not the case that Pinball can schedule jobs only at specific times of the day. In fact, the scheduler is merely a special type of worker that happens to start new workflows. It is entirely doable to kick off a new workflow at any point in time, bypassing the scheduler.

Also, Pinball is all Python, but it currently depends on MySQL, so as a standalone tool it is definitely not as lightweight as Luigi - but it also offers much more in terms of available features.



