Review of "Omega: flexible, scalable schedulers for large compute clusters"

29 Sep 2015

Like Mesos, Omega is designed to improve resource management in data centers. But unlike Mesos's two-level resource-offer model, which hides the overall cluster resources from individual schedulers, Omega proposes a shared-state model.

According to the paper, most (> 80%) jobs are batch jobs, but the majority of resources (55 - 80%) are allocated to service jobs. Service jobs usually run for much longer than batch jobs. Omega is designed to accommodate both types of jobs. One thing to notice is that fairness is not one of Omega's main concerns; its design is driven more by business requirements.

Omega takes a shared-state scheduling approach. Every scheduler is granted access to the whole cluster. Schedulers compete in a free-for-all manner and use optimistic concurrency control to mediate clashes when they update the cluster state. For example, when a scheduler places new tasks on some nodes, it tries to push this update to the central state, but the update might fail because of a conflict (another scheduler might already have scheduled tasks on the same nodes). In that case, the scheduler can simply retry. According to the paper, conflicts are fairly rare in Google's production environment. With optimistic concurrency, schedulers can run with a very high degree of parallelism. The trade-off is that when the optimistic assumption turns out to be wrong, the scheduler has to redo its work. There's no central resource allocator in Omega; all resource allocation decisions take place in the schedulers. A resilient master copy of the resource allocations in the cluster is maintained, and each scheduler is given a private, local, frequently-updated copy of cell state that it uses for making scheduling decisions.
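To make the commit-and-retry loop concrete, here is a minimal single-threaded sketch of optimistic concurrency over shared cell state. It is my own illustration, not Omega's implementation: the names `CellState`, `Scheduler`, and `try_commit`, the per-machine version counter used to detect conflicts, and the first-fit placement are all assumptions made for the example.

```python
import copy
from dataclasses import dataclass


@dataclass
class Machine:
    free_cpus: int
    version: int = 0  # assumed conflict detector: bumped on every committed change


class CellState:
    """Hypothetical master copy of the cluster's resource allocations."""

    def __init__(self, free_cpus_per_machine):
        self.machines = {
            name: Machine(cpus) for name, cpus in free_cpus_per_machine.items()
        }

    def snapshot(self):
        # Each scheduler works from a private copy, never the master itself.
        return copy.deepcopy(self.machines)

    def try_commit(self, claims, seen_versions):
        """Apply all claims, or reject them all if any machine has changed."""
        if any(self.machines[n].version != seen_versions[n] for n in claims):
            return False  # someone else updated this machine first
        for name, cpus in claims.items():
            self.machines[name].free_cpus -= cpus
            self.machines[name].version += 1
        return True


class Scheduler:
    def __init__(self, name, cell):
        self.name = name
        self.cell = cell
        self.local = cell.snapshot()  # private, possibly stale copy
        self.conflicts = 0

    def place(self, cpus):
        # First-fit placement against the local copy (an arbitrary policy).
        for name, machine in self.local.items():
            if machine.free_cpus >= cpus:
                return name
        return None

    def schedule(self, cpus, max_attempts=3):
        for _ in range(max_attempts):
            target = self.place(cpus)
            if target is None:
                return None
            seen = {target: self.local[target].version}
            committed = self.cell.try_commit({target: cpus}, seen)
            self.local = self.cell.snapshot()  # resync after every attempt
            if committed:
                return target
            self.conflicts += 1  # optimistic assumption failed: redo the work
        return None


cell = CellState({"m1": 4, "m2": 4})
batch = Scheduler("batch", cell)
service = Scheduler("service", cell)  # both start from the same snapshot

first = batch.schedule(3)     # commits on m1
second = service.schedule(3)  # stale copy picks m1, conflicts, retries on m2
print(first, second, service.conflicts)  # m1 m2 1
```

The second scheduler loses one round of work because its local copy was stale, which is exactly the cost the optimistic approach accepts. In a real concurrent setting, `try_commit` would also have to be atomic; the sketch sidesteps that by running in a single thread.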

Will this paper be influential in 10 years? Maybe. It proposes a shared-state scheduling method that allows for efficient scheduling of tasks.