Distributed Job Scheduling for AWS
Guest post by Rajat Bhargava, JumpCloud
Recently, Medium CTO Don Neufeld highlighted the need for a distributed job scheduling system on AWS. As more and more companies use AWS for their infrastructure, they need tools to execute tasks and schedule those tasks across the server infrastructure. A key component of that process is orchestrating server tasks across an AWS implementation. Historically, this has been done through scripting and cron.
While that may still work, a better system is needed. As more workloads move to the cloud and organizations focus on a one server, one task model enabling easy horizontal scaling, the need for a distributed scheduling system is more critical than ever. Unfortunately, executing a set of tasks across a wide-area server infrastructure is still not easy. Examples of issues that occur include managing access to devices, orchestrating a sequence of tasks, and building Boolean logic into task execution. Juggling these problems forces organizations into manual execution of tasks (or at minimum some manual involvement), which greatly diminishes the leverage that the cloud provides.
In this post, we’ll analyze three different options for executing a set of distributed jobs across an AWS infrastructure: scripts and cron, open source tools, and finally one commercial option.
Scripts and Cron
One way to handle a set of distributed tasks is to leverage cron to build a schedule of execution tasks. A tried and true tool since the 1970s, cron ends up being the core of scheduling in the *NIX world. Unfortunately, cron doesn’t have the concept of distributed job execution. Cron is built for executing tasks on one particular server. Also, visibility with cron leaves a lot to be desired. Unless you are willing to write more code around cron, you don’t get reports that tell you whether or not a job was completed successfully.
To the extent that you want to create a “distributed” scheduling process, you’ll need to write code or at least sequence your events properly. Coding is required if you want to chain events together and ensure that each was completed before the next step is kicked off. Admins end up mixing scripting and cron and go as far as they think is practical with scripting before turning to the good, old-fashioned manual execution.
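The kind of glue code described above might look something like the following. This is only a minimal sketch: the function name `run_chain` and the example commands are hypothetical, and it runs on a single machine, so it shows the "chain and check" logic rather than any distribution.

```python
import subprocess


def run_chain(steps):
    """Run shell commands in order and stop at the first one that fails.

    Returns a list of (command, exit_code) pairs for every step that ran,
    which gives the success/failure report that plain cron does not.
    """
    results = []
    for command in steps:
        completed = subprocess.run(command, shell=True)
        results.append((command, completed.returncode))
        if completed.returncode != 0:
            # Do not kick off the next step if this one did not complete.
            break
    return results
```

A cron entry could call a script that passes its commands to `run_chain` and mails or logs the returned list. Everything beyond that (retries, running on other servers) is still code you would have to write.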
Complex scenarios take time to code. Although it may not be pretty (most of the time you get a one-off solution that’s difficult to reuse), it does end up working. The challenge with this approach is that it is relatively fragile; if things don’t go just right, the system can’t adjust. Although this approach is the most accessible to admins, it’s no wonder that Don wants something better.
Open Source Alternatives
Two open source approaches that can be leveraged to execute a set of distributed tasks are Chronos and Luigi. Chronos is a distributed execution system meant to replace cron. It is also fault tolerant and lives on top of Mesos, the Apache cluster manager. With Chronos, you can schedule a pipeline of tasks across your entire infrastructure, wherever it may live. The system is able to execute tasks based on previously completed ones, and includes a mechanism to notify Chronos of individual task failures. The blog post announcing Chronos touts other benefits:
“Chronos … allows you to schedule your jobs using ISO8601 repeating interval notation, which enables more flexibility in job scheduling. Chronos also supports the definition of jobs triggered by the completion of other jobs, and it also supports arbitrarily long dependency chains.”
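To make the ISO 8601 repeating interval notation concrete, here is a small sketch that expands a schedule string of the form R[n]/start/period into its first few run times. It is not Chronos code. The helper names are hypothetical, it only understands UTC start times ending in "Z" and periods made of days, hours, minutes, and seconds, and it assumes that n is the total number of runs.

```python
import re
from datetime import datetime, timedelta, timezone

_INTERVAL = re.compile(r"^R(\d*)/([^/]+)/(P.+)$")
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def parse_duration(text):
    """Parse a limited ISO 8601 duration such as PT2H or P1DT30M."""
    match = _DURATION.match(text)
    if not match or text in ("P", "PT"):
        raise ValueError(f"unsupported duration: {text}")
    days, hours, minutes, seconds = (int(group or 0) for group in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def first_runs(schedule, limit=3):
    """Return up to `limit` run times for a schedule like R5/2014-03-08T20:00:00Z/PT2H."""
    match = _INTERVAL.match(schedule)
    if not match:
        raise ValueError(f"unsupported schedule: {schedule}")
    count_text, start_text, period_text = match.groups()
    start = datetime.strptime(start_text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    period = parse_duration(period_text)
    # An empty count ("R/...") means repeat forever, so only `limit` caps the list.
    runs = limit if count_text == "" else min(limit, int(count_text))
    return [start + index * period for index in range(runs)]
```

For example, `first_runs("R/2014-03-08T20:00:00Z/PT2H")` yields run times at 20:00, 22:00, and 00:00 UTC.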
Although Chronos is a significant step up over manual scripts or cron, it still requires some manual work to implement. Further, because Chronos requires Apache Mesos to manage communications and resource allocation, it requires the installation and configuration of Mesos throughout your network.
Another open source system that can handle a pipeline of batch jobs is Luigi. Luigi is Python-based and, like Chronos, can handle dependencies and errors. Like Chronos, the impetus for Luigi was to handle a complex set of database or data manipulation tasks. Luigi does have native support for some database tasks such as MapReduce jobs in Hadoop.
The JumpCloud Option
Commercial entities are beginning to recognize the critical problem of building task workflows in today’s cloud environments. Although some of the world’s largest software makers offer enterprise software for executing complex pipelines of tasks, we are going to focus on a SaaS-based solution that works closely with AWS called JumpCloud. JumpCloud, based on AWS infrastructure, syncs instance IDs to ensure that you know which EC2 instances you are working with when executing tasks across your infrastructure. JumpCloud is an AWS partner and Activate sponsor.
AWS customers are building complex infrastructures and then trying to automate the management and execution of infrastructure tasks. This is exactly the problem JumpCloud is trying to solve. We call it server management, although it just as easily could be called server orchestration, job scheduling, task and workflow automation, database / data manipulation, or any number of other things.
How JumpCloud Automates Complex Workflows with a Server Orchestration Tool
With JumpCloud, you can easily build a complex workflow of tasks. You treat tasks like building blocks that you chain together or chain to multiple others. Webhooks can trigger events to start. You can also “join” or “split” tasks so that you can leverage distributed and scaled infrastructure. For example, if you need all your database servers to finish indexing a table before you can execute a report, the JumpCloud join feature ensures that all your database servers are done indexing before moving to the next step. JumpCloud can also let you execute n number of processes as one step.
For example, if you’ve got logs on multiple EC2 instances that you want to clean up each time right before you restart your web servers (running on a different set of EC2 instances), you can do so easily with JumpCloud. In the diagram below, you can see how you define a RotateLogs trigger, which executes against one set of servers. As that log rotate job completes across the EC2 instances, the next job, named “WebServerRestart” can start. JumpCloud takes care of waiting for all the “RotateLogs” jobs to complete before starting the next step. While this is a very simple and straightforward example, you can create workflows across your EC2 instances that are as complex as you need.
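The "join" behavior in this example can be sketched in a few lines. This is not JumpCloud's API. The function names are hypothetical and the tasks are local stand-ins for remote commands; the sketch only shows the control flow of waiting for every RotateLogs job before WebServerRestart begins.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_all(task, servers):
    """Fan out task(server) to every server, then join: return only when all are done.

    If any task raises, the exception propagates and later steps never start.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        return list(pool.map(task, servers))


def rotate_logs(server):
    # Stand-in for running a log rotation command on a remote instance.
    return f"{server}: logs rotated"


def restart_web_server(server):
    # Stand-in for restarting the web server on a remote instance.
    return f"{server}: restarted"


def workflow(log_servers, web_servers):
    rotated = run_on_all(rotate_logs, log_servers)  # blocks until every server is done
    restarted = run_on_all(restart_web_server, web_servers)
    return rotated + restarted
```

Calling `workflow(["log-1", "log-2"], ["web-1"])` returns the results in step order, with the restart entry always after both rotation entries.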
JumpCloud’s functionality is powerful and can help you automate a whole workflow quickly and easily. The benefit of JumpCloud is that you won’t have to write the plumbing and manage the execution of your tasks.
Because JumpCloud is an AWS Activate partner, you can try it for free. If you are an Activate member, you can get a 60-day free trial. And if you are a Portfolio member of Activate, you get 90 days free.
Creating, managing, and executing a series of jobs across your AWS infrastructure can be a daunting task. Most admins have taken the approach of leveraging cron and perhaps scripting some code around that. Others prefer to implement open source alternatives. For those that are interested in a commercial alternative, take a look at AWS Activate partner JumpCloud.
Come join the AWS Distributed Job Scheduler (DJS) team on an exciting voyage as we enable internal Amazon developers to be productive.
Today, DJS allows Amazon teams to automate critical jobs on a schedule. We have been doing this for a while now but we want to take a major leap from where we are. You might even say a "time jump."
Looking to the future, we want to reinvent the way Amazon teams create and run any kind of task (any type of executable that runs for a finite period of time). Our long-term vision is to make developers productive by letting them focus solely on implementing the scheduled tasks because we will take care of everything else (where it runs, how it runs, its dependencies, monitoring, etc.). We want to ensure our users (developers like you) focus solely on coding.
We will need to build new services and systems to reach this exciting future. We will need your help to shape what these new systems should be.
Inclusive Team Culture
Here at AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences. Amazon’s culture of inclusion is reinforced within our 14 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust.
Our team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives.
This position involves on-call responsibilities, typically for one week every two months. We do not like getting paged in the middle of the night or on the weekend, so we work to ensure that our systems are fault tolerant. When we do get paged, we work together to resolve the root cause so that we don’t get paged for the same issue twice.
Mentorship & Career Growth
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded engineer and enable them to take on more complex tasks in the future.
· 2+ years of non-internship professional software development experience
· Programming experience with at least one modern language such as Java, C++, or C# including object-oriented design
· 1+ years of experience contributing to the architecture and design (architecture, design patterns, reliability and scaling) of new and current systems.
- 3+ years of experience developing object-oriented software, with thorough experience in one or more relevant languages (Java, C#, C++, Ruby, Python)
- 2+ years of experience building distributed systems
- Experience with high-volume, highly available distributed services
- Understanding of asynchronous and distributed systems problems
- Experience solving infrastructure software architectural and design issues
- Ability to work well in a team environment and to effectively drive cross-team solutions that have complex dependencies and requirements
Amazon is an Equal Opportunity-Affirmative Action Employer – Minority / Female / Disability / Veteran / Gender Identity / Sexual Orientation.
We believe passionately that employing a diverse workforce is central to our success and we make recruiting decisions based on your experience and skills. We welcome applications from all members of society irrespective of age, gender, disability, sexual orientation, race, religion or belief.
AWS Batch with Fargate resources allows you to have a completely serverless architecture for your batch jobs. With Fargate, every job receives the exact amount of CPU and memory that it requests (within allowed Fargate SKUs), so there is no wasted resource time or need to wait for EC2 instance launches.
If you’re a current Batch user, Fargate allows for an additional layer of separation away from EC2. There’s no need to manage or patch AMIs. When submitting your Fargate-compatible jobs to Batch, you don’t have to worry about maintaining two different services if you have some workloads that run on EC2 and others that run on Fargate.
AWS provides a cloud-native scheduler complete with a managed queue and the ability to specify priority, retries, dependencies, timeouts, and more. Batch manages submission to Fargate and the lifecycle of your jobs so you don’t have to.
Fargate also provides security benefits that come with no added effort (e.g., SOX, PCI compliance), and isolation between compute resources for every job.
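As a rough illustration of the retries, dependencies, and timeouts mentioned above, the sketch below builds the keyword arguments for the AWS Batch SubmitJob call as exposed by boto3's `submit_job`. The helper name and the job, queue, and definition names are hypothetical. The sketch only builds the request, so it runs without AWS credentials.

```python
def build_submit_job_request(name, queue, definition,
                             depends_on=(), attempts=1, timeout_seconds=None):
    """Build keyword arguments for the AWS Batch SubmitJob API."""
    request = {"jobName": name, "jobQueue": queue, "jobDefinition": definition}
    if depends_on:
        # The job starts only after the listed job IDs have completed successfully.
        request["dependsOn"] = [{"jobId": job_id} for job_id in depends_on]
    if attempts > 1:
        # Total number of attempts Batch will make if the job fails.
        request["retryStrategy"] = {"attempts": attempts}
    if timeout_seconds is not None:
        # Batch terminates an attempt that runs longer than this.
        request["timeout"] = {"attemptDurationSeconds": timeout_seconds}
    return request
```

Assuming boto3 is installed and configured, the result could be passed along as `boto3.client("batch").submit_job(**request)`. Priority is not part of this request; it is normally a property of the job queue.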
JumpCloud’s Distributed Job Scheduling Systems for AWS & Cloud Providers
Earlier this week, I noticed an exchange between AWS’ CTO Werner Vogels and Medium’s CTO Don Neufeld. Werner had asked what else Don would like to see AWS build for them. Don’s top priority? A distributed scheduling system like the open source solution, Chronos. This is a timely subject for us at JumpCloud® because our Directory-as-a-Service® platform allows engineers to easily build complex distributed job scheduling systems. We are specifically referring to our platform’s device management functionality. We understand the importance of task and workflow scheduling to the management of your infrastructure. IT admins have been doing this for years with tools such as scripts and cron.
The History Behind the Hassle – Cron and Distributed Job Scheduling Systems
Unfortunately, none of this is easy. Scripts need to be run on a wide variety of servers and infrastructure. How do you manage the access to those devices? How do you orchestrate all of those tasks together across your infrastructure? One option is to keep writing code to build all of those input and output events. You could also manually track completion and kick off the next stage in your process. Admins usually mix the two and go as far as possible inside scripts before turning to manual execution. One-off solutions, which are difficult to reuse, are the likely result. While it may not be a pretty process, it usually works.
Cron is often found at the core of scheduling. Cron has been around since the ’70s, so it is definitely tried and true. Unfortunately, cron doesn’t have the concept of distributed job execution. Cron is built for executing tasks on one particular server. Also, visibility with cron leaves a lot to be desired. Unless you are willing to write more code around cron, you won’t get results, failures, etc. With more complex infrastructures being built every day, you need that visibility when one piece of the puzzle breaks down. It’s easy to see why Don wants something more useful. While Chronos handles distributed job scheduling, it still requires a lot of work. In order to create any complex tasks, you’ll be writing a fair bit of code and manually piecing some things together. Still, Chronos is a significant step up over manual scripts or cron.
How JumpCloud Automates Device Management with a Cloud-based Directory
Building task workflows is critical in today’s cloud environments. Companies are building complex infrastructures and then trying to automate the management and execution of tasks on them. JumpCloud is trying to solve this problem with its Identity-as-a-Service platform. As a cloud directory service, we want to solve three core problems: authentication, authorization, and device management. While the focus of this topic is device management, it just as easily could be called server orchestration, job scheduling, or task and workflow automation, among others. In short, it’s the ability to execute a series of tasks on a server or other hardware device.
Automate Workflow with Distributed Job Scheduling Systems
JumpCloud lets you easily build a complex workflow of tasks. Tasks become building blocks that can be executed on different groups of servers, laptops, and desktops. Whether you call it distributed job scheduling, server orchestration, or something else, JumpCloud’s device management functionality is powerful and can help you easily automate workflows. Best of all, you won’t be writing loads of code. You’ll be able to quickly execute your tasks, saving tremendous time and headache.
Give JumpCloud a try if you are looking to schedule tasks all across your infrastructure. Or, drop us a line to request a JumpCloud demo. You’ll be glad you did.
Is there some service that Amazon (AWS) offers that can run a reoccurring job at scheduled intervals?
This is one of a few single points of failure that people (including me) keep mentioning when designing architectures with AWS. Until Amazon solves it with a service, here's a hack I've published which is actively used by some companies.
AWS Auto Scaling can run and terminate instances using a recurring schedule specified in the cron format.
You can have the instance automatically run a process on startup.
If you don't know how long the job will last, you can set things up so that your job terminates the instance when it has completed.
Here's an article I wrote that walks through exact commands needed to set this up:
Running EC2 Instances on a Recurring Schedule with Auto Scaling
Starting a whole instance just to kick off a set of jobs seems a bit like overkill, but if it's a t1.micro, then it only costs a couple pennies.
That t1.micro doesn't have to do the actual work either. Your instance could inject messages into SQS or through SNS so that the other redundant servers pick up the tasks.
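For illustration, the recurring start and stop described in this answer could be expressed as two scheduled actions on one Auto Scaling group. The sketch below only builds the keyword arguments for boto3's `put_scheduled_update_group_action`. The helper name, the group name, and the action-name suffixes are hypothetical, and the cron strings are examples.

```python
def build_schedule(group_name, start_cron, stop_cron):
    """Build two scheduled actions: scale the group to one instance, then back to zero."""
    start = {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": f"{group_name}-start",
        "Recurrence": start_cron,  # cron format, e.g. "0 3 * * *"
        "MinSize": 1,
        "MaxSize": 1,
        "DesiredCapacity": 1,
    }
    stop = {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": f"{group_name}-stop",
        "Recurrence": stop_cron,
        "MinSize": 0,
        "MaxSize": 0,
        "DesiredCapacity": 0,
    }
    return [start, stop]
```

With boto3 installed and configured, each dictionary could be applied with `boto3.client("autoscaling").put_scheduled_update_group_action(**action)`. If the job terminates its own instance when it finishes, the stop action acts as a safety net and keeps the group from launching a replacement.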
answered Jul 23 '12 at 18:52
Running cron jobs in the cloud - Amazon EC2 vs AWS Lambda
Cron jobs are one of the things that have gotten harder, not easier, when moving to the cloud.
The motivation to automate recurring tasks is still strong in the software community, but while companies have been transitioning their infrastructure towards cloud environments, they’ve been falling behind on tooling for daily tasks. Previously, when companies hosted servers in their own data centers, scheduling a cron job to run on a spare machine was a 15-minute task. But with the move to the cloud, there are no longer any spare machines. Companies track infrastructure closely because the management of this infrastructure is now done automatically, and access to it is restricted, creating new barriers to automation in cloud environments.
The first solutions to general automation in the cloud ran on Amazon EC2: companies would spin up a machine and use it for cron jobs, or they’d install a layer of middleware on top of EC2, such as Sidekiq. These solutions were unsustainable due to overspending on idle machines. Running cron jobs using Sidekiq and similar scheduling systems also meant that the software engineering teams had to maintain an application layer for scheduling the jobs, and this resulted in unnecessarily tight coupling of cron jobs to the business logic of the given applications.
AWS Lambda is taking its place as the new standard for task automation in AWS environments. When used with the Serverless framework, AWS Lambda allows you to combine a great developer experience with the advantage of only paying for what you use, saving on compute costs. Of course, Lambda has its limitations, but in a large proportion of cases it can be a solution for recurring tasks and cron jobs in the cloud that is easier to develop for, more secure, and more observable than EC2.
In this article, we’ll compare Amazon EC2 and AWS Lambda for running cron jobs and offer guidance for when to choose which of the two.
Amazon EC2 vs. AWS Lambda for running cron jobs
Cost and resource utilization
Under EC2, you must reserve an entire machine for your cron jobs at all times. Unless you have a very high and consistent number of cron jobs that you run, you’re likely underutilizing your EC2 machine.
With Lambda, AWS schedules your job once created and only charges you for the amount of time the job spends running. You pay for only what you use, and your costs are proportional to the number of cron jobs you run.
This pricing model for AWS Lambda can be both a positive and a negative. If you run a small number of cron jobs, fewer than would use up an entire EC2 machine, you’ll pay less overall using Lambda. But if you run many scheduled tasks, or if your tasks have long execution times, the AWS Lambda charges may be higher than the equivalent computing capacity on EC2. In this case it would be more economical to choose EC2, perhaps especially so when using EC2 reserved instances.
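The break-even reasoning above can be worked through with a small calculation. The sketch below assumes a simplified pricing model (a per-GB-second compute charge plus a per-request charge, ignoring free tiers and other fees). The function names are hypothetical, and the prices are placeholders you would pass in from the current AWS price list.

```python
def lambda_monthly_cost(runs_per_month, seconds_per_run, memory_gb,
                        price_per_gb_second, price_per_request):
    """Estimate a month of Lambda charges for a scheduled job."""
    compute = runs_per_month * seconds_per_run * memory_gb * price_per_gb_second
    requests = runs_per_month * price_per_request
    return compute + requests


def cheaper_option(lambda_cost, ec2_monthly_cost):
    """Return which option costs less for the month under the given estimates."""
    return "lambda" if lambda_cost < ec2_monthly_cost else "ec2"
```

With placeholder prices of 0.00002 per GB-second and 0.0000002 per request, 1,000 runs a month of 10 seconds each at 0.5 GB come to about 0.10, far below a hypothetical 8.00 a month for a small always-on instance. At 1,000,000 such runs the same formula gives about 100, and the instance becomes the cheaper choice.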
Software available for cron jobs and machine maintenance
Here are some of the regular maintenance tasks you’ll need to perform on any EC2 machines you use for cron jobs:
- Update the operating system.
- Install security updates.
- Clean up outdated temporary files.
- Reboot the machine whenever AWS needs to migrate it to a newer infrastructure.
In contrast, AWS Lambda is a fully managed service, so all these tasks are taken care of by AWS. You don’t need to spend any time on them when using Lambda. But with Lambda you give up any flexibility around pre-installed software, operating system versions and available programming language runtimes.
Deployment
To ensure that your cron jobs deliver maximum value, they must be easy to update and iterate on. The deployment process forms a key part of the iteration cycle. Repeatable, traceable deploys allow you to add value faster and with more confidence. Just as in the cases of microservices, web applications and legacy software, you need a robust deployment process for your cron jobs as well, and this is very much attainable when running on either Amazon EC2 or AWS Lambda.
On AWS Lambda, each function has a version identifier associated with it. Any change to the code creates a new version of the function, and this is the core of the deployment process. When using the Serverless Framework, running the deploy command not only creates a new version of the function but also makes all necessary changes to your AWS infrastructure to deploy a new version of your Lambda cron job. You can run the deployment manually as you see fit or, if you use Git Flow for your cron jobs, you can also run the deployment automatically via your CI environment whenever there is a merge to the default branch. This way you have a convenient and flexible deployment process for your cron job.
When using EC2 for your cron jobs, a consistent deployment process requires more work. You most likely don’t want your team members to have direct access to the production environment, so you’ll need a way to update the cron jobs on the EC2 machines remotely and automatically. One solution would be to use a configuration management system (like Chef Infra) to keep track of and update the cron jobs on your EC2 servers whenever developers make changes to the cron jobs’ code. Another option might be to create a versioned Docker container with your cron jobs and then set the EC2 machines to regularly pull the latest version of the container.
In short, using EC2 machines for your cron jobs means you need to build the deployment automation yourself, while with AWS Lambda you get it out of the box.
Secrets management
Your cron jobs very likely need to connect to your backend systems, which means you’ll need to make sensitive credentials available to the cron job when it runs. As it happens, many teams who pay plenty of attention to handling secrets for their microservices and applications lack a secrets management strategy for their cron jobs. In an ideal world, you’d be able to grant your developers all the flexibility they need to iterate on and test the cron jobs while creating zero security risks.
When running cron jobs on Amazon EC2, you can, for example, use a secrets store like Vault. With Vault, your cron jobs can dynamically get the credentials they need. The secrets don’t get stored on the machine that’s running the cron jobs, and if you change a secret, the cron jobs will automatically receive that change. The downside of implementing a solution like Vault, however, is the overhead of managing the secrets store. You’ll need to set up the store itself, maintain the underlying server and see to getting the credentials from the store and into your cron job.
With AWS Lambda, you can use a number of off-the-shelf services to handle secrets management. You can choose between AWS SSM and AWS Secrets Manager, or you can use the Serverless Framework’s secrets management functionality to take care of secrets without additional operational overhead. Check out our article on secrets management for AWS Lambda for a comparison of these three options.
Overall, AWS Lambda has more options for secrets management that require less configuration and maintenance.
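As one example of the AWS SSM option, a cron job can fetch a credential from Parameter Store when it runs instead of storing it with the code. The sketch below assumes a boto3 SSM client is passed in. The function name and the parameter name are hypothetical.

```python
def load_secret(ssm_client, name):
    """Fetch a decrypted SecureString parameter from SSM Parameter Store.

    `ssm_client` is expected to behave like boto3.client("ssm").
    """
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]
```

Inside a Lambda handler this could be called as `load_secret(boto3.client("ssm"), "/cron/db-password")`, provided the function's IAM role is allowed to read that parameter. Passing the client in as an argument keeps the function easy to test with a stub.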
Metrics and alerts
When a cron job breaks, developers generally don’t notice until it overloads another system or goes out of service. To prevent service disruptions from cron jobs not running or running incorrectly, it can be very helpful to set up a reliable metrics feed and alerts based on those metrics. The metrics and alerts make you aware of problems so that you can resolve them before they have any downstream effects on your infrastructure.
Both EC2 and AWS Lambda allow you to export metrics to CloudWatch, and setting up alerts on those metrics is straightforward. On AWS Lambda, emitted CloudWatch events are generally tied to function executions and run times. CloudWatch’s default metrics may not be right for monitoring infrequently running functions (like cron jobs), so you may need to adjust the metrics coming from your cron-job Lambda functions. With EC2, however, the default CloudWatch metrics only monitor the machine itself, such as the load average and the amount of memory used, offering essentially no visibility into the cron jobs running on the machine. If you use EC2 for cron jobs, you will certainly need to create and submit your own metrics to CloudWatch (or other metrics systems).
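A custom metric for a cron job on EC2 might be submitted along these lines. This is a sketch under assumptions: the client is expected to behave like a boto3 CloudWatch client, and the function name, the "CronJobs" namespace, and the metric and dimension names are all hypothetical choices.

```python
import time


def run_and_report(job, job_name, cloudwatch_client, namespace="CronJobs"):
    """Run `job()` and publish success and duration metrics for it.

    `cloudwatch_client` is expected to behave like boto3.client("cloudwatch").
    Returns True if the job finished without raising.
    """
    started = time.monotonic()
    try:
        job()
        succeeded = True
    except Exception:
        succeeded = False
    duration = time.monotonic() - started
    dimensions = [{"Name": "JobName", "Value": job_name}]
    cloudwatch_client.put_metric_data(
        Namespace=namespace,
        MetricData=[
            {"MetricName": "Success", "Dimensions": dimensions,
             "Value": 1.0 if succeeded else 0.0, "Unit": "Count"},
            {"MetricName": "DurationSeconds", "Dimensions": dimensions,
             "Value": duration, "Unit": "Seconds"},
        ],
    )
    return succeeded
```

An alarm on the Success metric can then catch both failures (a value of 0) and jobs that never ran (missing data), which the machine-level default metrics cannot show.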
AWS Lambda has a built-in metrics system that’s more geared toward short-lived tasks like cron jobs. However, using CloudWatch can get expensive fast, and configuring the right alerts can be challenging for jobs that don’t run often. To address this, the Serverless Framework provides pre-configured alerts that kick in when there is an unusual level of activity in your function, or when it generates a new exception.
Independent of the metrics system you choose, getting visibility into how your cron jobs run and where they might have issues greatly reduces the risk of a job silently failing and impacting downstream services and infrastructure.
EC2 vs Lambda: which one should you use for cron jobs?
We’ve covered all the ways in which AWS Lambda and EC2 differ in running cron jobs. Both of these services are cloud-native ways to automate tasks in your infrastructure.
Deciding which one is the right choice for your company and your team depends on your particular use case, and whether it fits well with what AWS Lambda can do. If AWS Lambda can run your cron jobs without problems, it is very likely to be a more cost-effective and more easily manageable solution. And if you use Serverless Framework with AWS Lambda, you also get an out-of-the-box solution for secrets management, a number of built-in alerts and metrics and a great developer experience.
There’s definitely still a place for EC2 in running cron jobs when the tasks have specific requirements that Lambda can’t support, such as long-running jobs, jobs that require access to special resources like GPUs, or jobs that are written in runtimes not supported by Lambda. For these use cases, you’ll need to create your own solutions in concert with other AWS services for deployment, secrets management and alerting.