Monitoring Background Jobs in Ruby’s Resque

How to get visibility into an important component of any complex system: the messaging queue

Here at AppNeta, we get to see a lot about how people build their web applications. From simple PHP scripts to heavily service-oriented Java clouds to monolithic Django apps, everybody’s product is architected a little differently. We’re still out to trace everything, and today I want to talk about how to get visibility into an important component of any complex system: the messaging queue. Specifically, let’s look at how to trace a job from Rails using Resque.

Messaging Queues
If you haven’t used a messaging queue in your app, the idea is simple. Instead of forcing all the work to happen during the request, while the user is waiting, you can delay some of the more time-consuming tasks. You can do anything in these tasks, ranging from a simple insert to kicking off a series of user analytics that touch all parts of your infrastructure. The advantage is that you can return a speedy response to the user, or, if they are actually waiting on the task results, give them a better loading interface than a white screen and browser loading bar.

A Quick Resque Tutorial
In Ruby, Resque is a task runner, which by default stores the task descriptions in Redis (though other options are available). Resque jobs are just Ruby classes with a single mandatory class method, perform. Resque will call perform with the arguments given in the task description. Let’s look at a minimal task that takes a single argument and prints it. (Useless, I know.)

[Code listing: LogInfo]
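The original listing is not reproduced here. A minimal sketch of such a job, assuming the class name LogInfo from the listing title and a made-up queue name of :logging, might look like this:

```ruby
# A Resque job is a plain Ruby class: a queue name plus a class-level perform.
class LogInfo
  # The queue name is an assumption for illustration.
  @queue = :logging

  # Resque calls perform with the arguments stored in the task description.
  def self.perform(msg)
    puts msg
  end
end

# Calling perform directly runs the job body without a worker.
LogInfo.perform('Hooray, trace!')
```

Because the job is just a class method, it can be exercised directly in a console or a test without Redis or a worker running.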

The @queue variable defines a name that a worker can bind to, in case you want to spread different types of jobs across different machines. To create a task that this worker could run, we just call it from our request:

[Code listing: HoorayTrace]
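With the real library, the request-side call is Resque.enqueue(LogInfo, 'Hooray, trace!'), which stores the class name and arguments for a worker to pick up later. To show that round trip without Redis, here is a rough sketch using a hypothetical in-memory stand-in called SketchQueue; the queue name and message are also assumptions.

```ruby
require 'json'

class LogInfo
  @queue = :logging

  def self.perform(msg)
    puts msg
  end
end

# Hypothetical in-memory stand-in for the Redis-backed queue (not Resque's API).
module SketchQueue
  QUEUES = Hash.new { |hash, key| hash[key] = [] }

  # Store a task description: the job class name plus its arguments, as JSON.
  def self.enqueue(klass, *args)
    queue = klass.instance_variable_get(:@queue)
    QUEUES[queue] << JSON.generate('class' => klass.name, 'args' => args)
  end

  # What a worker bound to `queue` does: pop one description and run it.
  def self.work_one(queue)
    payload = QUEUES[queue].shift
    return false unless payload

    job = JSON.parse(payload)
    Object.const_get(job['class']).perform(*job['args'])
    true
  end
end

# In the web request: describe the work and return immediately.
SketchQueue.enqueue(LogInfo, 'Hooray, trace!')

# Later, in a worker process: actually run it.
SketchQueue.work_one(:logging)
```

Since the description travels as JSON, arguments should be simple values (strings, numbers, arrays, hashes) rather than live objects.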

And that’s our job! Maybe not the most interesting job, and probably not prone to performance issues, but we don’t know that yet. So let’s measure it!

Tracing a Resque Task
Now that we’ve added this to our system, we should have monitoring around it. The easiest way to do this would be to just measure the time each task takes, and log that information:

[Code listing: MeasureLog]
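As a rough sketch of that measure-and-log approach (the original listing is not reproduced here), the job body can be wrapped in a small timing helper. JobTimer and the log message format are assumptions, and only the Ruby standard library is used.

```ruby
require 'logger'

# Hypothetical helper: time a block and log how long it took.
module JobTimer
  def self.measure(name, logger)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    # Runs even if the block raises, so failed jobs are timed too.
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
    logger.info(format('%s took %.2f ms', name, elapsed_ms))
  end
end

class LogInfo
  @queue = :logging
  @logger = Logger.new($stdout)

  class << self
    attr_accessor :logger
  end

  def self.perform(msg)
    # `name` is the class name, so the log line reads "LogInfo took ...".
    JobTimer.measure(name, logger) { puts msg }
  end
end

LogInfo.perform('Hooray, trace!')
```

The monotonic clock is used instead of Time.now so that a wall-clock adjustment in the middle of a job does not distort the measurement.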

Unfortunately, the data presentation here leaves a bit to be desired, so I’m going to use TraceView to log this information instead. This also has the benefit of logging any SQL queries, cache accesses, or service calls that we might do in a more complex task, as well as reporting errors. To start a trace fresh, we can wrap this call in the start_trace block:
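The agent's real API is not reproduced here. To show the shape of wrapping a job in a start_trace block, here is a sketch against a hypothetical stand-in, SketchTracer, that just records entry, error, and exit events in memory; the layer name 'resque-job' is also an assumption.

```ruby
# Hypothetical stand-in for a tracing agent (not the TraceView API).
module SketchTracer
  EVENTS = []

  # Record entry into a named layer, run the block, and always record the exit.
  def self.start_trace(layer)
    EVENTS << { layer: layer, label: :entry, at: Time.now }
    yield
  rescue StandardError => e
    EVENTS << { layer: layer, label: :error, message: e.message, at: Time.now }
    raise
  ensure
    EVENTS << { layer: layer, label: :exit, at: Time.now }
  end
end

class LogInfo
  @queue = :logging

  def self.perform(msg)
    SketchTracer.start_trace('resque-job') { puts msg }
  end
end

LogInfo.perform('Hooray, trace!')
```

The job body is untouched; only the wrapper changes, which is what makes this easy to retrofit onto existing tasks.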

That’s a start! We’ve now got some visibility into our Resque jobs, and we can rest easy knowing that this is running smoothly in production.

Tracing a Resque Job (with multiple tasks!)
For cron-style jobs, the approach of tracing each task individually works fine. For reference, let’s look at the events we’re generating with that code:

[Figure: Tracing Resque (events generated by the traced task)]

Pretty straightforward. Now let’s consider a more complicated set of tasks: a document-processing pipeline. That code might look like this:

[Code listing: ProcessPipeline]
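The pipeline listing is not reproduced here, so the following is only a sketch of the idea: one task that processes a document and queues a second task to archive it. PipelineQueue, DOCS, ARCHIVE, the :documents queue name, and the upcase "processing" are all placeholders; with Resque, the hand-off would be a Resque.enqueue call.

```ruby
# Hypothetical in-memory queue so the sketch runs without Redis or a worker.
module PipelineQueue
  JOBS = []

  def self.enqueue(klass, *args)
    JOBS << [klass, args]
  end

  # Run queued jobs in order until none remain.
  def self.drain
    until JOBS.empty?
      klass, args = JOBS.shift
      klass.perform(*args)
    end
  end
end

DOCS = { 1 => 'hello resque' }  # placeholder document store
ARCHIVE = {}                    # placeholder archive

class ArchiveDoc
  @queue = :documents

  def self.perform(doc_id, body)
    ARCHIVE[doc_id] = body
  end
end

class ProcessDoc
  @queue = :documents

  def self.perform(doc_id)
    processed = DOCS.fetch(doc_id).upcase # placeholder for real processing
    # Hand the result to the next stage instead of archiving inline.
    PipelineQueue.enqueue(ArchiveDoc, doc_id, processed)
  end
end

PipelineQueue.enqueue(ProcessDoc, 1)
PipelineQueue.drain
```

Each stage is its own task, so each one would be timed and logged separately unless something ties them together.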

In this case, our first task takes a document, and the second one archives it. If we have multiple tasks, each one gets logged separately, and we can figure out the same statistics for each: average, standard deviation, percentiles, and the like. But what if you have a job that spans multiple tasks? We can further aggregate the stats, but we might be starting to miss things, like large inputs that cause the entire pipeline to slow down.

What we’d really like is to correlate the related tasks, so instead of timing each task, we’re timing the entire job. Under the hood, TraceView generates a token for each request. If we pass this ID (generally stored in xtrace, after the X-Trace header it’s passed around in) to each task, we can correlate those timings before storing them, and retrieve them all together. To do this, we can modify each task to take this token, and trace using that ID. ProcessDoc then becomes:

[Code listing: ProcessDoc]
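To illustrate the token-passing idea without reproducing the agent's API, here is a sketch in which every task takes an extra xtrace argument and records its events under that token. TokenTracer is a hypothetical stand-in, the layer names are assumptions, and the direct call to ArchiveDoc.perform stands in for queuing the next task.

```ruby
require 'securerandom'

# Hypothetical stand-in for the tracing agent: events are grouped by token.
module TokenTracer
  EVENTS = Hash.new { |hash, key| hash[key] = [] }

  def self.new_token
    SecureRandom.hex(8)
  end

  # Record entry and exit for `layer` under an existing trace token.
  def self.trace(token, layer)
    EVENTS[token] << [layer, :entry, Process.clock_gettime(Process::CLOCK_MONOTONIC)]
    yield
  ensure
    EVENTS[token] << [layer, :exit, Process.clock_gettime(Process::CLOCK_MONOTONIC)]
  end
end

class ArchiveDoc
  @queue = :documents

  def self.perform(doc_id, xtrace)
    TokenTracer.trace(xtrace, 'archive_doc') { doc_id } # archiving elided
  end
end

class ProcessDoc
  @queue = :documents

  def self.perform(doc_id, xtrace)
    TokenTracer.trace(xtrace, 'process_doc') { doc_id } # processing elided
    # Stands in for Resque.enqueue(ArchiveDoc, doc_id, xtrace):
    # the same token travels with the job to the next task.
    ArchiveDoc.perform(doc_id, xtrace)
  end
end

token = TokenTracer.new_token
ProcessDoc.perform(42, token)

# All events for the whole job can now be retrieved together.
layers = TokenTracer::EVENTS[token].map { |layer, label, _at| "#{layer}:#{label}" }
puts layers.join(' ')
```

The token is a plain string, so it passes through the queue's serialized arguments like any other value.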

Now we need to start the trace somewhere, but we’re not doing it in the job. We could start it in the first task, or we could link this one step further up the chain and tie it back to the web request that started it in the first place. In a default Rails stack, that request generates the following events:

[Figure: Tracing Resque (events generated by a default Rails request)]

To add in the task queue call to the logged request, we can call the following function:

[Code listing: TaskQueue-LoggedRequest]
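The original function is not reproduced here, and the agent's calls are not guessed at. Conceptually, though, the request-side step looks something like the sketch below: copy the request's context by rebuilding it from its string form, log an entry event for the job on that copy, and hand the resulting token to the queued task. SketchContext, EVENT_LOG, and log_job_entry are all hypothetical names.

```ruby
require 'securerandom'

# Hypothetical agent context: a trace id plus the id of the most recent event.
SketchContext = Struct.new(:trace_id, :op_id) do
  def to_s
    "#{trace_id}:#{op_id}"
  end

  # Rebuild an independent context from its string form (this is the fork).
  def self.from_string(str)
    trace_id, op_id = str.split(':', 2)
    new(trace_id, op_id)
  end
end

EVENT_LOG = []

# Log entry into a named layer on a forked copy of the request context and
# return the token to pass along with the queued task.
def log_job_entry(request_ctx, layer = 'job')
  forked = SketchContext.from_string(request_ctx.to_s)
  parent_op = forked.op_id
  forked.op_id = SecureRandom.hex(4)
  EVENT_LOG << { trace: forked.trace_id, layer: layer, label: :entry,
                 op: forked.op_id, edge: parent_op, at: Time.now }
  forked.to_s
end

# In the web request: fork, then pass `xtrace` as an extra job argument.
request_ctx = SketchContext.new(SecureRandom.hex(8), SecureRandom.hex(4))
xtrace = log_job_entry(request_ctx)
```

The request's own context is left untouched, so the rest of the request keeps logging on its original path while the job's events hang off the forked one.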

We have to force a fork in the execution path to indicate that we’re running an asynchronous task, possibly in parallel, with the rest of the web request, which is done with the call to fromString. Aside from that, this is the same underlying call as is done by start_trace above — log that we’re entering a named block of code, and start timing it.

When we put it all together, we get a secondary execution path attached to the web request, and the logged events look like this:

[Figure: Tracing Resque (web request with the attached job execution path)]

Now we’ve got everything: the original request, all tasks, individual timing information, and a global view of how the process performed. Note that we now have an additional timing measurement here: the delay before starting the task at all. In this case, we waited a full 500ms between queuing the job and actually executing it! Once we were in the pipeline, the tasks happened much faster (only 25ms between processing and archiving).

Caveats
Lest you think that everything was easy, there are a couple of things to keep in mind when you use this in your own application.

  • Because we’re starting the timing in the web request and ending it in a task queue, we’re relying on those two processes to have an identical clock. If they’re on the same machine, it won’t be a problem, but on different machines, any clock skew will affect the timing.
  • I’ve quietly assumed everything in this system is reliable, which is almost certainly wrong. Whatever your error handling is, make sure you always log the exit event for ‘job’, or you may never know that you have errors!
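For the second caveat, one way to make sure the exit event for 'job' is always logged is to put it in an ensure clause around the last task. This is only a sketch: log_event, finish_job, and JOB_EVENTS are hypothetical stand-ins for whatever your agent and error handling actually provide.

```ruby
# Hypothetical event sink standing in for the tracing agent.
JOB_EVENTS = []

def log_event(layer, label, info = {})
  JOB_EVENTS << { layer: layer, label: label, at: Time.now }.merge(info)
end

# Run the final task of a job and always close the 'job' layer,
# whether the task succeeds or raises.
def finish_job(xtrace)
  yield
rescue StandardError => e
  log_event('job', :error, xtrace: xtrace, message: e.message)
  raise
ensure
  log_event('job', :exit, xtrace: xtrace)
end

# A failing final task still produces an error event followed by an exit event.
begin
  finish_job('example-token') { raise 'archive failed' }
rescue RuntimeError
  nil # the worker's own failure handling would take over here
end
```

Re-raising after logging keeps the queue's normal failure handling intact while still closing out the trace.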

As long as I haven’t totally dissuaded you from trying this out, all the code is available in one place in this gist, and you can try it out in your application today with our free version of TraceView!


More Stories By TR Jordan

A veteran of MIT’s Lincoln Labs, TR is a reformed physicist and full-stack hacker – for some limited definition of full stack. After a few years as Software Development Lead with Thermopylae Science and Technology, he left to join Tracelytics as its first engineer. Following Tracelytics’ merger with AppNeta, TR was tapped to run all of its developer and market evangelism efforts. TR still harbors a not-so-secret love for Matlab-esque graphs and half-baked statistics, as well as elegant and highly-performant code. Read more of his articles at www.appneta.com/blog or visit www.appneta.com.
