Unlocking the power of Node.js: Multi-Threading with Worker Threads.
In recent Node.js versions, starting with v10.5.0 and further solidified in Node.js v12 LTS, the introduction of the “worker_threads” module has transformed the way we approach concurrency in Node.js. This module brings multithreading capabilities to the platform, enabling developers to perform CPU-intensive tasks more efficiently. In this article, we’ll explore the significance of worker threads, their historical context, and why they are a valuable addition to the Node.js ecosystem.
The Evolution of Concurrency in JavaScript and Node.js
JavaScript, originally designed as a single-threaded language, initially served the purpose of enhancing interactivity on web pages. It functioned well in this role, as there was limited need for complex multithreading capabilities in this context. However, Ryan Dahl saw this limitation as an opportunity when he created Node.js. He wanted to implement a server-side platform based on asynchronous I/O to avoid a need for threads and make things a lot easier.
But concurrency can be a very hard problem to solve. Having many threads accessing the same memory can produce race conditions that are very hard to reproduce and fix.
Is Node.js single-threaded?
Our Node.js applications are only sort of single-threaded, in reality. We can run things in parallel, but we don’t create threads or synchronize them ourselves. The virtual machine and the operating system run the I/O in parallel for us, and when it’s time to send data back to our JavaScript code, that JavaScript runs in a single thread.
In other words, everything runs in parallel except for our JavaScript code. Synchronous blocks of JavaScript code are always run one at a time:
let flag = false

function doSomething() {
  flag = true
  // More code (that doesn't change `flag`)...

  // We can be sure that `flag` here is true.
  // There's no way another code block could have changed
  // `flag` since this block is synchronous.
}
This is great if all we do is asynchronous I/O. Our code consists of small synchronous blocks that run quickly and hand data off to files and streams, so no single block holds up the execution of other pieces of JavaScript.
A lot more time is spent waiting for I/O events than executing JavaScript code. Let’s see this with a quick example:
db.findOne('SELECT ... LIMIT 1', function(err, result) {
  if (err) return console.error(err)
  console.log(result)
})
console.log('Running query')
setTimeout(function() {
  console.log('Hey there')
}, 1000)
Maybe this database query takes a minute, but the “Running query” message will be shown immediately after invoking the query. And we will see the “Hey there” message a second after invoking the query regardless of whether the query is still running or not.
Our Node.js application just invokes the function and does not block the execution of other pieces of code. It will get notified through the callback when the query is done, and we will receive the result.
Challenges of CPU-Intensive Tasks
However, the single-threaded nature of JavaScript poses challenges when it comes to CPU-bound tasks that require intensive computations on large datasets. In such cases, synchronous code execution can become a bottleneck, leading to sluggish performance and blocking of other critical tasks.
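To make the bottleneck concrete, here is a minimal sketch of synchronous work delaying a timer. The `fibonacci` and `measureTimerDelay` helpers are hypothetical names chosen for this illustration, and the exact delay depends on the machine.

```javascript
// Naive recursive Fibonacci: deliberately CPU-heavy and purely synchronous.
function fibonacci(n) {
  return n < 2 ? n : fibonacci(n - 1) + fibonacci(n - 2)
}

// Schedules a 0 ms timer, then runs `work` synchronously.
// Resolves with the number of milliseconds that passed before the timer fired.
function measureTimerDelay(work) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now()
    setTimeout(() => resolve(Date.now() - scheduledAt), 0)
    work() // the timer callback cannot run until this call returns
  })
}

measureTimerDelay(() => fibonacci(30)).then((delayMs) => {
  console.log(`Timer fired after ${delayMs} ms`)
})
```

The timer is requested with a delay of 0 ms, but its callback only runs once the synchronous computation has finished. Any incoming request or I/O callback would be held up in the same way.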
The Quest for Multithreading in JavaScript
The natural question that arises is whether we can introduce multithreading into JavaScript to tackle CPU-bound tasks more effectively. Unfortunately, it’s not a straightforward endeavor. Adding multithreading to JavaScript would require fundamental changes to the language itself. Languages that support multithreading typically have specialized constructs, such as synchronization keywords, to enable threads to cooperate seamlessly.
The Simple Yet Imperfect Solution: Synchronous Code Splitting
To address the challenges posed by CPU-bound tasks, developers have resorted to a technique known as synchronous code splitting. This approach involves breaking down complex tasks into smaller synchronous code blocks and using “setImmediate(callback)” to allow other pending tasks to be processed in between. While it’s a viable solution in some cases, it has limitations and can complicate code structure, especially for more intricate algorithms.
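// Example of using setImmediate() for code splitting
const crypto = require('crypto')

const arr = new Array(200).fill('something')

// Placeholder for CPU-heavy work (an assumption for this example):
// hashes the item repeatedly.
function doHeavyStuff(item) {
  let digest = item
  for (let i = 0; i < 1000; i++) {
    digest = crypto.createHash('sha256').update(digest).digest('hex')
  }
  return digest
}

function processChunk() {
  if (arr.length === 0) {
    // Code that runs after the whole array has been processed
  } else {
    console.log('Processing chunk')
    // Take the next 10 items off the front of the array
    const subarr = arr.splice(0, 10)
    for (const item of subarr) {
      doHeavyStuff(item)
    }
    // Yield to the event loop before handling the next chunk
    setImmediate(processChunk)
  }
}

processChunk()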
Running Parallel Processes Without Threads
Fortunately, there’s an alternative approach to parallel processing that doesn’t rely on traditional threads. By leveraging modules like “worker-farm,” developers can achieve parallelism by forking processes and managing task distribution effectively. This method enables the main application to communicate with child processes through event-based message passing, avoiding shared memory issues and race conditions.
// Example of running parallel processes using worker-farm
const workerFarm = require('worker-farm')
const service = workerFarm(require.resolve('./script'))

service('hello', function (err, output) {
  console.log(output)
})
Introducing Worker Threads
The introduction of worker threads in Node.js provides an elegant and efficient solution to the challenges posed by CPU-bound tasks. Worker threads offer isolated contexts, each with its own JavaScript environment. By default they communicate with the main thread through message passing, so ordinary variables are never shared between threads and the usual race conditions are avoided.
As the documentation says,
“Workers are useful for performing CPU-intensive JavaScript operations; do not use them for I/O, since Node.js’s built-in mechanisms for performing operations asynchronously already treat it more efficiently than Worker threads can.”
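Worker threads are more lightweight than the parallelism you can get using child_process or cluster, because they run inside the same process. Additionally, worker_threads can share memory when needed, by transferring ArrayBuffer instances or sharing SharedArrayBuffer instances.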
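As a rough sketch of that memory sharing, the example below lets several workers increment one counter stored in a `SharedArrayBuffer`. The `countInParallel` helper is a hypothetical name, and the worker source is inlined with the `eval: true` option only so that the sketch fits in a single file.

```javascript
const { Worker } = require('worker_threads')

// Worker source, inlined as a string for the sake of a one-file sketch.
const workerSource = `
  const { workerData, parentPort } = require('worker_threads')
  // View over the same memory the main thread allocated.
  const counter = new Int32Array(workerData.shared)
  for (let i = 0; i < workerData.increments; i++) {
    Atomics.add(counter, 0, 1) // atomic, so no increments are lost
  }
  parentPort.postMessage('done')
`

// Starts `workerCount` workers that each add `increments` to a shared counter.
// Resolves with the final counter value.
function countInParallel(workerCount, increments) {
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT)
  const counter = new Int32Array(shared)
  const jobs = []
  for (let i = 0; i < workerCount; i++) {
    jobs.push(new Promise((resolve, reject) => {
      const worker = new Worker(workerSource, {
        eval: true,
        workerData: { shared, increments }
      })
      worker.once('message', resolve)
      worker.once('error', reject)
    }))
  }
  return Promise.all(jobs).then(() => Atomics.load(counter, 0))
}

countInParallel(4, 1000).then((total) => console.log(total)) // 4000
```

Once memory is shared, race conditions become possible again, which is why the sketch uses `Atomics.add` instead of a plain `counter[0]++`.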
// Example of using Worker Threads
const { Worker } = require('worker_threads')

function runService(workerData) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./service.js', { workerData })
    worker.on('message', resolve)
    worker.on('error', reject)
    worker.on('exit', (code) => {
      if (code !== 0)
        reject(new Error(`Worker stopped with exit code ${code}`))
    })
  })
}

async function run() {
  const result = await runService('world')
  console.log(result)
}

run().catch(err => console.error(err))
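The example above loads `./service.js`, which is not shown. Below is a minimal, self-contained sketch of both sides. The worker body is an assumption: it simply echoes `workerData` back in an object, and it is inlined with `eval: true` so the whole thing runs from one file. In a real project the worker source would normally live in its own file, such as `./service.js`.

```javascript
const { Worker } = require('worker_threads')

// Hypothetical contents of service.js, inlined as a string.
const serviceSource = `
  const { workerData, parentPort } = require('worker_threads')
  // CPU-heavy work would go here. When it is done, send the result
  // back to the main thread.
  parentPort.postMessage({ hello: workerData })
`

function runService(workerData) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(serviceSource, { eval: true, workerData })
    worker.on('message', resolve)
    worker.on('error', reject)
    worker.on('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`))
    })
  })
}

runService('world').then((result) => console.log(result)) // { hello: 'world' }
```

The value in `workerData` is copied into the worker when it starts, and the object passed to `postMessage` is copied back, so neither side can mutate the other's data.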
Using Worker Threads for Multiple Tasks
Worker threads have been available since Node.js v10.5.0. In versions prior to Node.js 11.7.0, the module must be enabled with the “--experimental-worker” flag. Because starting a worker has a cost, developers can create a pool of worker threads and reuse them, so that multiple tasks run in parallel without spawning a new thread for every task.
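One way such a pool could look is sketched below. The `WorkerPool` class, its `run`, `dispatch`, and `close` methods, and the squaring worker are all hypothetical and kept deliberately small. A production pool would also need to replace workers that crash.

```javascript
const { Worker } = require('worker_threads')

// Hypothetical worker: squares each number it receives and posts it back.
const workerSource = `
  const { parentPort } = require('worker_threads')
  parentPort.on('message', (n) => parentPort.postMessage(n * n))
`

class WorkerPool {
  constructor(size) {
    this.workers = [] // every worker, kept so close() can terminate them
    this.idle = []    // workers waiting for a task
    this.queue = []   // tasks waiting for a worker
    for (let i = 0; i < size; i++) {
      const worker = new Worker(workerSource, { eval: true })
      this.workers.push(worker)
      this.idle.push(worker)
    }
  }

  // Queues `input` and resolves with the worker's reply.
  run(input) {
    return new Promise((resolve, reject) => {
      this.queue.push({ input, resolve, reject })
      this.dispatch()
    })
  }

  // Hands queued tasks to idle workers until one of the two runs out.
  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop()
      const task = this.queue.shift()
      const onMessage = (result) => {
        worker.off('error', onError)
        this.idle.push(worker) // the worker is free again
        task.resolve(result)
        this.dispatch()
      }
      const onError = (err) => {
        worker.off('message', onMessage)
        task.reject(err) // a crashed worker is not returned to the pool
      }
      worker.once('message', onMessage)
      worker.once('error', onError)
      worker.postMessage(task.input)
    }
  }

  // Stops all workers so the process can exit.
  close() {
    return Promise.all(this.workers.map((worker) => worker.terminate()))
  }
}

const pool = new WorkerPool(2)
Promise.all([1, 2, 3, 4, 5].map((n) => pool.run(n)))
  .then((results) => console.log(results)) // [ 1, 4, 9, 16, 25 ]
  .finally(() => pool.close())
```

Five tasks are served by only two threads here: each worker picks up the next queued task as soon as it replies, which keeps thread count and memory use bounded.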
Let us say we are building an app that allows users to upload a profile picture, and the app then generates multiple sizes from the original image (e.g. 50 x 50 or 100 x 100) for different use cases within the app.
Resizing an image is CPU-intensive, and generating several sizes on the main thread would block it. The resizing can instead be handed to a worker thread, while the main thread keeps handling other lightweight tasks.
Worker threads in Node.js are useful in cases such as:
- Search algorithms.
- Sorting a large amount of data.
- Video Compression.
- Image Resizing.
- Factorization of Large Numbers.
- Generating primes in a given range.
Conclusion
With the stable release of the “worker_threads” module in Node.js v12 LTS, developers have gained a robust solution for handling CPU-intensive tasks in Node.js applications. Whether you’re optimizing server-side performance or enhancing the user experience in web applications, worker threads offer a powerful tool for achieving parallelism without the complexities of traditional multithreading.