Is It An Issue To Schedule Too Many Tiny Threads?

A few days ago somebody asked me this question on Quora. The answer was simple and short, but I thought it might be useful to others, so I am also publishing it here, with some additional information.

First of all, let’s define what a thread is. In short, a thread is a piece of a process, or a program, that is capable of running independently from the rest of the application: a section of an application that can run in parallel with the rest of it. But does it really run in parallel? On a CPU with a single core, the answer is no. On a CPU with multiple cores, the answer is sometimes, maybe often. The point is that, for a thread to actually run in parallel with other programs, or with other threads of the same program, it needs to have resources available to it: memory, CPU time, and I/O (if needed).
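As a minimal illustration (in Python, purely for demonstration), here is a program that runs a function in its own thread, independently of the main thread:

```python
import threading

results = []

def worker(n):
    # This function runs in a separate thread, independently
    # of the rest of the program.
    results.append(sum(range(n)))

# Create and start the thread; the main thread keeps running meanwhile.
t = threading.Thread(target=worker, args=(10,))
t.start()
t.join()  # Wait for the thread to finish before reading its result.
print(results[0])  # sum(range(10)) == 45
```

Whether `worker` truly runs at the same time as the main thread depends, as discussed, on the cores and resources available.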

The operating system, and in particular the scheduler, takes care of handling the execution of the different threads and programs. When there are several threads to execute, it distributes them across the available cores and assigns each one a certain amount of memory to use. Also, when the scheduler suspends a thread to let another one run in its place, so that all the threads get a fair share of run time, it might decide to temporarily swap to disk the memory used by the thread being suspended, and to read back from the swap file the memory that the thread about to be awakened was using when it was suspended. How the swap file is used depends on the operating system; different operating systems use different algorithms. But all operating systems use the swap file when they decide that the RAM is not sufficient to handle all the activities, so some sections of the RAM are stored on disk to free that RAM for another process or thread.

Based on this information, it is now clear that, on a computer with limited resources, the more threads are being executed, the more memory is used, and so the greater the chances that the swap file comes into play. Also, if there are more threads than cores, the OS will need to alternate the execution of the threads to give each one a fair share of execution time.

And with this background, here is the response I gave to the initial question.

How many is too many? It all depends on the number of cores and the amount of memory that is available. Each thread will take away a certain amount of memory from the system. Each thread will try to use a core.

If there are more threads than cores, it becomes harder for the OS to assign cores to threads. With too many threads, the OS will start having real trouble assigning cores fairly: it will spend so much time switching threads that it ends up being the one program using most of the CPU. Things may slow down so much that the system could come to a halt.

If there are enough threads to exhaust the memory, the system will start swapping to disk and become slower. If there are far too many threads, even the swap space will be exhausted and the system will crash.
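A back-of-the-envelope sketch makes the memory side of this concrete. The numbers below are assumptions for illustration only: an 8 MB per-thread stack (a common Linux pthread default) and a hypothetical 16 GB machine.

```python
import os

# Illustrative assumptions, not measurements:
STACK_PER_THREAD_MB = 8     # common default pthread stack size on Linux
ram_mb = 16 * 1024          # a hypothetical machine with 16 GB of RAM

cores = os.cpu_count() or 1  # threads beyond this count must time-share cores

def max_threads_before_swapping(ram_mb, stack_mb):
    # Ignoring all other memory use, thread stacks alone bound how many
    # threads fit in RAM before the system starts swapping.
    return ram_mb // stack_mb

print(max_threads_before_swapping(ram_mb, STACK_PER_THREAD_MB))  # 2048
```

So even before any thread does real work, a few thousand of them can claim all the RAM for stacks alone, at which point swapping, and eventually swap exhaustion, begins.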

Past and Future Improvements In Personal Computing Architecture

A few weeks ago, I was reading some questions on Quora regarding the future of Home Computing. I gave some thoughts on the topic and provided some of the answers.

I would now like to share my ideas with you, my readers, putting together all that I read and wrote into one essay. Please keep in mind that these are my own thoughts, and I was not influenced in any way in writing them. Also, being my own thoughts, they are not necessarily in line with the current technology path. Maybe one day. Who knows…

Enjoy!

CPU architectures have started evolving in directions other than just increasing the number of transistors on the die to add new and powerful features. There are two reasons for that:

  1. Transistors have already reached some sort of physical limit: their minimum size is quickly approaching an order of magnitude too close to that of the atoms themselves for transistors to still work.
  2. The clock frequency in use limits the maximum distance between components, to avoid incurring signal-propagation delays that the CPU would not be able to handle without malfunctioning.

(Image credit: Matt Britt at the English language Wikipedia, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=5820383)

The first reason makes it almost impossible to keep shrinking the size of the transistors on the die and, therefore, to keep increasing their number in the limited space of the chip.

The second reason prevents the CPU makers from increasing the clock frequency any further without facing major cost increases in the fabrication process. Of course, nobody wants to spend thousands of dollars on a single CPU, so the race to increase the clock speed was abandoned a while ago.

A common solution to the two problems has been in place for a number of years now: rather than increasing the power of a single CPU and its clock speed, the makers decided to increase the number of less powerful CPUs on the same die. Or better, I should say, they decided to duplicate the core functionality of the CPU, moving toward an architecture that, not without cause, has been named multi-core.

So now, rather than executing more instructions per second by increasing the clock speed, they execute several instructions in parallel at a lower speed, one in each core, at least whenever possible. The result: an increase in performance without an increase in clock speed.

However, there is a catch! Several actually.

Increasing the clock speed increased the execution speed of programs proportionally. But doubling the cores from one to two does not double the execution speed, because it is not always possible to execute two instructions in parallel on two different cores. Sometimes you need to wait for the result of the previous instruction before you can use it as input to the next. The same goes when you increase the number of cores even further, to 4 or 6 or 8, and so forth.
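This diminishing return is often formalized as Amdahl’s law: if a fraction p of a program can run in parallel, the best possible speedup on n cores is 1 / ((1 − p) + p / n). A small sketch, with illustrative numbers:

```python
def amdahl_speedup(p, n):
    """Best-case speedup on n cores when fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 90% of the work parallelizable, two cores do not double the speed:
print(round(amdahl_speedup(0.9, 2), 2))     # 1.82, not 2.0

# And piling on cores hits a ceiling of 1 / (1 - p), here 10x:
print(round(amdahl_speedup(0.9, 1000), 2))  # about 9.91
```

The serial fraction, the instructions that must wait on each other, is exactly what keeps the curve from ever reaching n.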

Using multiple cores therefore becomes more convenient for running different programs simultaneously, one per core. However, a program running on one core often needs to access the same external resources as a program on another core. Now the contention needs to be resolved: one of the two programs has to wait for the other. Ouch! Again, using the cores this way does not multiply the efficiency of the CPU, because of these waiting times.

And so here comes yet another solution: hyper-threading. Why not assign two threads of two different programs to the same CPU core? That way, when one thread needs to wait, for example for an I/O operation, the other thread can use the core to do something else that does not need the same resource.

Nice, we gained some efficiency, so an overall faster CPU. But wait: what if the two programs on the two threads need to access the same resource at the same time? And here we are again, at yet another bottleneck.

It seems that every time a new solution is put in place, it provides some improvement over the previous status quo but, at some point, ends up facing the same obstacle again, or maybe a new one. The fact is that each and every one of these enhancements, although it looks like it could double the performance, really does not, for one reason or another.

And, wait, there is more!

In order to really exploit the multi-core architecture and hyper-threading, programs need to be written in a new way. In the old days, you would write a program that executed in a structured way, deciding along the way which logical path to follow. New programs can better use the features of the new CPUs if they are written in such a way that different operations are executed by different threads, so a program is, in reality, a collection of several smaller programs running in parallel.

And yes, we do that today, although not without difficulty. For example, since all the threads are still part of the same program, they sometimes need to access data that is generated or modified by another thread. And what happens if two or more threads need to access the same piece of data? How can a thread know it can read some data without taking the chance that it is reading garbage because, at the same time, another thread is modifying that data?

And here we go again: now the different threads of the same program need to compete for the same piece of data. So all the threads of the same program need to be coordinated, waiting for one thread to finish with a piece of data before another thread can access it. And so new entities are introduced in programming languages, like the semaphore and the mutex. The programming languages have to evolve too, to keep pace with the needs of the new generations of CPUs.
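A minimal Python sketch of this coordination: two threads update the same counter, and a mutex (a `threading.Lock`) guards the shared data so that no updates are lost.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        # Without the lock, this read-modify-write could interleave with
        # the other thread's, and some increments would be lost.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 200000: with the lock, every update survives
```

The lock serializes access to `counter`, which is exactly the waiting described above: safety is bought at the price of some parallelism.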

And of course, the race has not ended. Every day we need more and more processing power: to process video frames faster, to let people edit videos more smoothly, or to play video games with an amount of detail that grows greater every day.

So, what else can be done? This is where we enter uncharted territory, the world of speculation, of fantasy. Who really knows what technology the future will bring us?

An optical CPU? Or maybe we just need to speed up those peripherals that are still slow compared to the speeds of today’s CPUs?

And how do we do that? Well, one way is to make faster memory chips, to decrease the memory read/write waiting time. Another is to improve hard disk technology to make disks respond faster to CPU requests. We are already abandoning the spinning platters of the HDD and embracing, more and more, a technology that has no moving parts: the SSD.

But still, SSDs don’t seem to be the final answer to the problem. It is true they can achieve much faster speeds when reading, but the writing side, although much better than that of HDDs, is still definitely slower than the reading side. And there are several reasons for that, some depending on the technology currently used for SSDs, and some on the algorithms that attempt to use the SSDs in such a way that they last longer.

So, do we expect yet another new technology to emerge and replace the SSDs? Probably. And there is promising research nowadays to do so, using optical systems without moving parts, or other ways to dope the silicon in a 3D fashion rather than using a 2D topology.

Maybe the future will also give us some sort of memory device fast enough to be used as main computer memory and also able to retain information when powered down, so it can double as a mass storage device. What would you think of a type of memory chip that works seamlessly as both main memory and main storage?

We just need to wait and see what the future has in store for us.

Let’s open a discussion on this topic. What are your thoughts? Any comments?

What Is A Sysadmin?

From Wikipedia:

A system administrator, or sysadmin, is a person who is responsible for the upkeep, configuration, and reliable operation of computer systems; especially multi-user computers, such as servers. The system administrator seeks to ensure that the uptime, performance, resources, and security of the computers they manage meet the needs of the users, without exceeding a set budget when doing so.

So, really, who is it? Well, unless the company where the System Administrator works is very small, it is really not a single person but a whole team. In this team, each member has different duties among those listed in the definition and, possibly, more than one person is assigned to the same duty.

However, for the sake of simplicity, in this article we will refer to the System Administrator as one person who does everything. Just remember who is really at work behind the scenes.

Did I say behind the scenes? Well, it really is like that. Normally, nobody cares about System Administrators until something goes wrong. Nobody knows the huge amount of work that System Administrators face in their day-to-day activities just to make sure that everything runs smoothly. Users only remember System Administrators when something goes wrong, and they are really quick to blame them for it. If only they knew that the majority of the work lies elsewhere, not just in fixing problems. We should all be very grateful to System Administrators for the work they do.

Disclaimer: although I do System Administration for my home network, I do not do, and never did, System Administration in any company I have worked for. But I am fully aware of what is going on behind the scenes, because I am one of the guys who designs the elements of the IT infrastructure from the ground up, both in hardware and in software, although nowadays it is mostly software.

Daily Duties

So, what are the Daily Duties of the Sysadmin? Let’s make a list of them and, for each item, let’s provide some sort of job description and explanation. And note that the list is by no means exhaustive.

  • Hardware Installation And Maintenance. And here I am talking about servers, routers, hubs, cables, and basically everything that makes up a whole local network. The system administrator must be able to deal with the hardware and has to know how to properly handle it and physically install it. He needs to know the procedures and protocols to replace defective parts like disks, whole network nodes, and the like. A lot of time is devoted to this activity, as network components do fail over time and need to be replaced or repaired. It is also the sysadmin’s duty to run diagnostics periodically, to figure out when a component is about to fail and fix it before it actually stops working. And because these are disruptive activities that could prevent users from doing their job, the sysadmin often has to work outside office hours, sometimes even during the night, so that the next day users can continue to work without even noticing that maintenance was done on the network, and that the sysadmin gave up a night’s sleep to let them work flawlessly.
  • Software Installations And Upgrades. Yes, because all that hardware does not work on its own: it uses software that needs to be installed and, eventually, upgraded from time to time. And again, this is often disruptive work on the network, especially if the unit to be maintained is a business-critical server, so this job too needs to be done outside regular working hours. Moreover, the sysadmin also has the responsibility of making sure that the software works as expected. So, before doing a new installation or an upgrade, the sysadmin has to do the work on a spare system, possibly offline, to confirm that the installation procedure goes smoothly and to measure the amount of time it requires, so that the correct date, time, and duration of the activity can be scheduled safely for the business. And of course, after testing the installation or upgrade procedure, the sysadmin also needs to test the functionality of the new software. So, sometimes, after the actual installation/upgrade, the sysadmin has to spend days running tests to make sure that everything will work fine once the operation is done on the live systems, so that users will not even realize that something underneath was modified.
  • Log Analysis. A daily check of all kinds of system logs and application logs, to make sure that everything works fine and there are no issues. If something wrong is found in the logs, the sysadmin has to go back to the test system and try to reproduce the problem there. And once the problem is reproduced, he/she needs to find a solution for it. Sometimes it will just be a matter of fine-tuning some configuration parameter. Other times it is a software bug, so the sysadmin will need to involve the developers who wrote that piece of software, if it was made in house, or the company that sold it, otherwise. And then, since the fix usually is not immediate but needs a new software release or a patch that takes time to prepare, the sysadmin has to find a way to mitigate the problem, so users can do their job while waiting for the fix to be ready, tested, and installed.
  • User Management. People in a company come and go, or maybe just change job requirements. Because of that, they need to acquire access to certain servers rather than others. The sysadmin’s job is to make sure that updates to the users’ access rights are done correctly, in accordance with the company policies and needs.
  • Security Management. I’m not talking just about virus detection and elimination. Sysadmins also need to prevent hackers or other malicious people from accessing the company network to either create havoc or steal proprietary information. And so the sysadmin has to deal with antivirus systems, firewalls, email spam detection, Trojan horses, DoS attacks, and so forth. This is a never-ending activity, as the bad guys out there keep trying new ways to get into the network, maybe using social engineering to gain access. So the sysadmin has to keep all the systems updated with the latest security patches and with the appropriate, ever-changing security software. The sysadmin also needs to inform the users of the techniques used by the attackers, so people within the company know how to recognize when something that seems perfectly innocent is actually wrong.
  • Systems Performance Management. Obviously everybody wants an environment that is always ready and fast enough to satisfy all their needs. And maybe some of the heavier users run applications that generate a lot of traffic on the network. Not to mention data streaming, and more, and more… This is where a lot of the sysadmin’s time is spent: monitoring the network to find the bottlenecks, fine-tuning that particular router so that it can perform better under the load that a particular user is throwing at it. Not to mention multi-user database servers that always need to be fast in providing the required information, or in storing the huge amounts of new data continuously generated by different applications. And the network also needs to handle IP phones, not to mention video conferences and the like. The sysadmin needs to continuously monitor the performance parameters of the network, recognize patterns that could lead to network disruptions or slowdowns over time, and pinpoint malfunctions and fix them before they become showstoppers.
  • Mass Storage Management. And what about the storage of all that massive information in databases and in user data? How full are the file-systems? Which one needs to be expanded adding new hardware? Which one is underutilized, so some of the resources could be moved to other areas where they are more needed, without adding to the costs of maintaining the network?
  • Backup/Restore. Not to mention that all data is crucial to the welfare of the company. Data cannot be lost. System failures that may cause data loss need to be addressed and, eventually, data needs to be recovered. And so the sysadmin has to design the best backup protocol that satisfies the needs of the company, and then implement it. The sysadmin needs to add disks to the network and create RAID configurations as the first barrier against the causes of data loss. Then the backups of all that data need to be stored offline and, possibly, off-site, so that a critical failure that also destroys the backup data stored on site does not cause a permanent loss of that data. And then there are the users’ mistakes: people who inadvertently delete their own files and then ask the sysadmin to recover and restore them. How many times has that happened to you?
  • Documentation. Writing documentation is another important job of the sysadmin, who needs to keep records of everything that has been done so that, in case of failure, it is easy to go back, see if anything was done wrong, and fix it. Or maybe just because, if he/she is not available, somebody else can read the documentation and understand how that thing that is not working needs to be fixed.
  • Server Room Management. And did I say anything about the room that contains all the equipment? All the servers? Do you have any idea how important it is to keep everything tidy and clean, to prevent dust from infiltrating moving parts and causing damage? Or from simply depositing on electronic circuits and causing short circuits, or overheating the systems? And talking about heat, do you know how much heat those servers produce? Machine rooms are usually served by big air conditioners that refrigerate the room even during cold winters, to prevent the systems from melting under the excessive heat. And here is the sysadmin who, more often than not, is also asked to provide a design for the cooling system, and then needs to maintain it, or at least monitor it for possible issues.
  • User Assistance. Finally, there are those unpredictable users. There is the one who does not have that particular application available on his computer and needs help installing it. Or the other one who prefers to go BYOD, with all the consequent dangers of viruses and Trojan horses bypassing the company firewalls and creating havoc. And what about updating the software of all the company laptops and desktops whenever a new security patch becomes available?
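As a tiny taste of the log-analysis duty in the list above, here is a hedged Python sketch of a first-pass daily check: counting log lines per severity level. The sample lines and their format are invented for the example; real log formats vary by system.

```python
from collections import Counter

def summarize_log(lines, levels=("ERROR", "WARNING")):
    """Count log lines per severity level; a first-pass daily check."""
    counts = Counter()
    for line in lines:
        for level in levels:
            if level in line:
                counts[level] += 1
    return counts

# Hypothetical log excerpt, purely illustrative:
sample = [
    "2024-01-01 03:12:07 INFO  backup completed",
    "2024-01-01 03:15:44 WARNING disk /dev/sdb at 85% capacity",
    "2024-01-01 04:02:10 ERROR raid: device /dev/sdc1 reported read failure",
]
print(summarize_log(sample))  # one WARNING and one ERROR to investigate
```

In practice this kind of summary is only the starting point; anything flagged here is what gets reproduced and chased down on the test system.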

Skills

Of course, all of the above requires skills. The sysadmin needs to know how to perform all those duties. He also needs to be informed and innovative. So the sysadmin needs to take classes, read technical articles, and always stay a step ahead of the company’s needs, so he can propose changes that address specific issues in the work environment, for the benefit of the users, or just improve their working conditions whenever new technologies that can help become available.

Have you ever given any thought to all of this? Think about it before the next time you go to the sysadmin and accuse him or her of not doing their job because your laptop seems too slow at that particular task.

Comments please, and have a nice, safe and productive day, provided by your unrecognized System Administrator! 😉