Friday, December 7, 2007

Defining Threads

Threading Defined

By the end of this section, you will understand the following:

  • What multitasking is and what the different types of multitasking are

  • What a process is

  • What a thread is

  • What a primary thread is

  • What a secondary thread is

Multitasking

As you probably know, the term multitasking refers to an operating system's ability to run more than one application at a time. For instance, while this chapter is being written, Microsoft Outlook is open as well as two Microsoft Word windows, with the system tray showing further applications running in the background. When clicking back and forth between applications, it would appear that all of them are executing at the same time. The word "application" is a little vague here, though; what we really are referring to are processes. We will define the word "process" a little more clearly later in this chapter.

Classically speaking, multitasking exists in two different flavors. These days Windows uses only one of them for threading, which we will discuss at length in this book. However, we will also look at the earlier type of multitasking so we can understand the differences and the advantages of the current method.

In earlier versions of Windows - such as Windows 3.x - and in some other operating systems, a program is allowed to execute until it cooperates by releasing its use of the processor to the other applications that are running. Because it is up to the application to cooperate with all other running programs, this type of multitasking is called cooperative multitasking. The downside to this type of multitasking is that if one program does not release execution, the other applications will be locked up. What is actually happening is that the running application hangs and the other applications are waiting in line. This is quite like a line at a bank. A teller takes one customer at a time. The customer more than likely will not move from the teller window until all their transactions are complete. Once finished, the teller can take the next person in line. It doesn't really matter how much time each person is going to spend at the window. Even if one person only wants to deposit a check, they must wait until the person in front of them who has five transactions has finished.
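The bank-teller behavior can be imitated in a few lines of C#. This is only a toy sketch, not how Windows 3.x was implemented: each "program" is a C# iterator, and every yield return stands in for the program voluntarily releasing the processor. The names CooperativeDemo, Program, and RunAll are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

static class CooperativeDemo
{
    // A toy "program": each yield return is the point where it cooperates
    // and hands control back to the scheduler.
    public static IEnumerable<string> Program(string name, int steps)
    {
        for (int i = 1; i <= steps; i++)
        {
            yield return name + " step " + i;
        }
    }

    // A toy round-robin scheduler: it can only switch programs when the
    // running one yields. A program that never yields would block the rest.
    public static List<string> RunAll(params IEnumerable<string>[] programs)
    {
        var log = new List<string>();
        var queue = new Queue<IEnumerator<string>>();
        foreach (var p in programs) queue.Enqueue(p.GetEnumerator());

        while (queue.Count > 0)
        {
            IEnumerator<string> current = queue.Dequeue();
            if (current.MoveNext())
            {
                log.Add(current.Current);
                queue.Enqueue(current);   // go to the back of the line
            }
            else
            {
                current.Dispose();        // this program has finished
            }
        }
        return log;
    }

    static void Main()
    {
        foreach (string line in RunAll(Program("A", 2), Program("B", 3)))
            Console.WriteLine(line);
        // Prints: A step 1, B step 1, A step 2, B step 2, B step 3 (one per line)
    }
}
```

If Program contained a long loop with no yield return, RunAll would be stuck inside MoveNext() for that whole loop, which is exactly the lock-up described above.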

Thankfully, we shouldn't encounter this problem with current versions of Windows (2000 and XP) as the method of multitasking used is very different. An application is now allowed to execute for a short period before it is involuntarily interrupted by the operating system and another application is allowed to execute. This interrupted style of multitasking is called pre-emptive multitasking. Pre-emption is simply defined as interrupting an application to allow another application to execute. It's important to note that an application may not have finished its task, but the operating system is going to allow another application to have its time on the processor.

The bank teller example above does not fit here. In the real world, this would be like the bank teller pausing one customer in the middle of their transaction to allow another customer to start working on their business. This doesn't mean that the next customer would finish their transaction either. The teller could continue to interrupt one customer after another - eventually resuming with the first customer. This is very much like how the human brain deals with social interaction and various other tasks.

While pre-emption solves the problem of the processor becoming locked, it does have its own share of problems as well. As you know, some applications may share resources such as database connections and files. What happens if two applications are accessing the same resource at the same time? One program may change the data, then be interrupted, allowing another program to again change the data. Now two applications have changed the same data. Both applications assumed that they had exclusive access to the data.
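The shared-data problem is easiest to reproduce with two threads inside one process rather than two separate applications, so the sketch below makes that substitution. Two threads each add to the same counter; without a lock, an interruption between reading and writing the counter can lose updates. The names SharedCounterDemo, Run, and Gate are hypothetical.

```csharp
using System;
using System.Threading;

static class SharedCounterDemo
{
    static readonly object Gate = new object();

    // Starts two threads that each increment a shared counter perThread times
    // and returns the final value once both have finished.
    public static int Run(int perThread, bool useLock)
    {
        int counter = 0;
        ThreadStart work = () =>
        {
            for (int i = 0; i < perThread; i++)
            {
                if (useLock)
                {
                    lock (Gate) { counter++; }   // one thread at a time
                }
                else
                {
                    counter++;   // read, add, write: can be pre-empted midway
                }
            }
        };

        var t1 = new Thread(work);
        var t2 = new Thread(work);
        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();
        return counter;
    }

    static void Main()
    {
        // The unsynchronized total varies from run to run and may fall short.
        Console.WriteLine("Unsynchronized: " + Run(1000000, false));
        // With the lock, every increment is counted.
        Console.WriteLine("With lock:      " + Run(1000000, true));
    }
}
```

The lock statement is only one way to restore the "exclusive access" that each thread assumed it had.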

Processes

When an application is launched, memory and any other resources for that application are allocated. The physical separation of this memory and these resources is called a process. Of course, the application may launch more than one process, so it's important to note that the words "application" and "process" are not synonymous. The memory allocated to the process is isolated from that of other processes, and only that process is allowed to access it.

In Windows, you can see the currently running processes by accessing the Windows Task Manager. Right-clicking an empty space on the taskbar and selecting Task Manager will load it up, and it will contain three tabs: Applications, Processes, and Performance. The Processes tab shows the name of the process, the process ID (PID), CPU usage, the processor time used by the process so far, and the amount of memory it is using. Applications and processes appear on separate tabs for a good reason: an application may have one or more processes involved. Each process has its own separation of data, execution code, and system resources.
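Some of what the Processes tab shows can also be read from code. The following is a minimal sketch using the System.Diagnostics.Process class; the ProcessListDemo class and its CurrentProcessId helper are made up for this example, and the try/catch is there because a process may exit, or deny access, while it is being inspected.

```csharp
using System;
using System.Diagnostics;

static class ProcessListDemo
{
    // Returns the PID of the process this code is running in.
    public static int CurrentProcessId()
    {
        using (Process self = Process.GetCurrentProcess())
        {
            return self.Id;
        }
    }

    static void Main()
    {
        Console.WriteLine("This process has PID " + CurrentProcessId());

        // List every process visible to the current user: PID, name, memory.
        foreach (Process p in Process.GetProcesses())
        {
            try
            {
                Console.WriteLine(p.Id + "\t" + p.ProcessName + "\t" +
                                  (p.WorkingSet64 / 1024) + " KB");
            }
            catch (Exception)
            {
                // The process exited or its details are not accessible; skip it.
            }
            finally
            {
                p.Dispose();
            }
        }
    }
}
```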

Threads

You will also notice that the Task Manager shows summary information about each process's CPU utilization. This is because the process also has an execution sequence that is used by the computer's processor. This execution sequence is known as a thread. A thread is defined by the registers in use on the CPU, the stack used by the thread, and a container that keeps track of the thread's current state. This container is known as Thread Local Storage (TLS). The concepts of registers and stacks should be familiar to anyone used to dealing with low-level issues like memory allocation; however, all you need to know here is that a stack in the .NET Framework is an area of memory that can be used for fast access and that stores value types, pointers to objects, method arguments, and other data that is local to each method call.
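One way to see per-thread storage from C# is the [ThreadStatic] attribute, which gives every thread its own copy of a static field. The sketch below is an illustration of that idea only, not of how the runtime stores thread state internally; the class and field names are hypothetical.

```csharp
using System;
using System.Threading;

static class ThreadLocalDemo
{
    [ThreadStatic] static int perThreadValue;   // each thread gets its own copy
    static int sharedValue;                     // one copy seen by all threads

    // Sets both fields on the calling thread, lets a second thread overwrite
    // them, then returns what the calling thread sees afterwards.
    public static int[] Run()
    {
        perThreadValue = 1;
        sharedValue = 1;

        var other = new Thread(() =>
        {
            perThreadValue = 42;   // changes only the second thread's copy
            sharedValue = 42;      // changes the single shared copy
        });
        other.Start();
        other.Join();

        return new[] { perThreadValue, sharedValue };
    }

    static void Main()
    {
        int[] result = Run();
        Console.WriteLine("Per-thread value: " + result[0]);   // prints 1
        Console.WriteLine("Shared value:     " + result[1]);   // prints 42
    }
}
```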

Single-Threaded Processes

As noted above, each process has at least one of these sequential execution orders, or threads. When a process is created, execution begins at a particular point in its instructions. This initial thread is known as the primary or main thread. The thread's actual execution sequence is determined by what you code in your application's methods. For instance, in a simple .NET Windows Forms application, the primary thread starts in the static Main() method placed in your project, which in turn calls Application.Run().
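To keep the example self-contained, the sketch below uses a console program instead of Windows Forms; the idea is the same, since Main() is where the primary thread begins. Thread.CurrentThread returns the thread that is executing the call. The PrimaryThreadDemo class and its Describe helper are made up for this illustration.

```csharp
using System;
using System.Threading;

static class PrimaryThreadDemo
{
    // Builds a one-line description of the given thread.
    public static string Describe(Thread t)
    {
        return "Managed thread ID " + t.ManagedThreadId +
               ", background: " + t.IsBackground;
    }

    static void Main()
    {
        // Inside Main, the current thread is the primary thread.
        Thread primary = Thread.CurrentThread;
        Console.WriteLine(Describe(primary));
    }
}
```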

Multithreaded Processes

As you probably already know, we can split up our process to share the time slice allotted to it. This happens by spawning additional threads of execution within the process. You may spawn an additional thread in order to do some background work, such as accessing a network or querying a database. Because these secondary threads are usually created to do some work, they are commonly known as worker threads. These threads share the process's memory space that is isolated from all the other processes on the system. The concept of spawning new threads within the same process is known as free threading.
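A minimal sketch of spawning a worker thread follows. The worker adds up an array and writes the result into a variable that the primary thread reads afterwards, which works because both threads share the process's memory. WorkerThreadDemo and SumOnWorker are hypothetical names, and the summing stands in for real background work such as a network or database call.

```csharp
using System;
using System.Threading;

static class WorkerThreadDemo
{
    // Sums the array on a secondary (worker) thread and returns the total.
    public static long SumOnWorker(int[] data)
    {
        long total = 0;   // shared between the calling thread and the worker

        var worker = new Thread(() =>
        {
            foreach (int n in data) total += n;
        });
        worker.Name = "Worker";
        worker.IsBackground = true;   // do not keep the process alive on its own
        worker.Start();

        // The calling thread could do other work here; this sketch just waits.
        worker.Join();
        return total;
    }

    static void Main()
    {
        Console.WriteLine(SumOnWorker(new[] { 1, 2, 3, 4 }));   // prints 10
    }
}
```

Reading total is safe here only because Join() guarantees the worker has finished; two threads touching the same variable at the same time would need synchronization.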

The concept of free threading gives a significant advantage over the apartment-threading model - the threading model used in Visual Basic 6.0. With apartment threading, each process was granted its own copy of the global data needed to execute. Each new thread was spawned within its own process, so that threads could not share data in the process's memory. Let's look at these models side by side for comparison.

Thread Support in .NET and C#

Free threading is supported in the .NET Framework and is therefore available in all .NET languages, including C# and VB.NET. In this section, we will look at how that support is provided, focusing more on how threading is done than on what it is. We will also cover some of the additional support provided to help further separate processes.

By the end of this section, you will understand:

  • What the System.AppDomain class is and what it can do for you

  • How the .NET runtime monitors threads

System.AppDomain

When we explained processes earlier in this chapter, we established that they are a physical isolation of the memory and resources needed to maintain themselves. We later mentioned that a process has at least one thread. When Microsoft designed the .NET Framework, it added one more layer of isolation called an application domain or AppDomain. This application domain is not a physical isolation as a process is; it is a further logical isolation within the process. Since more than one application domain can exist within a single process, we gain some major advantages. In general, it is impossible for standard processes to access each other's data without using a proxy. Using a proxy incurs significant overhead, and the coding can be complex. However, with the introduction of the application domain concept, we can now launch several applications within the same process. The same isolation provided by a process is also available with the application domain. Threads can execute across application domains without the overhead associated with inter-process communication. Another benefit of these additional in-process boundaries is that they provide type checking of the data they contain.

Microsoft encapsulated all of the functionality for these application domains into a class called System.AppDomain. Microsoft .NET assemblies have a very tight relationship with these application domains. Any time that an assembly is loaded in an application, it is loaded into an AppDomain. Unless otherwise specified, the assembly is loaded into the calling code's AppDomain. Application domains also have a direct relationship with threads; they can hold one or many threads, just like a process. However, the difference is that an application domain may be created within the process without creating a new thread.
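The relationship between an application domain, its assemblies, and the current thread can be inspected with a short sketch like the one below. It only reads from the default domain through AppDomain.CurrentDomain and Thread.GetDomain(). Creating extra domains with AppDomain.CreateDomain() is left out because, as far as we know, that call is only supported on the .NET Framework and throws PlatformNotSupportedException on .NET Core and later. AppDomainDemo and LoadedAssemblyNames are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Threading;

static class AppDomainDemo
{
    // Returns the simple names of the assemblies loaded into the current domain.
    public static List<string> LoadedAssemblyNames()
    {
        var names = new List<string>();
        foreach (Assembly a in AppDomain.CurrentDomain.GetAssemblies())
        {
            names.Add(a.GetName().Name);
        }
        return names;
    }

    static void Main()
    {
        // The domain the calling code runs in.
        AppDomain current = AppDomain.CurrentDomain;
        Console.WriteLine("Current domain: " + current.FriendlyName);

        // The domain the current thread is executing in.
        Console.WriteLine("Thread's domain: " + Thread.GetDomain().FriendlyName);

        Console.WriteLine("Assemblies loaded into this domain:");
        foreach (string name in LoadedAssemblyNames())
        {
            Console.WriteLine("  " + name);
        }
    }
}
```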
