Pointers in C

Published on 2007-12-26.

This is a tutorial on how pointers work in the C programming language. It is aimed at beginners. Rather than dwelling on the more technical aspects of pointers or using complex examples, it focuses on building a firm understanding of the subject. Some prior knowledge of C is helpful, such as assigning values to variables, compiling, printing to the screen and using comments, but it is not strictly necessary.

One of the greatest strengths of C is that it gives the programmer low-level access to the machine, such as memory locations. That strength is also a great responsibility.

It is often said that pointers are one of the greatest hurdles a beginner must overcome in using the C programming language.

Variables

In reality a pointer is nothing more than a simple variable. A variable is a symbolic name, consisting of letters and/or digits, that represents a value of some sort.

Variables are best thought of as boxes stacked upon one another. The boxes can hold values, and each box is labeled with a symbolic name so that we can tell the different boxes apart. Besides the symbolic name, each box also has a number. Box number 1 is at the bottom, box number 2 is placed on top of box number 1, box number 3 is placed on top of box number 2, and so forth.

To access the boxes we can use either the symbolic representation or the box number. When we access the different boxes we can insert items and remove items from the boxes.

When we enter data into our computer memory we are accessing "our boxes". Computer memory, just like the boxes, can be represented with a symbolic representation and each memory location has its own address.

In mid-level and high-level programming languages the compiler or interpreter will take care of handling the memory locations for us. We don't have to think much about what "box" resides at what location in memory. If we have to manage this ourselves it becomes extremely time consuming and difficult to program, but that's how it actually was done in the beginning.

The old way

Let's imagine that you need to store some data in the computer memory. Let's imagine that you are using the computer to calculate some numbers and need to save the result for later usage.

To achieve this the old way, yet simplified, you would first have to figure out what parts of memory are free for usage. Then you would have to make sure that the free memory isn't reserved for some other usage by the operating system. Then you would have to reserve the part of memory that you need and then finally you would fill it with the relevant data.

Let's imagine that you want to store the integer "5" in memory. After gaining access to the memory the operating system supplies you with a memory address, and just for the fun of it, let's call that address "Box number 27".

Now let's imagine that after a while you need to store another number in memory as well, the number "6". Again you ask the operating system for access to the memory, but you can't get box number 28 because it has been occupied by some other program. Instead you get "Box number 57".

So "Box number 27" holds the value "5" and "Box number 57" holds the value "6".

But what if you had 255 different numbers of different sizes? It would become quite difficult to keep track. That's where the blessings of a mid-level or high-level programming language come in.

The new way

From this point on I will just refer to both mid-level and high-level languages as high-level.

Using a high-level programming language means that you no longer need to keep track of memory. All you have to do is to use symbolic representations and the compiler or interpreter will take care of the rest.

A variable serves as an easy way to access memory. Think of a variable as one of the boxes from before. Inside variables we can store data such as numeric values. But with C we can also store the memory addresses of other variables.

So "Box number 27" can hold the value "5", which can represent the amount of money in my bank account, but it can also hold the value "57", which is the address of "Box number 57".

In reality a variable is the memory. The memory can contain values such as the amount of money on my bank account, or it can contain addresses of other blocks of memory.

Pointers

When a variable contains the address of a memory block location it is called a "pointer" because it is "pointing" to that particular block of memory.

All programming languages contain variables of some kind, but only a few contain pointers. That is because pointers give direct access to memory locations.

With pointers it is possible to try to access any memory location and change the data at that location, even data belonging to the operating system itself. Modern operating systems normally stop such an access, and that is often the reason why a program crashes.

Working with pointers

To use variables in C they must first be declared. Many other high-level programming languages, PHP for example, allow you to work with variables without declaring them. Such languages are called "dynamically typed". C and Java are statically typed programming languages because every variable must be declared with a data type before use. A variable cannot start off life without knowing the range of values it can hold, and once it is declared, the data type of the variable cannot change.

To declare a variable in C you make up a name for it (I will not address the issue of what characters may be used or which words are reserved). Next you have to let the compiler know what kind of data you are going to put into that particular variable. The compiler will then create the name and reserve the right amount of space in memory to hold that value.

Let's declare a variable called my_var and reserve room in memory to hold an integer value:

#include <stdio.h>

int main()
{
    int my_var;
    return 0;
}

The variable my_var is now ready to receive some data, and we can define the data of the variable by assigning it a specific value. Let's take it a step further and do that:

#include <stdio.h>

int main()
{
    int my_var;
    my_var = 5;

    return 0;
}

On line 5 the variable gets declared with the name my_var, and it is declared to hold an integer value. On line 6 the variable is assigned the value "5".

Where in the computer memory is the number "5" physically located? We don't know, but we can find out using the & ampersand symbol.

Let's print out both the value of the variable my_var and the physical memory location of my_var:

#include <stdio.h>

int main()
{
    int my_var;
    my_var = 5;

    /* Print out the value of my_var. */
    printf ("%d\n", my_var);

    /* Print out the physical memory address of my_var. */
    printf ("%p\n", (void *)&my_var);

    return 0;
}

Notice the added & ampersand to the my_var variable name. This means "give me the memory address of my_var".

If you compile the program and run it, it will print "5" to the screen and, in my case, the memory address "0xbfae625c". To compile it, save it as "mytest.c" and run the following command (assuming you are using the GNU C compiler, GCC): gcc -o mytest mytest.c. Then run the program with the command: ./mytest

Now, what if we want to add another variable, let's call it mem_var, to contain the physical memory address of my_var?

We can do that by declaring a variable as a pointer, and that is done using the asterisk sign * like this:

int *mem_var;

The variable mem_var is now a pointer, but since it hasn't been initialized with any memory address, it just holds some random data. That random data can literally point to anywhere in the computer memory, so for safety reasons it is necessary to set its value to NULL upon declaration, like this:

int *mem_var = NULL;

Once a variable has been declared as a pointer it is dangerous to mess with it. You cannot keep ordinary values inside the variable; it is only supposed to contain memory addresses.

In other words: A variable in C can normally contain numbers, chars, etc., depending on how you declare it, but once a variable is declared as a pointer, it MUST only contain memory addresses. If you assign a number to a variable that has been declared as a pointer, the number will be treated as a memory address, no matter what that number is!

Some programming languages like Ada also make use of pointers, but pointers in Ada default to null automatically, which makes them safer.

Null is a special pointer value used to signify that a pointer intentionally does not point to an address yet. Such a pointer is called a null pointer in C.

Make it a strong habit to ALWAYS initialize C pointers to NULL right away if you don't yet have an address to give them.

Keeping things apart

The method I used to keep normal variables and pointer variables apart was to think of the asterisk sign * as a rifle sight, a rifle sight "pointing" at my "target".

The way I remembered that the & ampersand means "the address of the memory" where the data is stored is by thinking of the "A" in "Ampersand", as the "A" for "Address".

Maybe this isn't the best way, but I managed to keep them apart like that in the beginning.

When declaring a pointer the asterisk sign can also be located next to the type declaration like this:

int* mem_var;

But I prefer to keep it next to the variable.

Continue working with pointers

So now we have got a variable, and we have got the address in memory where the data of that variable is located, and we have also got a pointer pointing to NULL.

Let's make the pointer point to the address of the variable my_var. I will now expand our little program a bit:

 1 #include <stdio.h>
 2 
 3 int main()
 4 {
 5     int my_var = 5;
 6     int *mem_var = NULL;
 7 
 8     mem_var = &my_var;
 9 
10     /* Print out the value of my_var. */
11     printf ("%d\n", my_var);
12 
13     /* Print out the physical memory address of my_var. */
14     printf ("%p\n", (void *)&my_var);
15 
16     /* Print out the physical memory address of my_var using the pointer mem_var. */
17     printf ("%p\n", (void *)mem_var);
18 
19     return 0;
20 }

On line 5 the variable my_var gets declared to hold the value of an integer and it is assigned the value 5 upon declaration. On line 6 the variable mem_var gets declared as a pointer and is assigned NULL. On line 8 the variable "mem_var" gets assigned a new value and that is the physical memory location address where the value "5" is actually stored. In my case the value "5" is located at memory address "0xbfae625c" (most likely different on your computer).

We get the address by using the & ampersand sign in front of the variable like this &my_var, and the pointer mem_var now points to that address location.

A bit of confusion can arise at this point because of the asterisk sign * and the & ampersand sign.

The & ampersand sign is always easy to remember, just think of the "A" in ampersand as the "A" in address. We ONLY use it in front of a variable to get the memory address of that particular variable.

The asterisk sign * on the other hand is a bit more confusing. When we declare a variable to be a pointer to some address, we use the asterisk sign in front of that variable like this: int *mem_var;. But when we assign an actual memory location to the pointer, by using the memory address of another variable, we don't use the asterisk sign any longer: mem_var = &my_var. That is because the mem_var variable is already declared to be of type pointer, meaning that it's going to hold an address, NOT a normal value. And since &my_var IS an address, we simply assign the memory address as a value to the pointer variable mem_var.

This makes it a bit more difficult to keep ordinary variables apart from pointer variables.

What some people do is always name their pointer variables by prefixing them with something obvious like "point" or "ptr" in front of the variable name they choose.

If we do that, our program looks like this:

#include <stdio.h>

int main()
{
    int my_var = 5;
    int *ptr_mem_var = NULL;

    ptr_mem_var = &my_var;

    /* Print out the value of my_var. */
    printf ("%d\n", my_var);

    /* Print out the physical memory address of my_var. */
    printf ("%p\n", (void *)&my_var);

    /* Print out the physical memory address of my_var using the pointer ptr_mem_var. */
    printf ("%p\n", (void *)ptr_mem_var);

    return 0;
}

Working a bit more with pointers

What if we want to change the value located at our current memory address?

We can do that by changing the value of our variable my_var like this:

my_var = 17;

By changing the value of the variable my_var we are indirectly accessing the memory location that holds that value. To access that memory location more "directly", we can use the pointer that is pointing to the physical location like this:

*ptr_mem_var = 17;

When we declare a variable to be a "pointer" to some address, we use the asterisk sign in front of that variable like this: int *ptr_mem_var;. When we assign an actual memory location to the pointer, by using the address of another variable, we don't use the asterisk sign any longer: ptr_mem_var = &my_var;. But when we need to change the value located at the address the pointer is pointing to, we again need the asterisk sign like this: *ptr_mem_var = 17;. Using the asterisk this way is called dereferencing the pointer.

Some common mistakes

One of the most common mistakes is to assign a value to a pointer rather than a memory address by omitting the & ampersand sign like this:

/* Wrong code: */
ptr_mem_var = my_var;

/* Right code: */
ptr_mem_var = &my_var;

In the wrong code in the above example the pointer ptr_mem_var will not point to the memory address of my_var. Rather, it will point to whatever memory address corresponds to the value of my_var. That's because we have omitted the & ampersand sign. Without the ampersand sign we are not talking about addresses anymore, we are talking about values.

Because C is about giving you, the programmer, power, you are allowed to write the wrong code above because, who knows, maybe that's actually what you intend to do. Maybe you need to access the memory location whose address is the value of my_var, and so you are allowed to do it, but the compiler should at least give you a warning saying something like: "mytest.c:11: warning: assignment makes pointer from integer without a cast".

Another common mistake is to forget to initialize a pointer.

When a pointer is first declared and we forget to initialize it with NULL, it holds some random data, and that data could happen to be a real memory address. The risk of accessing an illegal memory location is big, and all kinds of strange things might happen. We might access some specific part of the operating system, or some part of the memory stack. In either case we need to make sure that the pointer actually points to a safe memory location. A safe memory location is one that we know holds the value of one of our variables or, if you are accessing physical hardware, a memory location that you know you need to access and know to be safe.

A note on arrays

In most expressions C treats the name of an array as a pointer to its first element. If you define an array like this:

char my_text[] = "This is a string of text.";

Then *my_text is the same as my_text[0]. If you declare a character pointer and want it to point to the character array my_text, then you don't need the & ampersand sign, as you do when pointing at a single variable such as my_var.

The code below is wrong:

#include <stdio.h>

int main()
{
    char my_text[] = "This is a string of text.";
    char *ptr_text = NULL;
    ptr_text = &my_text;

    return 0;
}

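The code is wrong because my_text on its own already converts to a pointer to its first character. If you put the & ampersand in front of it, you are asking for the address of the whole array instead. That address has the same numeric value, but its type is "pointer to an array of 26 characters" and NOT "pointer to char", so the compiler will complain about an incompatible pointer type.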

The code below is right:

#include <stdio.h>

int main()
{
    char my_text[] = "This is a string of text.";
    char *ptr_text = NULL;

    /* The array name converts to a pointer to its first character. */
    ptr_text = my_text;

    return 0;
}

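For most purposes you can think of an array name as a pointer to its first element, because that is how C treats it in expressions. Strictly speaking an array is not a pointer, though: sizeof my_text, for example, gives the size of the whole array.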

Conclusion

The use of pointers is a powerful tool and it is one of the strongest aspects of C, but at the same time it is dangerous.

In my personal opinion the difficulty lies not so much in understanding the subject as in remembering how to use pointers. If time passes and you don't program in C often, you tend to forget how it works.

With the help of imaginary analogies you will perhaps find it easier to remember how to use pointers.