This is part IV of a tutorial on how to build a Linux shell. You can read the previous parts of this tutorial from the following links: , , . part I part II part III : You can download the complete source code for Part IV from . NOTE this GitHub repository Introduction to Part IV In this part, we're going to add to our shell. The is a data structure that is used by and to store as entries in the table. Each entry consists of a key (the variable's ) and an associated value (the variable's ). Keys are usually unique, that is, we can't have two entries that share the same key (i.e., can't have two variables that share the same variable name). symbol tables symbol table compilers interpreters variables name value Typically, a Linux shell populates its symbol table on startup. After populating the symbol table, the compiler or interpreter can easily search for a variable in the table to retrieve that variable's value. We can also perform , enforce (e.g., to make a variable visible only to the function in which it was declared), and to external commands. type checking scoping rules export shell variables To populate the symbol table, the shell reads the list, which is passed to the shell from its parent process (usually the process that logged the user in, or a child process of the login process). The shell adds each variable (and its value) to the symbol table. Afterwards, we can edit, remove, or export shell variables at will, using the proper builtin utilities (which we'll talk about later on in this series). environment variables Why Do We Need a Symbol Table? In short, the symbol table enables us to define shell variables, modify their values, use the values of different shell variables when we perform , and export variables to external commands. The symbol table will also become handy when we talk about and later on in this series. variable expansion positional special shell parameters Whenever you ask the shell to , , or the value of a shell variable, you are effectively asking the shell to access and/or modify its symbol table. All shells have some sort of symbol table implementation, although some shells might have different names for it. echo export unset For example, say you invoked the following command: echo $PATH Which should give you an output similar to this: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin You probably know that the command had nothing to do with the output you see on the screen, except for the fact that printed the path out. It is who actually understood that stands for a shell variable name. echo echo the shell $PATH It is also who replaced the word with the actual path value, which was then passed to . The command simply echoed back the argument it was passed by the shell, which is the executable path you see printed on the screen. the shell $PATH echo echo So in order to be able to define, modify, unset, and export shell variables, we first need to implement our symbol table. Let's see how we can do that next. Implementing the Symbol Table There are different ways to implement a symbol table, the common ones being , , and . There are pros and cons to each method, and we don't have the time, nor the space, to discuss each one in detail. For our purposes, we'll use , which are the easiest to implement, and which fair well in terms of speed of access and memory usage. linked lists hash tables binary search trees linked lists ( If you want to use the shell for other than learning, you should consider changing the symbol table implementation to one that uses hash tables or binary trees. You can find an example of the hash table implementation ). NOTE: anything here Now let's get cracking on that code. In your source directory, create a subdirectory named (invoke from your terminal emulator). Navigate to that directory ( ) and create a file named . Add the following code to the header file you've just created: symtab mkdir symtab cd symtab symtab.h symbol_type_e { SYM_STR , SYM_FUNC, }; *name; symbol_type_e val_type; *val; flags; }; level; }; symtab_count; }; ; ; ; ; ; ; ; ; ; ; ; ; ; ; # SYMTAB_H ifndef # SYMTAB_H define # include "../node.h" # MAX_SYMTAB 256 define /* the type of a symbol table entry's value */ enum /* the symbol table entry structure */ { struct symtab_entry_s char enum char unsigned int * ; struct symtab_entry_s next * ; struct node_s func_body /* the symbol table structure */ { struct symtab_s int * , * ; struct symtab_entry_s first last /* values for the flags field of struct symtab_entry_s */ # FLAG_EXPORT (1 << 0) define /* export entry to forked commands */ /* the symbol table stack structure */ { struct symtab_stack_s int * [ ]; struct symtab_s symtab_list MAX_SYMTAB * , * ; struct symtab_s global_symtab local_symtab struct symtab_s * new_symtab ( level) int struct symtab_s * symtab_stack_push ( ) void struct symtab_s * symtab_stack_pop ( ) void int rem_from_symtab (struct symtab_entry_s *entry, struct symtab_s *symtab) struct symtab_entry_s * add_to_symtab ( *symbol) char struct symtab_entry_s * do_lookup ( *str, struct symtab_s *symtable) char struct symtab_entry_s * get_symtab_entry ( *str) char struct symtab_s * get_local_symtab ( ) void struct symtab_s * get_global_symtab ( ) void struct symtab_stack_s * get_symtab_stack ( ) void void init_symtab ( ) void void dump_local_symtab ( ) void void free_symtab (struct symtab_s *symtab) void symtab_entry_setval (struct symtab_entry_s *entry, *val) char # endif The enumeration defines the types of our symbol table entries. We'll use the type to represent shell variables, and to represent functions (we'll deal with shell functions later on in this series). symbol_type_e SYM_STR SYM_FUNC The structure represents our symbol table entries. The structure contains the following fields: struct symtab_entry_s => the name of the shell variable (or function) represented by this entry. name => for shell variables, for shell functions. val_type SYM_STR SYM_FUNC => string value (for shell variables only). val => indicates the different properties we'll assign to variables and functions, such as the export and readonly flags (we'll deal with these later in this series). flags => pointer to the next symbol table entry (because we're implementing the table as a singly linked list). next => for shell functions, the , or , of the function body (we talked about AST's in of this tutorial). func_body Abstract Syntax Tree AST part I The structure represents a single symbol table. For starters, we'll use one symbol table, in which we'll define all our shell variables. Later on, when we discuss shell functions and begin working with script files, we'll need to define more symbol tables. struct symtab_s The zeroth symbol table will be the , in which we'll define our variables (the ones that are accessible to the shell, and all functions and scripts executed by it). global table global Symbol tables number one and above are the , in which we'll define our variables (the ones that are only accessible to the shell function or script in which they were declared). By cascading symbol tables in this way, we are effectively implementing . local tables local variable scoping Our structure contains the following fields: struct symtab_s => 0 for the global symbol table, 1 and above for local symbol tables. level , => pointers to the first and last entries in the table's linked list, respectively. first last Now to be able to cascade symbol tables as we discussed above, we need to define and implement a symbol table . A is a , or , data structure, in which the last item added (or ) is the first item removed (or ). The structure represents our symbol table stack. The structure contains the following fields: stack stack Last-In-First-Out LIFO pushed popped struct symtab_stack_s => the number of symbol tables currently on the stack. symtab_count => an array of pointers to the stack's symbol tables. The zeroth item points to the global symbol table, and the item points to the last (or local) symbol table. The stack can hold up to entries, which we defined at the beginning of the header file to be 256. symtab_list symtab_count-1 MAX_SYMTAB , => pointers to the global and local symbol tables, respectively (for ease of access). global_symtab local_symtab We'll implement the stack later in this lesson. For now, we'll start by writing the functions we'll need to work with symbol tables. Symbol Table Functions Create the file (in the subdirectory) and start by adding the following code: symtab.c symtab symtab_level; { symtab_stack.symtab_count = ; symtab_level = ; (!global_symtab) { ( , ); (EXIT_FAILURE); } (global_symtab, , (struct symtab_s)); symtab_stack.global_symtab = global_symtab; symtab_stack.local_symtab = global_symtab; symtab_stack.symtab_list[ ] = global_symtab; global_symtab->level = ; } # include <stdlib.h> # include <stdio.h> # include <string.h> # include "../shell.h" # include "../node.h" # include "../parser.h" # include "symtab.h" ; struct symtab_stack_s symtab_stack int void init_symtab ( ) void 1 0 * = ( ( )); struct symtab_s global_symtab malloc sizeof struct symtab_s if fprintf stderr "fatal error: no memory for global symbol table\n" exit memset 0 sizeof 0 0 First, we have two global variables: => pointer to our symbol table stack (we only need one stack per shell). symtab_stack => our current level in the stack (0 if we're working with the global symbol table, non-zero otherwise). symtab_level The function initializes our symbol table stack, then allocates memory for, and initializes, our global symbol table. init_symtab() Next, add the following function: { (!symtab) { ( , ); (EXIT_FAILURE); } (symtab, , (struct symtab_s)); symtab->level = level; symtab; } struct symtab_s * new_symtab ( level) int * = ( ( )); struct symtab_s symtab malloc sizeof struct symtab_s if fprintf stderr "fatal error: no memory for new symbol table\n" exit memset 0 sizeof return We'll call the function whenever we want to create a new symbol table (for example, when we're about to execute a shell function). new_symtab() Next, add the following function: { (symtab == ) { ; } (entry) { (entry->name) { (entry->name); } (entry->val) { (entry->val); } (entry->func_body) { free_node_tree(entry->func_body); } (entry); entry = next; } (symtab); } void free_symtab (struct symtab_s *symtab) if NULL return * = -> ; struct symtab_entry_s entry symtab first while if free if free if * = -> ; struct symtab_entry_s next entry next free free We'll call the function when we're done working with a symbol table, and we want to free the memory used by the symbol table and its entries. free_symtab() Next, we'll define a debugging function: { i = ; indent = symtab->level * ; ( , , indent, , symtab->level); ( , , indent, ); ( , , indent, ); ( , , indent, ); (entry) { ( , , indent, , i++, entry->name, entry->val); entry = entry->next; } ( , , indent, ); } void dump_local_symtab ( ) void * = . ; struct symtab_s symtab symtab_stack local_symtab int 0 int 4 fprintf stderr "%*sSymbol table [Level %d]:\r\n" " " fprintf stderr "%*s===========================\r\n" " " fprintf stderr "%*s No Symbol Val\r\n" " " fprintf stderr "%*s------ -------------------------------- ------------\r\n" " " * = -> ; struct symtab_entry_s entry symtab first while fprintf stderr "%*s[%04d] %-32s '%s'\r\n" " " fprintf stderr "%*s------ -------------------------------- ------------\r\n" " " This function prints the contents of the local symbol table. When our shell starts, the local and global symbol tables will refer to the same table. It is only when the shell is about to run a shell function or script file that we have a local table that is different from the global table. (later on in this lesson, we'll write a builtin utility that will call to help us visualize the contents of our shell's global symbol table). dump_local_symtab() Now let's define some functions to help us work with symbol table entries. In the same file ( ), add the following function: symtab.c { (!symbol || symbol[ ] == ) { ; } ((entry = do_lookup(symbol, st))) { entry; } entry = ( (struct symtab_entry_s)); (!entry) { ( , ); (EXIT_FAILURE); } (entry, , (struct symtab_entry_s)); entry->name = ( (symbol)+ ); (!entry->name) { ( , ); (EXIT_FAILURE); } (entry->name, symbol); (!st->first) { st->first = entry; st->last = entry; } { st->last->next = entry; st->last = entry; } entry; } struct symtab_entry_s * add_to_symtab ( *symbol) char if 0 '\0' return NULL * = . ; struct symtab_s st symtab_stack local_symtab * = ; struct symtab_entry_s entry NULL if return malloc sizeof if fprintf stderr "fatal error: no memory for new symbol table entry\n" exit memset 0 sizeof malloc strlen 1 if fprintf stderr "fatal error: no memory for new symbol table entry\n" exit strcpy if else return This function adds a new entry to the local symbol table. Remember, at the beginning of this lesson, I've said that each entry must have a unique key, which is the name we give to the shell variable or function. To ensure this uniqueness, we first check to see if an entry exists with the given name, by calling (which we'll define in a minute). do_lookup() If an entry with the given name exists, we simply return the existing entry, without adding a new one. Otherwise, we add the entry, set its name, and adjust the symbol table's pointers. Lastly, we return the newly added entry. The next function does the opposite work, i.e. it removes the symbol table entry whose key matches the given name: { res = ; (entry->val) { (entry->val); } (entry->func_body) { free_node_tree(entry->func_body); } (entry->name); (symtab->first == entry) { symtab->first = symtab->first->next; (symtab->last == entry) { symtab->last = ; } res = ; } { struct symtab_entry_s *e = symtab->first; (e && e != entry) { p = e; e = e->next; } (e == entry) { p->next = entry->next; res = ; } } (entry); res; } int rem_from_symtab (struct symtab_entry_s *entry, struct symtab_s *symtab) int 0 if free if free if if NULL 1 else * = ; struct symtab_entry_s p NULL while if 1 free return This function frees the memory used by the entry, and adjusts the linked list pointers to remove the entry from the symbol table. To perform lookup (i.e., search for a variable with the given name), we need to define the following function in the same file: { (!str || !symtable) { ; } (entry) { ( (entry->name, str) == ) { entry; } entry = entry->next; } ; } struct symtab_entry_s * do_lookup ( *str, struct symtab_s *symtable) char if return NULL * = -> ; struct symtab_entry_s entry symtable first while if strcmp 0 return return NULL This function searches the given symbol table, starting with the first entry. If the entry's key matches the variable name we're looking for, the function returns the entry. Otherwise, the function follows the linked list pointers to look at each entry, in turn, until we find an entry whose key matches our desired name. If no match is found, we return . NULL Next, add the following function: { i = symtab_stack.symtab_count ; { (entry) { entry; } } (--i >= ); ; } struct symtab_entry_s * get_symtab_entry ( *str) char int -1 do * = . [ ]; struct symtab_s symtab symtab_stack symtab_list i * = ( , ); struct symtab_entry_s entry do_lookup str symtab if return while 0 return NULL This function searches for a symbol table entry whose key matches the given name. This might seem redundant at first, as we've already defined the function to search the local symbol table. The difference here is that searches the whole stack, starting with the local symbol table. At the moment, this distinction is not important, as our local and global symbol tables refer to the one and same table. do_lookup() get_symtab_entry() It is only when we talk about shell functions and script files that you'll appreciate what this function does (so hang tight in there!). Lastly, add the following function: { (entry->val) { (entry->val); } (!val) { entry->val = ; } { *val2 = ( (val)+ ); (val2) { (val2, val); } { ( , ); } entry->val = val2; } } void symtab_entry_setval (struct symtab_entry_s *entry, *val) char if free if NULL else char malloc strlen 1 if strcpy else fprintf stderr "error: no memory for symbol table entry's value\n" This function frees the memory used to store the old entry's value (if one exists). It then creates a copy of the new value and stores it in the symbol table entry. That's it for the symbol table functions. Now let's write some functions to help us work with the symbol table stack. Symbol Table Stack Functions Add the following code to the same source file, : symtab.c { symtab_stack.symtab_list[symtab_stack.symtab_count++] = symtab; symtab_stack.local_symtab = symtab; } { symtab_stack_add(st); st; } { (symtab_stack.symtab_count == ) { ; } symtab_stack.symtab_list[--symtab_stack.symtab_count] = ; symtab_level--; (symtab_stack.symtab_count == ) { symtab_stack.local_symtab = ; symtab_stack.global_symtab = ; } { symtab_stack.local_symtab = symtab_stack.symtab_list[symtab_stack.symtab_count ]; } st; } { symtab_stack.local_symtab; } { symtab_stack.global_symtab; } { &symtab_stack; } void symtab_stack_add (struct symtab_s *symtab) struct symtab_s * symtab_stack_push ( ) void * = (++ ); struct symtab_s st new_symtab symtab_level return struct symtab_s * symtab_stack_pop ( ) void if 0 return NULL * = . [ . -1]; struct symtab_s st symtab_stack symtab_list symtab_stack symtab_count NULL if 0 NULL NULL else -1 return struct symtab_s * get_local_symtab ( ) void return struct symtab_s * get_global_symtab ( ) void return struct symtab_stack_s * get_symtab_stack ( ) void return Here's a quick breakdown of the above functions: adds the given symbol table to the stack, and assigns the newly added table as the local symbol table. symtab_stack_add() creates a new, empty symbol table and pushes it on top of the stack. symtab_stack_push() removes (or pops) the symbol table on top of the stack, adjusting the stack pointers as needed. symtab_stack_pop() and return pointers to the local and global symbol tables, respectively. get_local_symtab() get_global_symtab() returns a pointer to the symbol table stack. get_symtab_stack() Initializing Our Symbol Table Stack Remember at the beginning of this tutorial I told you that the shell needs to initialize its global symbol table and add variables from the environment list to the table? Well, in this part, we'll initialize the symbol table stack and the global symbol table. In the next part of this series, we'll read the environment variables list and add them to our symbol table. Go ahead and create a source file named in your source directory, and add the following code: initsh.c **environ; { init_symtab(); **p2 = environ; (*p2) { *eq = (*p2, ); (eq) { len = eq-(*p2); name[len+ ]; (name, *p2, len); name[len] = ; entry = add_to_symtab(name); (entry) { symtab_entry_setval(entry, eq+ ); entry->flags |= FLAG_EXPORT; } } { entry = add_to_symtab(*p2); } p2++; } entry = add_to_symtab( ); symtab_entry_setval(entry, ); entry = add_to_symtab( ); symtab_entry_setval(entry, ); } # include <string.h> # include "shell.h" # include "symtab/symtab.h" extern char void initsh ( ) void * ; struct symtab_entry_s entry char while char strchr '=' if int char 1 strncpy '\0' if 1 else "PS1" "$ " "PS2" "> " This function initializes the symbol table stack (including the global symbol table) and scans the environment list, adding each environment variable (and its value) to the table. Lastly, the function adds two variables that we'll use to store our prompt strings, and (we talked about prompt strings in ). Don't forget to add the function prototype to your header file: PS1 PS2 part I shell.h ; void initsh ( ) void Next, we need to call this function from our function, before we enter the REPL loop. To do this, add the following line to , right before the loop body: main() main() initsh(); The last thing we need to do is to update our prompt string printing functions, so that they'll use the and variables we've just added to the global symbol table in . We'll also write our first builtin utility, . PS1 PS2 initsh() dump Let's apply these changes to our code next. Updating the Prompt Printing Functions Open your file. Remove the line in the body of the and replace it by the following code: prompt.c print_prompt1() { (entry && entry->val) { ( , , entry->val); } { ( , ); } } void print_prompt1 ( ) void * = (" "); struct symtab_entry_s entry get_symtab_entry PS1 if fprintf stderr "%s" else fprintf stderr "$ " The new code checks if there is a symbol table entry with the name . If there is, we use that entry's value to print the first prompt string. Otherwise, we use our default builtin value, which is . PS1 $ The code changes to the function are similar, so I won't show it here, but you can check it at the GitHub repo link I provided at the top of this page. print_prompt2() Don't forget to add the following include directive at the top of the file, right after the line: #include "shell.h" # include "symtab/symtab.h" Defining our First Builtin Utility A (a.k.a. builtin or internal command) is a command whose code is compiled as part of the shell executable itself, i.e. the shell doesn't need to execute an , nor does it need to a new process in order to execute the command. builtin utility external program fork Many of the commands we use on daily basis, such as , , , and are in fact builtin utilities. You can read more about shell builtin utilities in this POSIX standard . cd echo export readonly link In the course of this tutorial, we'll be adding different builtin utilities to expand our shell. We'll start by defining a structure that will help us store information about our different builtin utilities. Open your header file and add the following code at the end, right before the directive: shell.h #endif ; *name; (*func)( argc, **argv); }; builtins_count; /* shell builtin utilities */ int dump ( argc, **argv) int char /* struct for builtin utilities */ { struct builtin_s char /* utility name */ int int char /* function to call to execute the utility */ /* the list of builtin utilities */ extern []; struct builtin_s builtins /* and their count */ extern int The function prototype declares our first builtin utility, . The structure defines our builtin utilities, and has the following fields: dump struct builtin_s => the builtin utility name, which we'll use to invoke the utility. name => function pointer to the function that implements the builtin utility in our shell. func We'll use the array to store information about our builtin utilities. The array contains number of elements. builtins[] builtins_count Now create a new subdirectory and name it . This is where we'll define all of our builtin utilities. Next, create the file and add the following code to it: builtins builtins.c { , dump }, }; builtins_count = (builtins)/ (struct builtin_s); # include "../shell.h" [] = { struct builtin_s builtins "dump" int sizeof sizeof Next, we'll add a builtin utility, named , which we'll use to dump or print the contents of the symbol table, so we know what's going on behind the scenes (I mainly wrote this utility so that our discussion doesn't sound too abstract and theoretical). dump In the subdirectory, create the file and add the following code to it: builtins dump.c { dump_local_symtab(); ; } # include "../shell.h" # include "../symtab/symtab.h" int dump ( argc, **argv) int char return 0 Simple, right? This function implements our builtin utility, which prints the contents of the local symbol table. dump Updating the Executor Next, we need to tell our executor about our new builtin utility, so that when we enter the command , the shell executes our function, instead of searching for an external command with the name . dump dump() dump To do this, we'll need to fix our function. do_simple_command() In the source directory, open the source file and navigate to the function's definition. Now find the two lines: executor.c do_simple_command() argv[argc] = ; child_pid = ; NULL pid_t 0 Insert a new line between those two lines and enter the following code: i = ; ( ; i < builtins_count; i++) { ( (argv[ ], builtins[i].name) == ) { builtins[i].func(argc, argv); free_argv(argc, argv); ; } } int 0 for if strcmp 0 0 return 1 Now our shell will check to see if the name of the command it’s about to execute is the name of a builtin utility and, if so, it calls the function that implements the builtin utility and returns. And that's it! Now let's compile and test our shell. Compiling the Shell Let’s compile our shell. Open your favorite terminal emulator, navigate to your source directory and make sure you have 14 files and 2 subdirectories in there: Now compile the shell using the following command: gcc -o shell executor.c initsh.c main.c node.c parser.c prompt.c scanner.c source.c builtins/builtins.c builtins/dump.c symtab/symtab.c If everything goes well, should not output anything, and there should be an executable file named in the current directory: gcc shell (If you downloaded the source files from the , you can simply run from the source directory, and it will take care of compiling the shell). GitHub repo make Now invoke the shell by running , and try our new builtin utility, : ./shell dump As you can see from the output, our global symbol table (level 0) contains many variables (67 in the above example), which correspond to the shell’s environment variables that were passed to us from our parent process. The last two entries represent our and variables. Try entering some other commands and see how our shell fairs in parsing and executing simple commands. PS1 PS2 What's Next It might not be obvious right now, but what we’ve done in this part is a accomplishment. Our symbol table will enable us to do important things in our shell, such as performing word expansions, exporting variables to external commands, defining shell functions, and so on. huge In the next part, we’ll talk about the word expansion process, and how we can add word expansion capabilities to our shell. Stay tuned! Previously published at https://medium.com/@mohammedisam2000/lets-build-a-linux-shell-part-iv-cefdd8f58138