Data Structures using 'C' Unit 2
Unit 2 Overview of Data Structures
Structure:
2.1 Introduction
2.1.1 What is a Data Structure?
2.1.2 Definition of data structure
2.1.3 The Abstract Level
2.1.4 The Application Level
2.1.5 Implementation Level
Self Assessment Questions
2.2 Data Types and Structured Data Type
2.2.1 Common Structures
2.2.2 Abstract Data Types
2.2.2.1 Properties of Abstract Data Types
2.2.2.2 Generic Abstract Data Types
2.2.2.3 Programming with Abstract Data Types
Self Assessment Questions
2.3 Pre and Post Conditions
2.3.1 Preconditions
2.3.2 Postconditions
2.3.3 Checking Pre & Post Conditions
2.3.4 Implementation Checks Preconditions
Self Assessment Questions
2.4 Linear Data Structure
2.4.1 The Array Data Structure
2.4.2 Using an Array and Lists as a Data Structure
2.4.3 Elementary Data Structures
Self Assessment Questions
Sikkim Manipal University Page No.: 27
2.5 What the application needs?
2.6 Implementation methods
2.7 Non Linear Data Structures
2.7.1 Trees
2.7.2 Binary Tree
2.7.3 Hash Tables
Self Assessment Questions
2.8 Summary
2.9 Terminal Questions
2.1 Introduction
Data structures represent places to store data for use by a computer program. As you would imagine, this describes a spectrum of data storage techniques, from the very simple to the very complex. We can look at this progression, from the simple to the complex, in the following way.
At the lowest level, there are data structures supplied and supported by the CPU (or computer chip) itself. These vary from chip to chip, but are almost always of the very primitive sort. They typically include the simple data types, such as integers, characters, floating point numbers, and bit strings. To some extent, the data types supported by a chip reflect the hardware design of the chip: how wide (how many bits) the registers are, how wide the data bus is, whether the ALU has an accumulator, and whether the ALU supports floating point operations.
At the second level of the data structures spectrum are the data structures supported by particular programming languages. These vary a lot from language to language. Most languages offer arrays, and many offer arrays of arrays (matrices). Most of the popular languages provide support for some sort of record structure: in C these are structs and in Pascal these are records. A few offer strings as a first class data type (e.g. C++ and Java). A few languages support linked lists directly in the language (e.g. Lisp and Scheme). Object oriented languages often offer general lists, stacks, and even trees.
At the top level of this taxonomy are those data structures that are created by the programmer, using a particular programming language. In this regard, it is important to note what tools a language provides to facilitate the implementation of the complex data structures envisioned by a programmer. Arrays, arrays of arrays, pointers and record structures are all helpful in this regard. Using the available tools, a programmer can build general lists, stacks, queues, deques, trees (of many types), graphs, sets, and much, much more.
In this book we will focus on the data structures in the top level, those that are usually created by the application programmer. These are the data structures that, generally, impact the problem solution and implementation in the most dramatic ways: size, efficiency, readability, and maintainability.
Objectives
At the end of this unit, you will be able to understand:
The meaning of the term data structure
The abstract, application and implementation levels of a data structure
Abstract data types and their properties
Pre- and postconditions of operations, and how they can be checked
The concepts and methods of linear and non-linear data structures.
2.1.1 What is a Data Structure?
A data structure is the organization of data in a computer's memory or in a file.
The proper choice of a data structure can lead to more efficient programs.
Some example data structures are: array, stack, queue, linked list, binary tree, hash table, heap, and graph. Data structures are often used to build databases. Typically, data structures are manipulated using various algorithms.
Based on the concept of Abstract Data Types (ADT), we define a data structure by the following three components.
1) Operations: Specifications of external appearance of a data structure
2) Storage Structures: Organizations of data implemented in lower-level data structures
3) Algorithms: Descriptions of how to manipulate information in the storage structures to obtain the results defined for the operations
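The three components can be seen side by side in a small C sketch. The type IntList, its fixed capacity of 16 and the function names are hypothetical, chosen only for illustration: the prototypes play the role of the operations, the struct is the storage structure, and the function bodies are the algorithms.

```c
#include <stddef.h>

/* Storage structure (hypothetical): a fixed-capacity array plus a count. */
#define LIST_CAPACITY 16

typedef struct {
    int items[LIST_CAPACITY];
    size_t count;
} IntList;

/* Operations: the external appearance of the data structure. */
void   list_init(IntList *l);
int    list_append(IntList *l, int value);               /* 0 on success, -1 if full */
int    list_get(const IntList *l, size_t i, int *out);   /* 0 on success, -1 if i is out of range */
size_t list_length(const IntList *l);

/* Algorithms: how each operation manipulates the storage structure. */
void list_init(IntList *l) { l->count = 0; }

int list_append(IntList *l, int value) {
    if (l->count == LIST_CAPACITY) return -1;   /* no room left */
    l->items[l->count++] = value;
    return 0;
}

int list_get(const IntList *l, size_t i, int *out) {
    if (i >= l->count) return -1;
    *out = l->items[i];
    return 0;
}

size_t list_length(const IntList *l) { return l->count; }
```

Because an application would only call list_append, list_get and list_length, the storage structure could later be replaced (for example by a linked list) without changing the calling code.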
When you work with and collect information on any subject, it doesn't take very long before you have more data than you know how to handle. Enter the data structure. In his book Algorithms, Data Structures and Problem Solving with C, Mark Allen Weiss writes "A data structure is a representation of data and the operations allowed on that data." Webopedia states, "the term data structure refers to a scheme for organizing related pieces of information."
2.1.2 Definition of data structure
"a specification, an application and an implementation view of a collection of one or more items of data, and the operations necessary and sufficient to interact with the collection. The specification is the definition of the data structure as an abstract data type. The specification forms the programming interface for the data structure. The application level is a way of modeling real-life data in a specific context. The implementation is a concrete data type expressed in a programming language. There may be intermediate levels of implementation, but ultimately the data structure implementation must be expressed in terms of the source language primitive data types”.
2.1.3 The Abstract Level
The abstract (or logical) level is the specification of the data structure: the "what" but not the "how." At this level, the user or data structure designer is free to think outside the bounds of any one programming language. For instance, a linear list type would consist of a collection of list nodes such that they formed a sequence. The operations defined for this list might be insert, delete, sort and retrieve.
2.1.4 The Application Level
At the application or user level, the user is modeling real-life data in a specific context. In our list example, we might specify what kind of items were stored in the list and how long the list is. The context will determine the definitions of the operations. For example, if the list was a list of character data, the operations would have a different meaning than if we were talking about a grocery list.
2.1.5 Implementation Level
The implementation level is where the model becomes compilable, executable code. We need to determine where the data will reside and allocate space in that storage area. We also need to create the sequence of instructions that will cause the operations to perform as specified.
Self Assessment Questions
1. Define data structure. Explain its three components.
2. Discuss the data structure implementation in terms of the source language primitive data type.
2.2 Data Types and Structured Data Type
A data type, whether simple or structured, consists of
a domain (= a set of values)
a set of operations.
Example: the Boolean or logical data type provided by most programming languages.
two values: true, false.
many operations, including AND, OR, NOT, etc.
Structural and Behavioral Definitions
There are two different approaches to specifying a domain: we can give a structural definition, or we can give a behavioral definition. Let us see what these two are like.
Behavioral Definition of the domain for ‘Fraction’
The alternative approach to defining the set of values for fractions does not impose any internal structure on them. Instead, it just adds an operation that creates fractions out of other things, such as CREATE_FRACTION(N,D), where N is any integer and D is any non-zero integer.
The values of type fraction are defined to be the values that are produced by this function for any valid combination of inputs. The parameter names were chosen to suggest its intended behavior: CREATE_FRACTION(N,D) should return a value representing the fraction N/D (N for numerator, D for denominator).
You are probably thinking: this is crazy, CREATE_FRACTION could be any old random function. How do we guarantee that CREATE_FRACTION(N,D) actually returns the fraction N/D? The answer is that we have to constrain the behavior of this function by relating it to the other operations on fractions. For example, one of the key properties of multiplication is that, for any non-zero N and D: NORMALIZE((N/D) * (D/N)) = 1/1
This turns into a constraint on CREATE_FRACTION:
NORMALIZE(CREATE_FRACTION(N,D) * CREATE_FRACTION(D,N)) = CREATE_FRACTION(1,1)
So you see, CREATE_FRACTION cannot be any old function; its behavior is highly constrained, because we can write down lots and lots of constraints like this. That is the reason we call this sort of definition behavioral: the definition is strictly in terms of a set of operations and constraints, or axioms, relating the behavior of the operations to one another.
In this style of definition, the domain of a data type (the set of permissible values) plays an almost negligible role. Any set of values will do, as long as we have an appropriate set of operations to go along with it.
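To make the axiom concrete, here is a minimal C sketch that checks it for particular inputs. The struct representation, the helper names (gcd, create_fraction, multiply, normalize, axiom_holds) and the choice of NORMALIZE as "lowest terms with a positive denominator" are assumptions made for illustration; the axiom itself only talks about the operations. Overflow of the products is ignored.

```c
#include <stdlib.h>

/* Hypothetical representation; the behavioral definition does not require it. */
typedef struct { int num; int den; } Fraction;

/* Greatest common divisor of |a| and |b| (Euclid's algorithm). */
static int gcd(int a, int b) {
    a = abs(a);
    b = abs(b);
    while (b != 0) { int t = a % b; a = b; b = t; }
    return a;
}

/* CREATE_FRACTION(N, D); precondition: d != 0. */
Fraction create_fraction(int n, int d) {
    Fraction f = { n, d };
    return f;
}

Fraction multiply(Fraction a, Fraction b) {
    return create_fraction(a.num * b.num, a.den * b.den);
}

/* NORMALIZE (assumed meaning): lowest terms, denominator made positive. */
Fraction normalize(Fraction f) {
    int g = gcd(f.num, f.den);          /* g > 0 because den != 0 */
    if (f.den < 0) { f.num = -f.num; f.den = -f.den; }
    f.num /= g;
    f.den /= g;
    return f;
}

/* Checks NORMALIZE(CREATE(N,D) * CREATE(D,N)) = CREATE(1,1) for one pair;
   both n and d must be non-zero so that both fractions are defined. */
int axiom_holds(int n, int d) {
    Fraction p = normalize(multiply(create_fraction(n, d), create_fraction(d, n)));
    Fraction one = create_fraction(1, 1);
    return p.num == one.num && p.den == one.den;
}
```

For instance, axiom_holds(3, 4) and axiom_holds(-2, 5) both return 1. Any other implementation of create_fraction that passed such checks for all valid inputs would be equally acceptable under the behavioral definition.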
2.2.1 Common Structures
Let us stick with structural definitions for the moment, and briefly survey the main kinds of data types from a structural point of view.
Atomic Data Types
First of all, there are atomic data types. These are data types that are defined without imposing any structure on their values. Boolean, our first example, is an atomic type. So are characters, as these are typically defined by enumerating all the possible values that exist on a given computer.
Structured Data Types
The opposite of atomic is structured. A structured data type has a definition that imposes structure upon its values. As we saw above, fractions normally are a structured data type. In many structured data types, there is an internal structural relationship, or organization, that holds between the components. For example, if we think of an array as a structured type, with each position in the array being a component, then there is a structural relationship of 'followed by': we say that component N is followed by component N+1.
Structural Relationships
Not all structured data types have this sort of internal structural relationship. Fractions are structured, but there is no internal relationship between the sign, numerator, and denominator. But many structured data types do have an internal structural relationship, and these can be classified according to the properties of this relationship.
Linear Structure:
The most common organization for components is a linear structure. A structure is linear if it has these 2 properties:
Property P1 Each element is 'followed by' at most one other element.
Property P2 No two elements are 'followed by' the same element.
An array is an example of a linearly structured data type. We generally write a linearly structured data type like this:
A->B->C->D (this is one value with 4 parts).
- counter example 1 (violates P1): A points to B and C: B<-A->C
- counter example 2 (violates P2): A and B both point to C: A->C<-B
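The two properties can be checked mechanically. In this minimal C sketch (the Link type and function names are hypothetical), a structure is given as a list of 'followed by' links, each written as a pair of element names; P1 fails if some element appears twice as the source of a link, and P2 fails if some element appears twice as the target.

```c
#include <stddef.h>

/* One hypothetical 'followed by' link: element `from` is followed by element `to`. */
typedef struct { char from; char to; } Link;

/* P1: each element is followed by at most one other element,
   i.e. no two links share the same `from`. */
int satisfies_p1(const Link *links, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (links[i].from == links[j].from) return 0;
    return 1;
}

/* P2: no two elements are followed by the same element,
   i.e. no two links share the same `to`. */
int satisfies_p2(const Link *links, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (links[i].to == links[j].to) return 0;
    return 1;
}

/* Linear in the sense of the text: both P1 and P2 hold. */
int is_linear(const Link *links, size_t n) {
    return satisfies_p1(links, n) && satisfies_p2(links, n);
}
```

With links A->B, B->C, C->D, is_linear returns 1; counter example 1 (A->B, A->C) makes satisfies_p1 return 0, and counter example 2 (A->C, B->C) makes satisfies_p2 return 0.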
2.2.2 Abstract Data Types
Handling Problems
The first thing with which one is confronted when writing programs is the problem. Typically you are confronted with "real-life" problems and you want to make life easier by providing a program for the problem. However, real-life problems are nebulous, and the first thing you have to do is to try to understand the problem in order to separate necessary from unnecessary details: you try to obtain your own abstract view, or model, of the problem. This process of modeling is called 'abstraction' and is illustrated in Figure. The model defines an abstract view of the problem.
Figure: Create a model from a problem with abstraction.
This implies that the model focuses only on problem-related aspects and that you try to define properties of the problem. These properties include:
the data which are affected, and
the operations which are identified by the problem.
It is said that "computer science is the science of abstraction." But what exactly is abstraction? Abstraction is "the idea of a quality thought of apart from any particular object or real thing having that quality." For example, we can think about the size of an object without knowing what that object is. Similarly, we can think about the way a car is driven without knowing its model or make.
As an example, consider the administration of employees in an institution. The head of the administration comes to you and asks you to create a program which allows them to administer the employees. Well, this is not very specific. For example, what employee information is needed by the administration? What tasks should be allowed? Employees are real persons who can be characterized by many properties; to name a very few: name, size, date of birth, shape, social number, room number, hair color, hobbies.
Certainly not all of these properties are necessary to solve the administration problem. Only some of them are problem specific. Consequently, you create a model of an employee for the problem. This model only includes the properties which are needed to fulfill the requirements of the administration, for instance name, date of birth and social number. These properties are called the data of the (employee) model. Now you have described real persons with the help of an abstract employee.
Of course, the pure description is not enough. There must be some operations defined with which the administration is able to handle the abstract employees. For example, there must be an operation which allows you to create a new employee once a new person enters the institution. Consequently, you have to identify the operations which can be performed on an abstract employee. You also decide to allow access to the employees' data only through the associated operations. This allows you to ensure that data elements are always in a proper state. For example, you are able to check whether a provided date is valid.
Abstraction is used to suppress irrelevant details while at the same time emphasizing relevant ones. The benefit of abstraction is that it makes it easier for the programmer to think about the problem to be solved.
To sum up, abstraction is the structuring of a nebulous problem into well-defined entities by defining their data and operations. Consequently, these entities combine data and operations; they are not decoupled from each other.
Abstract Data Types
A variable in a procedural programming language such as Fortran, Pascal or C is an abstraction. The abstraction comprises a number of attributes: name, address, value, lifetime, scope, type, and size. Each attribute has an associated value. For example, if we declare an integer variable in C or C++, int x;, we say that the name attribute has value "x" and that the type attribute has value "int".
Unfortunately, the terminology can be somewhat confusing: the word "value" has two different meanings. In one instance it denotes one of the attributes, and in the other it denotes the quantity assigned to an attribute. For example, after the assignment statement x = 5, the value attribute has the value five.
The name of a variable is the textual label used to refer to that variable in the text of the source program. The address of a variable denotes its location in memory. The value attribute is the quantity which that variable represents. The lifetime of a variable is the interval of time during the execution of the program in which the variable is said to exist. The scope of a variable is the set of statements in the text of the source program in which the variable is said to be visible. The type of a variable denotes the set of values which can be assigned to the value attribute and the set of operations which can be performed on the variable. Finally, the size attribute denotes the amount of storage required to represent the variable.
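The difference between scope and lifetime, together with the address and size attributes, can be observed in C. In this minimal sketch (function names hypothetical), the static local variable calls is visible only inside count_calls (its scope) but exists for the whole run of the program (its lifetime), so its value attribute survives between calls; the automatic variable local is created afresh on every call. The address attribute is obtained with the & operator and the size attribute with sizeof.

```c
#include <stddef.h>

/* Scope: only this function body. Lifetime: the whole program run.
   The value attribute is bound to 0 once and then persists between calls. */
int count_calls(void) {
    static int calls = 0;
    calls = calls + 1;
    return calls;
}

/* Scope: only this function body. Lifetime: one call. A new variable
   is created on every call and disappears when the function returns. */
int fresh_local(void) {
    int local = 0;
    local = local + 1;
    return local;            /* always 1 */
}

/* Address attribute: &x, the location of x in memory.
   Size attribute: the number of bytes needed to represent x. */
size_t size_of_int_variable(void) {
    int x = 5;               /* name "x", type int, value 5 */
    int *where = &x;         /* address attribute */
    return sizeof *where;    /* same as sizeof x */
}
```

Calling count_calls three times returns 1, 2 and 3, while fresh_local returns 1 every time.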
The process of assigning a value to an attribute is called binding. When a value is assigned to an attribute, that attribute is said to be bound to the value. Depending on the semantics of the programming language, and on the attribute in question, the binding may be done statically by the compiler or dynamically at run time. For example, in Java the type of a variable is determined at compile time (static binding). On the other hand, the value of a variable is usually not determined until run time (dynamic binding).
Here we are concerned primarily with the type attribute of a variable. The type of a variable specifies two sets:
a set of values; and,
a set of operations.
For example, when we declare a variable, say x, of type int, we know that x can represent an integer in the range -2^31 to 2^31-1 (for the common 32-bit int; in C the exact limits are INT_MIN and INT_MAX from <limits.h>) and that we can perform operations on x such as addition, subtraction, multiplication, and division.
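Both sets are visible in C. The following minimal sketch (the function name add_checked and its return convention are hypothetical) reads the value set from <limits.h> and shows that an operation such as + only behaves as expected while its result stays within that set; signed integer overflow is undefined behavior in C, so the check is made before adding.

```c
#include <limits.h>

/* Hypothetical helper: stores a + b in *sum and returns 0 if the result lies in
   [INT_MIN, INT_MAX]; returns -1 and leaves *sum untouched if it would overflow. */
int add_checked(int a, int b, int *sum) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return -1;           /* result would fall outside the value set of int */
    *sum = a + b;
    return 0;
}
```

The test is written so that neither INT_MAX - b nor INT_MIN - b can itself overflow: each subtraction is only evaluated when b has the sign that keeps it in range.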
The type int is an abstract data type in the sense that we can think about the qualities of an int apart from any real thing having that quality. In other words, we don't need to know how ints are represented nor how the operations are implemented to be able to use them or reason about them.
In designing object-oriented programs, one of the primary concerns of the programmer is to develop an appropriate collection of abstractions for the application at hand, and then to define suitable abstract data types to represent those abstractions. In so doing, the programmer must be conscious of the fact that defining an abstract data type requires the specification of both a set of values and a set of operations on those values. Indeed, it has been only since the advent of the so-called object-oriented programming languages that we see programming languages which provide the necessary constructs to properly declare abstract data types. For example, in Java, the class construct is the means by which both a set of values and an associated set of operations is declared. Compare this with the struct construct of C or Pascal's record, which only allow the specification of a set of values!
2.2.2.1 Properties of Abstract Data Types
The employee example above shows that with abstraction you create a well-defined entity which can be properly handled. These entities define the data structure of a set of items. For example, each administered employee has a name, date of birth and social number. The data structure can only be accessed with defined operations. This set of operations is called the interface and is exported by the entity. An entity with the properties just described is called an abstract data type (ADT).
Figure shows an ADT which consists of an abstract data structure and operations. Only the operations are viewable from the outside and define the interface. Once a new employee is "created", the data structure is filled with actual values: you now have an instance of an abstract employee. You can create as many instances of an abstract employee as needed to describe every real employed person.
Figure: Abstract data type (an abstract data structure plus operations; the operations form the interface).
Let's try to put the characteristics of an ADT in a more formal way:
Definition An abstract data type (ADT) is characterized by the following properties:
1. It exports a type.
2. It exports a set of operations. This set is called interface.
3. Operations of the interface are the one and only access mechanism to the type's data structure.
4. Axioms and preconditions define the application domain of the type.
With the first property it is possible to create more than one instance of an ADT as exemplified with the employee example.
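The four properties can be approximated in C, which has no class construct. In this minimal sketch the type Employee, its field sizes, the function names and the calendar check are all hypothetical. The constructor employee_create is the only way meant to fill in an Employee, and it refuses invalid dates, the kind of check mentioned in the employee example. In a single C file the struct members are still visible; a real C module would usually hide them by declaring only the struct type in a header (an incomplete type) and keeping the definition in the .c file.

```c
#include <string.h>

/* Hypothetical employee ADT: the data are name, date of birth and social number. */
typedef struct {
    char name[32];
    int  birth_year, birth_month, birth_day;
    char social_number[16];
} Employee;

static int is_leap_year(int y) {
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* Gregorian calendar check that keeps the data in a proper state. */
static int is_valid_date(int y, int m, int d) {
    static const int days_in_month[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    if (m < 1 || m > 12 || d < 1) return 0;
    int max = days_in_month[m - 1] + (m == 2 && is_leap_year(y));
    return d <= max;
}

/* Copies src into a fixed-size buffer, truncating if necessary. */
static void copy_text(char *dst, size_t size, const char *src) {
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}

/* Constructor: returns 0 and fills *e, or returns -1 for an invalid date. */
int employee_create(Employee *e, const char *name, int y, int m, int d, const char *social) {
    if (!is_valid_date(y, m, d)) return -1;
    copy_text(e->name, sizeof e->name, name);
    e->birth_year = y;
    e->birth_month = m;
    e->birth_day = d;
    copy_text(e->social_number, sizeof e->social_number, social);
    return 0;
}

/* Observers: read access only through operations. */
const char *employee_name(const Employee *e) { return e->name; }
int employee_birth_year(const Employee *e) { return e->birth_year; }
```

Two separate Employee variables give two instances of the same ADT, which is the first property in the list above.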
Returning to the fraction data type: how might we actually implement this data type in C?
Implementation 1:
typedef struct { int numerator, denominator; } fraction;
int main(void)
{
    fraction f;
    f.numerator = 1;
    f.denominator = 2;
    /* ... */
    return 0;
}
Implementation 2:
#define numerator 0
#define denominator 1
typedef int fraction[2];
int main(void)
{
    fraction f;
    f[numerator] = 1;
    f[denominator] = 2;
    /* ... */
    return 0;
}
These are just 2 of many different possibilities. Obviously these differences are in some sense extremely trivial: they do not affect the domain of values or the meaning of the operations of fractions.
2.2.2.2 Generic Abstract Data Types
ADTs are used to define a new type from which instances can be created. For instance, we may need lists of apples, lists of cars or even lists of lists. Semantically, the definition of a list is always the same; only the type of the data elements changes according to what type the list should operate on.
This additional information could be specified by a generic parameter which is specified at instance creation time. Thus an instance of a generic ADT is actually an instance of a particular variant of the ADT. A list of apples can therefore be declared as follows:
List<Apple> listOfApples;
The angle brackets now enclose the data type for which a variant of the generic ADT List should be created. listOfApples offers the same interface as any other list, but operates on elements of type Apple.
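C has no generic parameters, so the List<Apple> declaration above is not C syntax. One common substitute, sketched minimally here, is to fix the element size when an instance is created, much as the standard qsort function takes an element size. The names GenericList, glist_init, glist_append, glist_get and glist_free, and the Apple type in the usage example, are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical generic list: the element type is fixed at instance creation
   by passing its size, since C has no List<T>. */
typedef struct {
    size_t elem_size;
    size_t count, capacity;
    unsigned char *data;
} GenericList;

int glist_init(GenericList *l, size_t elem_size) {
    l->elem_size = elem_size;
    l->count = 0;
    l->capacity = 4;
    l->data = malloc(l->capacity * elem_size);
    return l->data != NULL ? 0 : -1;
}

/* Copies one element (elem_size bytes) to the end, growing the storage if needed. */
int glist_append(GenericList *l, const void *elem) {
    if (l->count == l->capacity) {
        size_t new_capacity = l->capacity * 2;
        unsigned char *p = realloc(l->data, new_capacity * l->elem_size);
        if (p == NULL) return -1;
        l->data = p;
        l->capacity = new_capacity;
    }
    memcpy(l->data + l->count * l->elem_size, elem, l->elem_size);
    l->count++;
    return 0;
}

/* Returns a pointer to element i, or NULL if i is out of range. */
const void *glist_get(const GenericList *l, size_t i) {
    return i < l->count ? (const void *)(l->data + i * l->elem_size) : NULL;
}

void glist_free(GenericList *l) {
    free(l->data);
    l->data = NULL;
    l->count = l->capacity = 0;
}
```

A list of apples is then created with glist_init(&listOfApples, sizeof(Apple)). Unlike a true generic type, the compiler cannot check that only Apple values are appended; that is the price of this approach in C.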
Notation:
As ADTs provide an abstract view to describe properties of sets of entities, their use is independent from a particular programming language. We therefore introduce a notation here. Each ADT description consists of two parts:
- Data: This part describes the structure of the data used in the ADT in an informal way.
- Operations: This part describes valid operations for this ADT, hence, it describes its interface. We use the special operation constructor to describe the actions which are to be performed once an entity of this ADT is created and destructor to describe the actions which are to be performed once an entity is destroyed. For each operation the provided arguments as well as preconditions and postconditions are given.
As an example the description of the ADT Integer is presented. Let k be an integer expression:
ADT integer is
Data
A sequence of digits optionally prefixed by a plus or minus sign. We refer to this signed whole number as N.
Operations
Constructor
Creates a new integer.
add(k)
Creates a new integer which is the sum of N and k.
Consequently, the postcondition of this operation is sum = N+k. Don't confuse this with assignment statements as used in programming languages. It is rather a mathematical equation which yields "true" for each value sum, N and k after add has been performed.
sub(k)
Similar to add, this operation creates a new integer of the difference of both integer values. Therefore the postcondition for this operation is sum = N-k.
set(k)
Set N to k. The postcondition for this operation is N = k.
……
end
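One possible C rendering of this specification, as a minimal sketch: the struct Integer and the function names are hypothetical, and the stored value N is a long long, so the sketch assumes N+k and N-k stay within its range. Note how add and sub return a new integer and leave N unchanged, while set modifies N, as the postconditions say.

```c
/* Hypothetical C rendering of ADT Integer; n holds the signed whole number N. */
typedef struct { long long n; } Integer;

/* Constructor: creates a new integer with value n. */
Integer integer_create(long long n) {
    Integer i = { n };
    return i;
}

/* add(k): creates a new integer; postcondition sum = N + k. N is not changed. */
Integer integer_add(Integer N, long long k) {
    return integer_create(N.n + k);
}

/* sub(k): creates a new integer; postcondition sum = N - k. */
Integer integer_sub(Integer N, long long k) {
    return integer_create(N.n - k);
}

/* set(k): postcondition N = k; this operation modifies N in place. */
void integer_set(Integer *N, long long k) {
    N->n = k;
}
```

Here the operation names are words, as in the specification; C itself would let an application simply write a + k on a long long, which is the kind of implementation choice discussed next.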
The description above is a specification for the ADT Integer. Please notice that we use words for names of operations, such as "add". We could use the more intuitive "+" sign instead, but this may lead to some confusion: you must distinguish the operation "+" from the mathematical use of "+" in the postcondition. The name of the operation is just syntax, whereas the semantics is described by the associated pre- and postconditions. However, it is always a good idea to combine both to make reading of ADT specifications easier.
Real programming languages are free to choose an arbitrary implementation for an ADT. For example, they might implement the operation add with the infix operator "+", leading to a more intuitive look for addition of integers.
2.2.2.3 Programming with Abstract Data Types
By organizing our program this way, i.e. by using abstract data types, we can change implementations extremely quickly: all we have to do is re-implement three very trivial functions, no matter how large our application is.
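A minimal C sketch of this idea for the fraction type follows. The three access functions make_fraction, numerator_of and denominator_of are hypothetical names: they are the only code that knows the representation (here Implementation 1, the struct), while fraction_multiply stands in for the rest of the application and uses only those three functions.

```c
/* ---- Implementation: the only code that knows the representation ---- */
typedef struct { int numerator, denominator; } fraction;

/* Hypothetical access functions forming the interface. */
fraction make_fraction(int n, int d) {
    fraction f;
    f.numerator = n;
    f.denominator = d;
    return f;
}

int numerator_of(fraction f)   { return f.numerator; }
int denominator_of(fraction f) { return f.denominator; }

/* ---- Application: uses only the three functions above ---- */
fraction fraction_multiply(fraction a, fraction b) {
    return make_fraction(numerator_of(a) * numerator_of(b),
                         denominator_of(a) * denominator_of(b));
}
```

To switch representations, only the typedef and the three access functions change. Because a C function cannot return an array, the array version would be wrapped in a struct, for example typedef struct { int parts[2]; } fraction; with numerator_of returning f.parts[0]; fraction_multiply stays exactly as it is.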
In general terms, an abstract data type is a specification of the values and the operations that has 2 properties:
1. it specifies everything you need to know in order to use the datatype
2. it makes absolutely no reference to the manner in which the datatype will be implemented.
When we use abstract data types, our programs divide into two pieces:
The Application: The part that uses the abstract datatype.
The implementation: The part that implements the abstract data type.
These two pieces are completely independent. It should be possible to take the implementation developed for one application and use it for a completely different application with no changes.
If programming in teams, implementers and application-writers can work completely independently once the specification is set.
Specification
Let us now look in detail at how we specify an abstract datatype. We will use 'stack' as an example. The data structure stack is based on the everyday notion of a stack, such as a stack of books, or a stack of plates. The defining property of a stack is that you can only access the top element of the stack; all the other elements are underneath the top one and can't be accessed except by removing all the elements above them one at a time.
The notion of a stack is extremely useful in computer science; it has many applications, and is so widely used that microprocessors often are stack-based or at least provide hardware implementations of the basic stack operations.
Figure: Specification, application and implementation (the specification defines the ADT, the application uses the ADT, the implementation implements the ADT).
First, let us see how we can define, or specify, the abstract concept of a stack. The main thing to notice here is how we specify everything needed in order to use stacks without any mention of how stacks will be implemented.
Self Assessment Questions
1. Define structural and behavioral definitions.
2. Define abstract data type.
3. Discuss the properties of an ADT.
2.3 Pre and Post Conditions
2.3.1 Preconditions
These are properties about the inputs that are assumed by an operation. If they are satisfied by the inputs, the operation is guaranteed to work properly. If the preconditions are not satisfied, the operation's behavior is unspecified: it might work properly (by chance), it might return an incorrect answer, or it might crash.
2.3.2 Postconditions
These specify the effects of an operation. They are the only things you may assume have been done by the operation, and they are only guaranteed to hold if the preconditions are satisfied.
Note: The definition of the values of type 'stack' makes no mention of an upper bound on the size of a stack. Therefore, the implementation must support stacks of any size. In practice, there is always an upper bound: the amount of computer storage available. This limit is not explicitly mentioned, but is understood: it is an implicit precondition on all operations that there is storage available, as needed. Sometimes this is made explicit, in which case it is advisable to add an operation that tests if there is sufficient storage available for a given operation.
Operations
The operations specified before are core operations: any other operation on stacks can be defined in terms of these ones. These are the operations that we must implement in order to implement 'stacks'; everything else in our program can be independent of the implementation details.
It is useful to divide operations into four kinds of functions:
1. Those that create stacks out of non-stacks, e.g. CREATE_STACK, READ_STACK, CONVERT_ARRAY_TO_STACK
2. Those that 'destroy' stacks (opposite of create), e.g. DESTROY_STACK
3. Those that 'inspect' or 'observe' a stack, e.g. TOP, IS_EMPTY, WRITE_STACK
4. Those that take stacks (and possibly other things) as input and produce other stacks as output, e.g. PUSH, POP
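A minimal C sketch of the core operations, grouped by these four kinds, might look as follows. It is one possible implementation under stated assumptions, not the specification itself: the lower-case names, the growable int array, and the choice to let PUSH and POP modify the stack in place (rather than return a new stack) are all hypothetical. Preconditions are written as comments and are not checked here, and push reports exhaustion of storage, the implicit precondition mentioned in the note on stack size, by returning -1.

```c
#include <stdlib.h>

/* Hypothetical stack of ints stored in a growable array. */
typedef struct {
    int *items;
    size_t size, capacity;
} Stack;

/* 1. Create: builds a stack out of non-stacks. Returns NULL if storage is exhausted. */
Stack *create_stack(void) {
    Stack *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->size = 0;
    s->capacity = 4;
    s->items = malloc(s->capacity * sizeof *s->items);
    if (s->items == NULL) { free(s); return NULL; }
    return s;
}

/* 2. Destroy: releases all storage held by the stack. */
void destroy_stack(Stack *s) {
    if (s != NULL) { free(s->items); free(s); }
}

/* 3. Inspect: these do not change the stack. */
int is_empty(const Stack *s) { return s->size == 0; }

/* Precondition: !is_empty(s). */
int top(const Stack *s) { return s->items[s->size - 1]; }

/* 4. Transform (in place in this C version).
   Postcondition on success: top(s) == x. Returns -1 if storage runs out. */
int push(Stack *s, int x) {
    if (s->size == s->capacity) {
        size_t new_capacity = s->capacity * 2;
        int *p = realloc(s->items, new_capacity * sizeof *p);
        if (p == NULL) return -1;
        s->items = p;
        s->capacity = new_capacity;
    }
    s->items[s->size++] = x;
    return 0;
}

/* Precondition: !is_empty(s). Removes the top element. */
void pop(Stack *s) { s->size--; }
```

Because the application sees only these six functions, the growable array could be replaced by a linked list without changing any calling code.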
A specification must say what an operation's inputs and outputs are, and definitely must mention when an input is changed. This falls short of completely committing the implementation to procedures or functions (or whatever other means of creating 'blocks' of code might be available in the programming language). Of course, these details eventually need to be decided in order for code to actually be written. But these details do not need to be decided until code-generation time; throughout the earlier stages of program design, the exact interface (at code level) can be left unspecified.
2.3.3 Checking Pre Conditions
It is very important to state in the specification whether each precondition will be checked by the user or by the implementer. For example, the precondition for POP may be checked either by the procedure(s) that call POP or within the procedure that implements POP. Either way is possible. Here are the pros and cons of the 2 possibilities:
User Guarantees Preconditions
The main advantage of having the user check preconditions, and therefore guarantee that they will be satisfied when the core operations are invoked, is efficiency. For example, consider the following:
PUSH(S, 1);
POP(S);
It is obvious that there is no need to check if S is empty: this precondition of POP is guaranteed to be satisfied because it is a postcondition of PUSH.
2.3.4 Implementation Checks Preconditions
There are several advantages to having the implementation check its own preconditions:
1. It sometimes has access to information not available to the user (e.g. implementation details about space requirements), although this is often a sign of a poorly constructed specification.
2. Programs won't bomb mysteriously: errors will be detected (and reported?) at the earliest possible moment. This is not true when the user checks preconditions, because the user is human and occasionally might forget to check, or might think that checking was unnecessary when in fact it was needed.
3. Most important of all, if we ever change the specification, and wish to add, delete, or modify preconditions, we can do this easily, because the precondition occurs in exactly one place in our program.
There are arguments on both sides. The literature specifies that procedures should signal an error if their preconditions are not satisfied. This means that these procedures must check their own preconditions. That's what our model solutions will do too. We will thereby sacrifice some efficiency for a high degree of maintainability and robustness.
An additional possibility is to selectively include or exclude the implementation's condition checking code, e.g. using #ifdef:
#ifdef SAFE
if (! condition) error("condition not satisfied");
#endif
This code will get included only if we supply the -DSAFE argument to the compiler (or otherwise define SAFE). Thus, in an application where the user checks carefully for all preconditions, we have the option of omitting all checks by the implementation.
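Applied to a stack, the technique might look like the minimal sketch below. The fixed capacity STACK_MAX, the Stack type and the error routine (which the text uses but does not define; here it simply prints a message and exits) are assumptions. Compiled normally, push and pop trust the user to guarantee their preconditions; compiled with cc -DSAFE, each operation checks its own precondition first.

```c
#include <stdio.h>
#include <stdlib.h>

#define STACK_MAX 100   /* assumed fixed capacity for this sketch */

typedef struct { int items[STACK_MAX]; int size; } Stack;

/* Hypothetical error routine standing in for error() in the text. */
static void error(const char *msg) {
    fprintf(stderr, "error: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* Precondition: the stack is not full. */
void push(Stack *s, int x) {
#ifdef SAFE
    if (!(s->size < STACK_MAX)) error("push: stack is full");
#endif
    s->items[s->size++] = x;
}

/* Precondition: the stack is not empty. Returns the removed element. */
int pop(Stack *s) {
#ifdef SAFE
    if (!(s->size > 0)) error("pop: stack is empty");
#endif
    return s->items[--s->size];
}
```

The PUSH(S, 1); POP(S); sequence from the text is safe under both builds, since a successful push makes the stack non-empty.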
Self Assessment Questions
1. Explain pre- and postconditions with a suitable example.
2. Discuss the advantages of having the implementation check preconditions.
2.4 Linear Data Structure
2.4.1 The Array Data Structure
As an example, most programming languages have an array type as one of the built-in types. We will define an array as a homogeneous, ordered, finite, fixed-length list of elements. To further define these terms in the context of an array:
a) homogeneous: every element is of the same type
b) ordered: there is a next and a previous element in the natural order of the structure
c) finite: there is a first and a last element
d) fixed-length: the list size is constant
Mapping the array to the three levels of a data structure:
1. At the abstract level
Accessing mechanism is direct, random access
Construction operator
Storage operator
Retrieval operator
2. At the application level
Used to model lists (characters, employees, etc.).
3. At the implementation level
Allocate memory through static or dynamic declarations
Access operators provided: [ ] and =.
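A minimal C sketch of the implementation level, assuming an array of int: one array is declared statically (its size fixed at compile time) and one is allocated dynamically with malloc, and elements are stored with = and retrieved with [ ]. The names fill_squares, static_array and make_dynamic_array are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Storage operator: a[i] = value stores into element i. */
void fill_squares(int a[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = (int)(i * i);
}

/* Static declaration: the size is fixed at compile time. */
int static_array[5];

/* Dynamic declaration: the size is chosen at run time; the caller must free() it. */
int *make_dynamic_array(size_t n)
{
    int *a = malloc(n * sizeof *a);
    if (a != NULL)
        fill_squares(a, n);
    return a;
}
```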
2.4.2 Using an Array and Lists as a Data Structure
An array can be used to implement containers.
Given an index (i.e. subscript), values can be quickly fetched and/or stored in an array. Adding a value to the end of an array is fast (particularly if a variable is used to indicate the end of the array); however, inserting a value into an array can be time consuming because existing elements must be shifted to make room.
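The difference between appending and inserting can be sketched in C, assuming the caller keeps the current number of elements in a separate variable len (the variable that indicates the end of the array) alongside a fixed capacity. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Insert value at position pos of a[0..*len-1], shifting later elements
 * one place to the right. Returns 0 on success, -1 if the array is full
 * or pos is out of range. Costs O(*len - pos) element moves.
 */
int array_insert(int a[], size_t *len, size_t capacity, size_t pos, int value)
{
    if (*len >= capacity || pos > *len)
        return -1;
    for (size_t i = *len; i > pos; i--)   /* shift right, starting at the end */
        a[i] = a[i - 1];
    a[pos] = value;
    (*len)++;
    return 0;
}

/* Appending is the special case pos == *len: no elements move. */
int array_append(int a[], size_t *len, size_t capacity, int value)
{
    return array_insert(a, len, capacity, *len, value);
}
```

Inserting at position 0 moves every existing element, which is why insertion near the front is the slow case.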
Since array elements are typically stored in contiguous memory locations, looping through an array can be done easily and efficiently.
When elements of an array are sorted, then binary searching can be used to find particular values in the array. If the array elements are not sorted, then a linear search must be used. After an array has been defined, its length (i.e. number of elements) cannot be changed.
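A minimal C sketch of the two searches, with hypothetical function names: linear_search works on any array, while binary_search assumes the array is sorted in ascending order and halves the remaining range on every comparison. The standard library's bsearch offers the same binary technique for arbitrary element types.

```c
#include <assert.h>
#include <stddef.h>

/* Linear search: works on any array, O(n) comparisons. Returns index or -1. */
long linear_search(const int a[], size_t n, int key)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] == key)
            return (long)i;
    return -1;
}

/* Binary search: requires a sorted in ascending order, O(log n) comparisons. */
long binary_search(const int a[], size_t n, int key)
{
    size_t lo = 0, hi = n;            /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] == key)
            return (long)mid;
        if (a[mid] < key)
            lo = mid + 1;             /* key can only be in the upper half */
        else
            hi = mid;                 /* key can only be in the lower half */
    }
    return -1;
}
```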
Arrays: Fast and Slow
The following are some comments on the efficiency of arrays:
a) Changing the length of an array can be slow.
b) Inserting elements at the end of an array is fast (assuming the index of the end-of-array is stored; if you have to search for the end-of-array, then this operation is slow).
c) Inserting elements near the beginning of an array can be slow.
d) Accessing an array element using an index is fast.
e) Searching a non-sorted array for a value can be slow.
f) Searching a sorted array for a value can be fast.
2.4.3 Elementary Data Structures
“Mankind's progress is measured by the number of things we can do without thinking.” Elementary data structures such as stacks, queues, lists, and heaps will be the “off-the-shelf” components we build our algorithms from. There are two aspects to any data structure:
1) The abstract operations which it supports.
2) The implementation of these operations.
The fact that we can describe the behavior of our data structures in terms of abstract operations explains why we can use them without thinking, while the fact that we have different implementations of the same abstract operations enables us to optimize performance.
In this book we consider a variety of abstract data types (ADTs), including stacks, queues, deques, ordered lists, sorted lists, hash tables, trees, priority queues. In just about every case, we have the option of implementing the ADT using an array or using some kind of linked data structure.
Because they are the base upon which almost all of the ADTs are built, we call the array and the linked list the foundational data structures. It is important to understand that we do not view the array or the linked list as ADTs, but rather as alternatives for the implementation of ADTs.
Arrays
Probably the most common way to aggregate data is to use an array. In C
an array is a variable that contains a collection of objects, all of the same
type.
For example, int a[5]; allocates an array of five integers and assigns it to the
variable a.
The elements of an array are accessed using integer-valued indices. In C
the first element of an array always has index zero. Thus, the five elements
of array a are a[0], a[1], ..., a[4]. All arrays in C have a length, the value of
which is equal to the number of array elements.
How are C arrays represented in the memory of the computer? The
specification of the C language leaves this up to the system implementers.
However, Figure illustrates a typical implementation scenario.
The elements of an array typically occupy consecutive memory locations. That way, given i, it is possible to find the position of a[i] in constant time. On the basis of the figure, we can now estimate the total storage required to represent an array. Let S(n) be the total storage (memory) needed to represent an array of n ints. S(n) is given by

$$S(n) \approx \mathrm{sizeof}(\mathtt{int}[n]) + \mathrm{sizeof}(\mathtt{int}) = (n+1)\,\mathrm{sizeof}(\mathtt{int}),$$

where the function sizeof(x) is the number of bytes used for the memory representation of an instance of an object of type x. (In standard C, sizeof(int[n]) is exactly n · sizeof(int); the extra word allows for bookkeeping such as the field described below.)
In C the sizes of the primitive data types are fixed constants. Hence sizeof(int) = O(1).
In practice, an array may be accompanied by additional fields. For example, it is reasonable to expect that there is a field which records the position in memory of the first array element. In any event, the overhead associated with a fixed number of fields is O(1). Therefore, S(n) = O(n).
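The constant-time address calculation and the storage estimate can be checked with a small C sketch. It relies only on standard C semantics: sizeof(int[n]) is exactly n * sizeof(int), so a plain C array carries no hidden length field, and the address of a[i] is the start address plus i * sizeof(int) bytes. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for the elements of an int array of length n. */
size_t int_array_bytes(size_t n)
{
    return n * sizeof(int);
}

/*
 * Address of a[i], computed the way a compiler does it:
 * start address + i * element size. This is a constant-time calculation.
 */
const int *element_address(const int *a, size_t i)
{
    return (const int *)((const char *)a + i * sizeof(int));
}
```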
Multi-Dimensional Arrays
A multi-dimensional array of dimension n (i.e. an n-dimensional array or simply n-D array) is a collection of items which is accessed via n subscript expressions. For example, in a language that supports it, the (i, j)th element of the two-dimensional array x is accessed by writing x[i,j].
The C programming language does not really support multi-dimensional arrays. It does, however, support arrays of arrays. In C a two-dimensional array x is really an array of one-dimensional arrays:
int x[3][5];
The expression x[i] selects the ith one-dimensional array; the expression x[i][j] selects the jth element from that array.
The built-in multi-dimensional arrays suffer the same indignities that simple one-dimensional arrays do: array indices in each dimension range from zero to length - 1, where length is the array length in the given dimension. There is no array assignment operator. The number of dimensions and the size of each dimension are fixed once the array has been allocated.
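A minimal C sketch of two of these points, assuming the int x[3][5] shape used above: the elements of an array of arrays are laid out row by row (row-major order), so x[i][j] sits i * 5 + j elements from the start, and because there is no array assignment operator a whole array has to be copied explicitly, here with memcpy. The helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define ROWS 3
#define COLS 5

/* x[i][j] lives i * COLS + j elements after x[0][0] (row-major order). */
size_t row_major_offset(size_t i, size_t j)
{
    return i * COLS + j;
}

/* C has no array assignment operator, so copy the bytes explicitly. */
void copy_matrix(int dst[ROWS][COLS], int src[ROWS][COLS])
{
    memcpy(dst, src, sizeof(int[ROWS][COLS]));
}
```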
Self Assessment Questions
1. Write the advantages of linear data structures.
2. Write points on the efficiency of arrays in the context of data structures.
2.5 What the application needs?
Terms describing the data structure from the point of view of the application, which only cares how it behaves and not how it is implemented.
List
Generic term for a collection of objects. May or may not contain duplicates. Application may or may not require that it be kept in a specified order.
Ordered list
A list in which the order matters to the application. Therefore, for example, the implementer cannot scramble the order to improve efficiency.
Set
List where the order does not matter to the application (implementer can pick order so as to optimize performance) and in which there are no duplicates.
Multi-set
Like a set but may contain duplicates.
Double-ended queue (deque)
An ordered list in which insertion and deletion occur only at the two ends of the list. That is, elements cannot be inserted into the middle of the list or deleted from the middle of the list.
Stack
An ordered list in which insertion and deletion both occur only at one end (e.g. at the start).
Queue
An ordered list in which insertion always occurs at one end and deletion always occurs at the other end.
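The queue discipline can be sketched in C with a fixed-size circular array, assuming a small hypothetical capacity QUEUE_CAP and hypothetical names enqueue and dequeue. Insertions happen at the rear and deletions at the front; the indices wrap around with %, so slots freed at the front are reused.

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_CAP 4   /* hypothetical capacity */

typedef struct {
    int items[QUEUE_CAP];
    int head;    /* index of the front element (next to be deleted) */
    int count;   /* number of stored elements */
} Queue;

void queue_init(Queue *q) { q->head = 0; q->count = 0; }

/* Insert at the rear. Returns false if the queue is full. */
bool enqueue(Queue *q, int x)
{
    if (q->count == QUEUE_CAP)
        return false;
    q->items[(q->head + q->count) % QUEUE_CAP] = x;
    q->count++;
    return true;
}

/* Delete from the front. Returns false if the queue is empty. */
bool dequeue(Queue *q, int *out)
{
    if (q->count == 0)
        return false;
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return true;
}
```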
Ordered Lists and Sorted Lists
The most simple yet one of the most versatile containers is the list. In this section we consider lists as abstract data types. A list is a series of items. In general, we can insert and remove items from a list and we can visit all the items in a list in the order in which they appear.
In this section we consider two kinds of lists: ordered lists and sorted lists. In an ordered list the order of the items is significant. For example, consider a list of the chapter titles of this book: the order of the items in the list corresponds to the order in which they appear in the book. However, since the chapter titles are not sorted alphabetically, we cannot consider the list to be sorted. Since it is possible to change the order of the chapters in the book, we must be able to do the same with the items of the list. As a result, we may insert an item into an ordered list at any position.
On the other hand, a sorted list is one in which the order of the items is defined by some collating sequence. For example, the index of this book is a sorted list. The items in the index are sorted alphabetically. When an item is inserted into a sorted list, it must be inserted at the correct position.
Ordered Lists
An ordered list is a list in which the order of the items is significant. However, the items in an ordered list are not necessarily sorted. Consequently, it is possible to change the order of items and still have a valid ordered list.
A searchable container is a container that supports the following additional operations:
1) insert: used to put objects into the container;
2) withdraw: used to remove objects from the container;
3) find: used to locate objects in the container;
4) isMember: used to test whether a given object instance is in the container.
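A minimal C sketch of the four operations on an array-based ordered list of ints, with hypothetical names and a fixed hypothetical capacity. Because the items are plain integers, isMember here compares values rather than object identity, so it coincides with a successful find; withdraw shifts the later items left so that the remaining order is preserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LIST_CAP 16   /* hypothetical capacity */

typedef struct {
    int items[LIST_CAP];
    size_t len;
} OrderedList;

/* insert: append at the end, O(1). Returns false if full. */
bool list_insert(OrderedList *l, int x)
{
    if (l->len == LIST_CAP)
        return false;
    l->items[l->len++] = x;
    return true;
}

/* find: linear search, O(n). Returns the position, or -1 if absent. */
long list_find(const OrderedList *l, int x)
{
    for (size_t i = 0; i < l->len; i++)
        if (l->items[i] == x)
            return (long)i;
    return -1;
}

/* isMember: true if x occurs in the list. */
bool list_is_member(const OrderedList *l, int x)
{
    return list_find(l, x) >= 0;
}

/* withdraw: remove the first occurrence of x, keeping the order of the rest. */
bool list_withdraw(OrderedList *l, int x)
{
    long pos = list_find(l, x);
    if (pos < 0)
        return false;
    for (size_t i = (size_t)pos; i + 1 < l->len; i++)
        l->items[i] = l->items[i + 1];
    l->len--;
    return true;
}
```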
Sorted Lists
The next type of searchable container that we consider is a sorted list. A
sorted list is like an ordered list: It is a searchable container that holds a
sequence of objects. However, the position of an item in a sorted list is not
arbitrary. The items in the sequence appear in order, say, from the smallest
to the largest. Of course, for such an ordering to exist, the relation used to
sort the items must be a total order.
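A minimal C sketch of insertion into an array-based sorted list, assuming int items compared with < (a total order). The function walks back from the end, shifting larger items one place right until it reaches the correct position; the names and capacity are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SORTED_CAP 16   /* hypothetical capacity */

typedef struct {
    int items[SORTED_CAP];
    size_t len;
} SortedList;

/*
 * Insert x at its correct position so the items stay in ascending order.
 * Finding the slot and shifting larger items right both cost O(n).
 */
bool sorted_insert(SortedList *l, int x)
{
    if (l->len == SORTED_CAP)
        return false;
    size_t i = l->len;
    while (i > 0 && l->items[i - 1] > x) {   /* shift larger items right */
        l->items[i] = l->items[i - 1];
        i--;
    }
    l->items[i] = x;
    l->len++;
    return true;
}
```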
Lists: Array-Based Implementation
Deleting and inserting an item requires moving up and pushing down the
existing items (O(n) in the worst case).
Linked Lists
A linked list makes use of pointers and is dynamic. It is made up of a series of objects
called nodes. Each node contains a pointer to the next node. Removing a node only requires
redirecting its predecessor's pointer to the node's successor (insertion works in the opposite way).
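A minimal C sketch of the node operations, with hypothetical names. remove_after bypasses a node by redirecting its predecessor's next pointer; insert_after does the opposite, splicing a new node in. Both take constant time once the predecessor is known.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    int data;
    struct Node *next;   /* pointer to the next node, NULL at the end */
} Node;

/* Insert a new node holding value right after prev; O(1). */
Node *insert_after(Node *prev, int value)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->data = value;
    n->next = prev->next;   /* new node points to prev's old successor */
    prev->next = n;         /* prev now points to the new node */
    return n;
}

/* Remove the node after prev (if any) by linking around it; O(1). */
void remove_after(Node *prev)
{
    Node *victim = prev->next;
    if (victim == NULL)
        return;
    prev->next = victim->next;  /* bypass the removed node */
    free(victim);
}
```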
Comparison of List Implementations
Array-Based Lists (average and worst cases):
Insertion and deletion are O(n).
Direct access is O(1)
Array must be allocated in advance
No overhead if all array positions are full
Linked Lists:
Insertion and deletion are O(1) (once the position is known)
Direct access is O(n)
Finding predecessor is O(n)
Space grows with number of elements
Every element requires overhead.
Arrays vs. Linked Lists
Elements of an array are connected by contiguity
Reside in contiguous memory
Static (compile-time) allocation (typically)
Elements of a linked list are connected by pointers
Reside anywhere in memory
Dynamic (run-time) allocation
2.6 Implementation methods
There are a variety of options for the person implementing a list (or set or stack or whatever).
a) array
We all know what arrays are. Arrays are included here because a list can be implemented using a 1-D array. If the maximum length of the list is not known in advance, code must be provided to detect array overflow and expand the array. Expanding requires allocating a new, longer array, copying the contents of the old array, and deallocating the old array.
Arrays are commonly used when two conditions hold. First the maximum length of the list can be accurately estimated in advance (so array expansion is rarely needed). Second, insertion and deletion occur only at the ends of the list. (Insertion and deletion in the middle of an array-based list is slow.)
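The expansion step can be sketched in C as follows, assuming a hypothetical IntVector struct that records the current length and the allocated capacity. When the array is full, the sketch allocates a new array of twice the capacity, copies the old contents and frees the old array, the three steps described above. Doubling (rather than growing by a fixed amount) is a common choice, assumed here, that keeps the average cost of an append constant.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int *data;
    size_t len;        /* elements in use */
    size_t capacity;   /* elements allocated */
} IntVector;

/*
 * Append x, expanding when full: allocate a new array twice as long,
 * copy the old contents, and free the old array. Returns 0, or -1 on
 * allocation failure (the vector is then left unchanged).
 */
int vector_push(IntVector *v, int x)
{
    if (v->len == v->capacity) {
        size_t new_cap = v->capacity ? 2 * v->capacity : 4;
        int *bigger = malloc(new_cap * sizeof *bigger);
        if (bigger == NULL)
            return -1;
        if (v->len > 0)
            memcpy(bigger, v->data, v->len * sizeof *bigger);
        free(v->data);
        v->data = bigger;
        v->capacity = new_cap;
    }
    v->data[v->len++] = x;
    return 0;
}

void vector_free(IntVector *v)
{
    free(v->data);
    v->data = NULL;
    v->len = v->capacity = 0;
}
```

In practice, realloc from <stdlib.h> can perform the allocate, copy and free sequence in a single call.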
b) linked list
A list implemented by a set of nodes, each of which points to the next. An object of class (or struct) "node" contains a field pointing to the next node, as well as any number of fields of data. Optionally, there may be a second "list" class (or struct) used as a header for the list. One field of the list class is a pointer to the first node in the list. Other fields may also be included in the "list" object, such as a pointer to the last node in the list, the length of the list, etc.
Linked lists are commonly used when the length of the list is not known in advance and/or when it is frequently necessary to insert and/or delete in the middle of the list.
c) doubly-linked vs. singly-linked lists
In a doubly-linked list, each node points to the next node and also to the previous node. In a singly-linked list, each node points to the next node but not back to the previous node.
d) circular list
A linked list in which the last node points to the first node. If the list is doubly-linked, the first node must also point back to the last node.
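A minimal C sketch of traversing a circular singly linked list, with hypothetical names. Since no node holds NULL, the loop stops when it returns to the node it started from; a do ... while loop makes sure the starting node itself is visited.

```c
#include <assert.h>
#include <stddef.h>

typedef struct CNode {
    int data;
    struct CNode *next;   /* the last node points back to the first */
} CNode;

/* Sum the values in a circular singly linked list, starting at any node. */
long circular_sum(const CNode *start)
{
    if (start == NULL)
        return 0;
    long sum = 0;
    const CNode *p = start;
    do {
        sum += p->data;
        p = p->next;
    } while (p != start);   /* back at the start: every node visited once */
    return sum;
}
```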
2.7 Non Linear Data Structures
2.7.1 Trees
We now consider one of the most important non-linear information structures: trees. A tree is often used to represent a hierarchy. This is because the relationships between the items in the hierarchy suggest the branches of a botanical tree.
For example, a tree-like organization chart is often used to represent the lines of responsibility in a business, as shown in Figure. The president of the company is shown at the top of the tree and the vice-presidents are indicated below her. Under the vice-presidents we find the managers, and below the managers, the clerks. Each clerk reports to a manager. Each manager reports to a vice-president, and each vice-president reports to the president.
It just takes a little imagination to see the tree in Figure. Of course, the tree is upside-down. However, this is the usual way the data structure is drawn. The president is called the root of the tree and the clerks are the leaves.
A tree is extremely useful for certain kinds of computations. For example, suppose we wish to determine the total salaries paid to employees by division or by department. The total of the salaries in division A can be found by computing the sum of the salaries paid in departments A1 and A2 plus the salary of the vice-president of division A. Similarly, the total of the salaries paid in department A1 is the sum of the salaries of the manager of department A1 and of the two clerks below her.
Clearly, in order to compute all the totals, it is necessary to consider the salary of every employee. Therefore, an implementation of this computation must visit all the employees in the tree. An algorithm that systematically visits all the items in a tree is called a tree traversal.
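The salary computation can be sketched in C as a recursive traversal. The sketch assumes a hypothetical Employee node that stores its direct reports in a small fixed-size array; the total for any node is its own salary plus the totals of its subtrees, so calling it on the president's node would visit every employee exactly once.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4   /* hypothetical limit on direct reports */

typedef struct Employee {
    const char *name;
    long salary;
    size_t n_children;
    struct Employee *children[MAX_CHILDREN];  /* direct reports (subtrees) */
} Employee;

/*
 * Total salary of the subtree rooted at e: e's own salary plus the totals
 * of all its subtrees (a preorder traversal visiting each employee once).
 */
long total_salary(const Employee *e)
{
    long total = e->salary;
    for (size_t i = 0; i < e->n_children; i++)
        total += total_salary(e->children[i]);
    return total;
}
```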
In the same chapter we consider several different kinds of trees as well as several different tree traversal algorithms. In addition, we show how trees can be used to represent arithmetic expressions and how we can evaluate an arithmetic expression by doing a tree traversal. The following is a mathematical definition of a tree:
Definition (Tree): A tree T is a finite, non-empty set of nodes,
$$T = \{r\} \cup T_1 \cup T_2 \cup \cdots \cup T_n,$$
with the following properties:
1. A designated node of the set, r, is called the root of the tree; and
2. The remaining nodes are partitioned into $n \ge 0$ subsets $T_1, T_2, \ldots, T_n$, each of which is a tree.
For convenience, we shall use the notation $T = \{r, T_1, T_2, \ldots, T_n\}$ to denote the tree T.
Notice that the definition is recursive: a tree is defined in terms of itself! Fortunately, we do not have a problem with infinite recursion because every tree has a finite number of nodes and because in the base case a tree has n = 0 subtrees.
It follows from the definition that the minimal tree is a tree comprised of a single root node. For example, $T_a = \{A\}$.
The following, $T_b = \{B, \{C\}\}$, is also a tree.
Finally, the following is also a tree:
$$T_c = \{D, \{E, \{F\}\}, \{G, \{H\}, \{I\}\}, \{J, \{K\}, \{L\}\}, \{M\}\}$$
How do $T_a$, $T_b$, and $T_c$ resemble their arboreal namesake? The similarity becomes apparent when we consider the graphical representation of these trees shown in Figure. To draw such a pictorial representation of a tree $T = \{r, T_1, T_2, \ldots, T_n\}$, we first draw the root node r, and then draw each of the subtrees $T_1, T_2, \ldots, T_n$ beside each other below the root. Finally, lines are drawn from r to the roots of each of the subtrees $T_1, T_2, \ldots, T_n$.
Figure: Examples of trees.
Of course, trees drawn in this fashion are upside down. Nevertheless, this is
the conventional way in which tree data structures are drawn. In fact, it is
understood that when we speak of “up” and “down,” we do so with respect
to this pictorial representation. For example, when we move from a root to a
subtree, we will say that we are moving down the tree.
The inverted pictorial representation of trees is probably due to the way that
genealogical lineal charts are drawn. A lineal chart is a family tree that
shows the descendants of some person. And it is from genealogy that much
of the terminology associated with tree data structures is taken.
Figure shows another representation of the tree $T_c$ defined above. In this
case, the tree is represented as a set of nested regions in the plane. In fact,
what we have is a Venn diagram which corresponds to the view that a tree
is a set of sets.
Figure: An alternate graphical representation for trees.
2.7.2 Binary Tree
Used to implement lists whose elements have a natural order (e.g. numbers)
and either (a) the application would like the list kept in this order or (b) the
order of elements is irrelevant to the application (e.g. this list is implementing a set).
Each element in a binary tree is stored in a "node" class (or struct). Each node contains pointers to a left child node and a right child node. In some implementations, it may also contain a pointer to the parent node. A tree may also have an object of a second "tree" class (or struct) which serves as a header for the tree. The "tree" object contains a pointer to the root of the tree (the node with no parent) and whatever other information the programmer wants to squirrel away in it (e.g. number of nodes currently in the tree).
In a binary search tree, elements are kept sorted in left-to-right order across the tree. That is, if N is a node, then the value stored in N must be larger than every value stored in the left subtree of N and smaller than every value stored in the right subtree of N. Variant trees may have the opposite order (smaller values to the right rather than to the left) or may allow two different nodes to contain equal values.
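A minimal C sketch of the ordering property for int values, with hypothetical names: insertion and search go left for smaller values and right for larger ones, duplicates are ignored, and an in-order traversal (left subtree, node, right subtree) produces the values in ascending order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct TreeNode {
    int value;
    struct TreeNode *left;    /* subtree of smaller values */
    struct TreeNode *right;   /* subtree of larger values */
} TreeNode;

/* Insert value, keeping the ordering property; duplicates are ignored. */
TreeNode *bst_insert(TreeNode *root, int value)
{
    if (root == NULL) {
        TreeNode *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;      /* allocation failure: the sketch simply gives up */
        n->value = value;
        n->left = n->right = NULL;
        return n;
    }
    if (value < root->value)
        root->left = bst_insert(root->left, value);
    else if (value > root->value)
        root->right = bst_insert(root->right, value);
    return root;
}

/* Search by walking left or right at each node: O(height of the tree). */
bool bst_contains(const TreeNode *root, int value)
{
    while (root != NULL) {
        if (value == root->value)
            return true;
        root = (value < root->value) ? root->left : root->right;
    }
    return false;
}

/* In-order traversal: writes the values in ascending order into out. */
size_t bst_inorder(const TreeNode *root, int out[], size_t pos)
{
    if (root == NULL)
        return pos;
    pos = bst_inorder(root->left, out, pos);
    out[pos++] = root->value;
    return bst_inorder(root->right, out, pos);
}

void bst_free(TreeNode *root)
{
    if (root == NULL)
        return;
    bst_free(root->left);
    bst_free(root->right);
    free(root);
}
```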
2.7.3 Hash Tables
A very common paradigm in data processing involves storing information in a table and then later retrieving the information stored there. For example, consider a database of driver's license records. The database contains one record for each driver's license issued. Given a driver's license number, we can look up the information associated with that number. Similar operations are done by the C compiler. The compiler uses a symbol table to keep track of the user-defined symbols in a C program. As it compiles a program, the compiler inserts an entry in the symbol table every time a new symbol is declared. In addition, every time a symbol is used, the compiler looks up the attributes associated with that symbol to see that it is being used correctly.
Typically the database comprises a collection of key-and-value pairs. Information is retrieved from the database by searching for a given key. In
the case of the driver's license database, the key is the driver's license
number and in the case of the symbol table, the key is the name of the
symbol.
In general, an application may perform a large number of insertion and/or
look-up operations. Occasionally it is also necessary to remove items from
the database. Because a large number of operations will be done we want
to do them as quickly as possible.
Hash tables are a very practical way to maintain a dictionary. As with bucket sort, they assume that the distribution of keys is fairly well-behaved. The idea is that looking an item up in an array takes constant time once you have its index. A hash function is a mathematical function which maps keys to integers.
In bucket sort, our hash function mapped the key to a bucket based on the
first letters of the key. "Collisions" were the set of keys mapped to the same
bucket. If the keys were uniformly distributed, then each bucket contained
very few keys!
The resulting short lists were easily sorted, and could just as easily be
searched.
We examine data structures which are designed specifically with the
objective of providing efficient insertion and find operations. In order to meet
the design objective certain concessions are made. Specifically, we do not
require that there be any specific ordering of the items in the container. In
addition, while we still require the ability to remove items from the container,
it is not our primary objective to make removal as efficient as the insertion
and find operations.
Ideally we would build a data structure for which both the insertion and find operations are O(1) in the worst case. However, this kind of performance can only be achieved with complete a priori knowledge: we need to know beforehand specifically which items are to be inserted into the container. Unfortunately, we do not have this information in the general case. So, if we cannot guarantee O(1) performance in the worst case, then we make it our design objective to achieve O(1) performance in the average case.
The constant time performance objective immediately leads us to the following conclusion: our implementation must be based in some way on an array rather than a linked list. This is because we can access the kth element of an array in constant time, whereas the same operation in a linked list takes O(k) time.
In the previous section, we considered two searchable containers: the ordered list and the sorted list. In the case of an ordered list, the cost of an insertion is O(1) and the cost of the find operation is O(n). For a sorted list, the cost of insertion is O(n) and the cost of the find operation is O(log n) for the array implementation.
Clearly, neither the ordered list nor the sorted list meets our performance objectives. The essential problem is that a search, either linear or binary, is always necessary. In the ordered list, the find operation uses a linear search to locate the item. In the sorted list, a binary search can be used to locate the item because the data is sorted. However, in order to keep the data sorted, insertion becomes O(n).
In order to meet the performance objective of constant time insert and find operations, we need a way to do them without performing a search. That is, given an item x, we need to be able to determine directly from x the array position where it is to be stored.
Hash Functions
It is the job of the hash function to map keys to integers. A good hash function:
1. Is cheap to evaluate
2. Tends to use all positions from 0 to M - 1 with uniform frequency.
3. Tends to put similar keys in different parts of the table (for example, many names that share a common prefix should not all land in the same slot).
The first step is usually to map the key to a big integer, for example
$$h = \sum_{i=0}^{\mathit{keylength}-1} 128^{i} \times \mathrm{char}(\mathit{key}[i])$$
This last number must be reduced to an integer between 0 and M - 1, where M is the size of our hash table. If k denotes this big integer, one way is by h(k) = k mod M, where M is best a large prime not too close to 2^i - 1, which would just mask off the high bits. This works on the same principle as a roulette wheel!
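The two steps can be sketched in C, assuming a hypothetical table size TABLE_SIZE = 101 (a prime, playing the role of M) and resolving collisions by chaining, that is, keeping a linked list of entries per slot, which is one common choice rather than the only one. Reducing modulo M after each term yields the same h(k) as reducing the big integer once at the end, without overflow. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 101   /* M in the text: a prime (value is hypothetical) */

/* h(key) = (sum over i of 128^i * key[i]) mod M, reduced step by step. */
unsigned hash(const char *key)
{
    unsigned long h = 0, power = 1;   /* power holds 128^i mod M */
    for (size_t i = 0; key[i] != '\0'; i++) {
        h = (h + power * (unsigned char)key[i]) % TABLE_SIZE;
        power = (power * 128) % TABLE_SIZE;
    }
    return (unsigned)h;
}

typedef struct Entry {
    char *key;
    int value;
    struct Entry *next;   /* next entry in the same slot (collision chain) */
} Entry;

typedef struct {
    Entry *buckets[TABLE_SIZE];
} HashTable;

/* Insert or update key; returns false on allocation failure. */
bool table_put(HashTable *t, const char *key, int value)
{
    unsigned b = hash(key);
    for (Entry *e = t->buckets[b]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0) {
            e->value = value;
            return true;
        }
    Entry *e = malloc(sizeof *e);
    if (e == NULL)
        return false;
    e->key = malloc(strlen(key) + 1);
    if (e->key == NULL) {
        free(e);
        return false;
    }
    strcpy(e->key, key);
    e->value = value;
    e->next = t->buckets[b];   /* push onto the front of the chain */
    t->buckets[b] = e;
    return true;
}

/* Look up key; stores the value in *out and returns true if found. */
bool table_get(const HashTable *t, const char *key, int *out)
{
    for (const Entry *e = t->buckets[hash(key)]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0) {
            *out = e->value;
            return true;
        }
    return false;
}
```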
Self Assessment Questions
1. Define Trees. Discuss its usage in different applications.
2. Write notes on:
a) Binary Tree b) Hash Tables
2.8 Summary
This unit gave an overview of the concepts of data structures and their applications. Data structures represent places to store data for use by a computer program. As you would imagine, this describes a spectrum of data storage techniques, from the very simple to the very complex. At the lowest level, there are data structures supplied and supported by the CPU (or computer chip) itself. These vary from chip to chip, but are almost always of the very primitive sort. They typically include the simple data types, such as integers, characters, floating point numbers, and bit strings. Building on these, the unit discussed structured data types, abstract data types, and linear and non-linear data structures.
2.9 Terminal Questions
1. Define a data structure. Explain the types of structured data types.
2. Explain Abstract data types with its characteristics.
3. Discuss the linear data structure with suitable example.
4. Discuss the various types of data structure applications.
5. Write notes on:
a) Elementary Data Structures
b) Ordered list
c) Linked list
d) Queue
e) Stack
f) Binary tree
g) Hash tables