Our most tangible goal for this chapter is the creation of a generic, random access, container class. A container class is one that is intended to be used to hold collections of objects of some type, called the element_type of the container. A generic container class is one that is re-usable across a wide variety of element_types without any modifications to the container class code. A random access container is one in which all elements can be accessed efficiently using a bracket operator, as with arrays. (In later chapters, we will make the relationship between requirements and efficiency more precise. For now, it is sufficient to understand that random access means that bracket operators exist and that they run in a roughly constant amount of time that is independent of the position of the element being accessed.)
Other goals for this generic class include index bounds checking (preventing clients from using out-of-bounds index values), safe memory management, and giving the client the ability to re-size the container. We will call this container a vector class.
A good way to start defining a generic class is with a wish list of desired properties. The first item on the list is that it be a generic class, so that we can re-use the code. Now, if code is to be re-usable, it is worth spending some time making it really useful (as in, people really use it). Because we envision the vector template being a replacement for arrays, a good place to start is with properties that we wish arrays had, but don't. Just imagine the ways you can get into trouble using arrays, or the ways that arrays are inconvenient, and correct them!
A vector [array] wish list is shown in this slide. It would be nice if we could set the size of an array at runtime, and maybe change the size later if the first choice is inadequate. It would be nice if we would be told when we access an out-of-bounds array element. An assignment operator that actually works would also be helpful. It would be nice if memory management (using operators new[size] and delete[]) would go away. And, of course, we want vectors to be a generic class. Then we could write code that is more readable, safer, and more convenient. For example, here we declare vectors of various types:
vector < int > intVector (30);                  // declare vector of int size 30
vector < char > charVector (20);                // declare vector of char size 20
vector < widget > widgetVector (20);            // declare vector of widget size 20
vector < pair < char > > charPairVector (10);   // declare vector of pairs of char size 10
And, later, change the sizes as follows:
intVector.set_size (50);        // expand to size 50
charVector.set_size (60);       // expand to size 60
widgetVector.set_size (70);     // expand to size 70
charPairVector.set_size (80);   // expand to size 80
We will design our vector class to behave exactly this way. The itemized wish list essentially becomes a set of requirements for the public interface of the vector class:
A person making up a wish list for a class may not think about the need for
constructors and destructors, but this person would get into immediate
trouble without them. After figuring out why, the person would register strong and
justified objections to the class developer. People rightfully expect that
objects know how to keep their house in order and do it without explicit
commands to do so. Any class in which dynamic memory is involved will need to
give its objects the ability to allocate memory as they are created and
de-allocate it as the objects go out of existence. For a class developer to
provide less is to fail in the prime directive: write correct, robust code.
Therefore -- we require constructors and a destructor for the vector class.
A wish list is a guide toward the two principal elements of class design: (1) the public interface and (2) the implementation plan. The former consists of the public methods and public operators (and public data, in the rare case when public data is appropriate), while the latter consists of protected data, protected methods, and a data management plan. The public interface is the way objects are used by client programs (and programmers), and thus determines the utility of the class. The implementation plan ultimately determines the functionality and efficiency behind the public interface. Both, together, determine the success or failure of the class as a utility.
Our vector wish list implies the following public methods and operators:
It is natural, and expected, that we software designers take the attitude that the person requesting the software may not ask for all he or she needs, and therefore we have to save the requester from this lack of foresight. Before donning a cape and rushing to improve every requester's wish list, however, consider the following two points:
Thus there is a healthy tension between richness and leanness in class design,
driven by two competing notions of efficiency: it is efficient to make re-usable
components more capable, because it enhances the ability of programmers to
create correct code quickly; it is also efficient to create re-usable components
with only the features that are needed, because unused features bloat the code
and slow down the run of the program. We will find various reasons to consider
such an expansion of the vector class in this chapter and following chapters.
Next, we need to decide on a basic implementation plan, including protected data, protected methods, and an approach to implementation of the public interface. Clues to how this might be done may also come from the wish list. For example, if a vector object is to manage memory for the user, it will need a protected place content in which to store vector elements. (Content will need to be a primitive C array.) Because, as in ordinary arrays, the client's vector may have fewer items actually stored than currently allocated memory can hold, the vector object will need to distinguish notions of size (i.e., the number of elements the client is actually using) and capacity (i.e., the number of elements that the client could use with the current allocated memory).
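The distinction between size and capacity can be sketched with a small stand-alone class. This is only an illustration under stated assumptions: the class name IntStore and its behavior are hypothetical, it stores int only, and it is not the fsu::Vector implementation developed below.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration class, not fsu::Vector: it only shows how
// size (elements in use) and capacity (elements allocated) can differ.
class IntStore
{
public:
  explicit IntStore(std::size_t cap)
    : size_(0), capacity_(cap), content_(cap > 0 ? new int[cap]() : 0)
  {}
  ~IntStore() { delete [] content_; }

  // Change the logical size; reallocate only if capacity is too small.
  void SetSize(std::size_t sz)
  {
    if (sz > capacity_)
    {
      int* bigger = new int[sz]();            // zero-filled larger array
      for (std::size_t i = 0; i < size_; ++i)
        bigger[i] = content_[i];              // keep the elements in use
      delete [] content_;
      content_  = bigger;
      capacity_ = sz;
    }
    size_ = sz;
  }

  std::size_t Size() const     { return size_; }
  std::size_t Capacity() const { return capacity_; }

private:
  IntStore(const IntStore&);             // copying disabled in this sketch
  IntStore& operator=(const IntStore&);

  std::size_t size_;      // number of elements the client is using
  std::size_t capacity_;  // number of elements the allocation can hold
  int*        content_;   // primitive array holding the elements
};
```

In this sketch, setting a size that fits within the current capacity changes only size_, while a larger size forces a new allocation and raises capacity_ as well.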
With the public interface and implementation plan, we can now write the class definition.
To clearly distinguish the container classes used in this course from those in the Standard Template Library, we will:
In addition, please note that the actual code in the library usually has extra features not discussed in the lecture notes. Thus it is best to use the notes as a guide into the fsu library, but always go to the actual header files for a complete version of an API. In particular, the distributed version of fsu::Vector<T> has several features not fully explored in this chapter, including: container class protocol, iterator support, and special display methods. These will be discussed later as needed. For now, let's look carefully at the class definition displayed in the slide:
template <typename T>
class Vector
{
public:
  // scope Vector<T>:: type definitions
  typedef T ValueType;

  // constructors - specify size and an initial value
  Vector ();                        // vector of size = 0 and capacity = defaultCapacity
  explicit Vector (size_t sz);      // vector of size = capacity = sz ...
  Vector (size_t sz, const T& t);   // ... and all elements = t
  Vector (const Vector<T>&);        // copy constructor
  virtual ~Vector ();               // destructor

  // member operators
  Vector<T>& operator =  (const Vector<T>&);  // assignment operator
  Vector<T>& operator += (const Vector<T>&);  // expand to append argument
  T&         operator [] (size_t);            // bracket operator
  const T&   operator [] (size_t) const;      // const version

  // other methods
  bool   SetSize     (size_t);            // set size as specified, change capacity iff needed
  bool   SetSize     (size_t, const T&);  // ... and initialize new elements
  bool   SetCapacity (size_t);            // force capacity change (up or down)
  size_t Size        () const;            // return size
  size_t Capacity    () const;            // return capacity

  // Container class protocol
  bool     Empty    () const;     // 1 iff empty
  bool     PushBack (const T&);   // expand by 1 new element appended at end
  bool     PopBack  ();           // contract by 1 from end
  void     Clear    ();           // make size = 0
  T&       Front    ();           // return front element (index 0)
  const T& Front    () const;     // const version
  T&       Back     ();           // return back element (index size - 1)
  const T& Back     () const;     // const version

  // Iterator support
  typedef VectorIterator<T> Iterator;
  friend class VectorIterator<T>;
  Iterator Begin  () const;
  Iterator End    () const;
  Iterator rBegin () const;
  Iterator rEnd   () const;

  // Generic display methods
  void Display (std::ostream& os, char ofc = '\0') const;
  void Dump    (std::ostream& os) const;

  // overload of const T* operator, facilitates use of previously defined array functions
  // operator const T* () const; // auto conversion of vector to array
  // removed 11/10/04: new standard does not allow, creates ambiguities

protected:
  // variables
  size_t size_,      // current size of vector
         capacity_;  // size of content_ array
  T*     content_;   // pointer to the primitive array elements

  // method
  static T* NewArray (size_t);  // safe space allocator
} ;
Constructors and destructor
There are four constructors declared: (1) a constructor with no parameters, (2)
a constructor taking a size parameter, (3) a constructor taking a size and an
initial element value, and (4) a copy constructor. There is an
interesting situation regarding (1), the constructor with no parameters.
All this boils down to the following advice: if your clients need to make arrays (or vectors) of objects of a given class, then the class needs a default (parameterless) constructor, which needs to be explicitly specified in the class if any other constructor is specified. The vector class requires specified constructors, because object creation requires dynamic memory allocation.
A copy constructor is a bit mysterious at first. Simply recognizing one can be difficult, until you know the pattern:
X (const X& );
is the prototype of the copy constructor for class X. No other pattern can be a copy constructor. Note the features of this pattern. Like other constructors, there is no return value type; the name of the method is the same as the class name; the single parameter, passed by reference, is an object of type X, the same class name; and finally, the parameter is passed as a const reference, which means "read only", i.e., the referenced object cannot be modified by the call of the function.
The copy constructor is invoked implicitly whenever client code implies that an object copy is needed. (Explicit copies are usually made by invoking the assignment operator.) The two ways in which implicit copies of objects are required:
In both these instances, an object must be copied, and the copy constructor is invoked. Every class is required to have a copy constructor. One will be created by the compiler if none is specified in the class. The issue of whether the compiler-created default constructor and copy constructor are appropriate for a given class is a subtle one. [See Deitel, Chapters 6, 7.]
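Implicit copies can be made visible by counting copy constructor calls. The following is a minimal sketch; the class Tracker and the helper functions are hypothetical names introduced only for this illustration. It shows two situations in which a copy is certain to be made (passing an existing object by value and initializing a new object from an existing one) and one in which no copy is made (passing by const reference).

```cpp
#include <cassert>

// Hypothetical class that counts how often its copy constructor runs.
class Tracker
{
public:
  Tracker() {}
  Tracker(const Tracker&) { ++copies; }   // the copy constructor pattern X(const X&)
  static int copies;                      // one counter shared by all Tracker objects
};

int Tracker::copies = 0;

// Parameter passed by value: the argument is copied into t.
void TakeByValue(Tracker t) { (void)t; }

// Parameter passed by const reference: no copy is made.
void TakeByReference(const Tracker& t) { (void)t; }

// Returns the number of copies made by the three statements in the middle.
int CountCopies()
{
  Tracker a;
  Tracker::copies = 0;
  TakeByValue(a);        // one copy
  TakeByReference(a);    // no copy
  Tracker b(a);          // one copy: b is initialized from a
  (void)b;
  return Tracker::copies;
}
```

Calling CountCopies() yields 2 in this sketch, one for the value parameter and one for the initialization of b.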
Assignment operator
The assignment operator is used
explicitly by client programs to make copies of objects. As with default and
copy constructors, every class must have an assignment operator, and one is
therefore created by the compiler unless one is specified in the class
definition. Typically, the issues of whether the compiler-created assignment
operator is appropriate are the same as for the copy constructor and default
constructor. Both the copy constructor and the assignment operator must make an
object copy. It is often useful to isolate the object copy code with a protected
method, designed to be called by both the copy constructor and the assignment
operator.
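The idea of sharing one copy routine can be sketched as follows. This is an illustration under assumptions, not the fsu::Vector code: the class CharBlock and its protected method Copy() are hypothetical names, and error handling for allocation is omitted.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class (not fsu::Vector) showing one protected copy helper
// shared by the copy constructor and the assignment operator.
class CharBlock
{
public:
  CharBlock(std::size_t n, char fill) : size_(n), content_(new char[n])
  {
    for (std::size_t i = 0; i < size_; ++i) content_[i] = fill;
  }
  CharBlock(const CharBlock& source) : size_(0), content_(0)
  {
    Copy(source);                 // new object: nothing to release first
  }
  CharBlock& operator=(const CharBlock& source)
  {
    if (this != &source)          // skip self-assignment
    {
      delete [] content_;         // existing object: release the old data
      Copy(source);
    }
    return *this;
  }
  ~CharBlock() { delete [] content_; }

  std::size_t Size() const                    { return size_; }
  char&       operator[](std::size_t i)       { return content_[i]; }
  const char& operator[](std::size_t i) const { return content_[i]; }

protected:
  // Allocate fresh storage and copy every element from source.
  void Copy(const CharBlock& source)
  {
    size_    = source.size_;
    content_ = new char[size_];
    for (std::size_t i = 0; i < size_; ++i) content_[i] = source.content_[i];
  }

  std::size_t size_;
  char*       content_;
};
```

The only difference between the two callers is the clean-up step: the assignment operator must release the old array before calling Copy(), while the copy constructor starts from an empty object.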
Note that the assignment operator takes a parameter with the same specification as the copy constructor. Note also that the assignment operator has as return type a reference to an object. We will discuss the import of this under implementation. The parameter specified for the assignment operator (and any other member operator) is really the second parameter of the operator, the first being the object that is making the call. To help understand this point, look at the following code fragment:
Vector <char> v(2), w;
v[0] = 'a';
v[1] = 'b';
w = v;
The first line declares two vectors of char, v has size 2 and w has no specified size. (The default constructor is called for w and the size-specific constructor is called for v.) The next two lines establish values for the two elements of v using the char assignment operator. The last line assigns v to w using the Vector<char> assignment operator. If we then compiled and ran the code
v.Dump(cout);
w.Dump(cout);
we would see on the screen:
(a,b) (a,b)
In other words, w would be an exact copy of v, including size and content. This is exactly how we would expect assignment to work. Here is what is going on behind the scenes:
The line w = v is really w calling its assignment operator with parameter v. (In fact, a member operator can be called using function syntax. The following two lines of code have identical semantics:
w = v;
w.operator = (v);
The first line follows the familiar operator syntax, while the second line follows operator function syntax. Each of these lines amounts to the same thing -- a call by w to its assignment operator member with parameter v.)
To be assigned the value v, w must first prepare itself to receive new data, and then copy data from v to itself. You should come back to this point when we get to the implementation of assignment.
Bracket operator
The bracket operator is another member operator that takes one (explicit)
parameter, an index value, and returns a reference to the vector element at that
value. Because the return value is a reference, rather than a copy,
it can be used on the left of an assignment and the actual vector
element will be re-assigned a value. Here is the bracket operator used in this
way, followed by the operator function syntax for the same call:
w[1] = 'c';
w.operator [] (1) = 'c';
First, w calls its bracket operator with index 1, returning a reference to the index 1 element of w, which is then assigned the value 'c'. Note that this is the char assignment operator here. A line of code such as
w[0] = v[1];
results in calls to three different operators: First, w calls its bracket operator with index 0, returning a reference to the index 0 element of w. Second, v calls its bracket operator with index 1, returning a reference to the index 1 element of v. And third, the char reference on the left calls its assignment operator, with the char reference on the right as its parameter.
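The bracket operator calls described above can be counted with a small instrumented class. This is a sketch only: CountingArray and ElementAssignmentDemo are hypothetical names, the container has a fixed size of 4, and no bounds checking is done.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size container that counts calls to its bracket operator.
class CountingArray
{
public:
  CountingArray() : calls(0)
  {
    for (std::size_t i = 0; i < 4; ++i) data_[i] = '?';
  }
  char& operator[](std::size_t i)
  {
    ++calls;                 // record the call
    return data_[i];         // return a reference, not a copy
  }
  int calls;                 // public only to keep the sketch short
private:
  char data_[4];
};

// Runs v[1] = 'b' followed by w[0] = v[1] and returns the total number
// of bracket operator calls made on v and w together.
int ElementAssignmentDemo()
{
  CountingArray v, w;
  v[1] = 'b';                // one call on v
  w[0] = v[1];               // one call on w and one more on v
  return v.calls + w.calls;
}
```

The char assignment itself is the built-in operator and is not counted; only the bracket operator calls are, which gives a total of three for the two statements.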
Sizing methods
Clients often need to set the size of a vector object explicitly. More rarely,
clients may need the ability to reserve memory of a certain size by setting
vector capacity. They will therefore also need to be able to discover the current
size and capacity of a vector object.
Container class protocol
These methods are part of a small set of standard methods that we will call
collectively the container class protocol. The need for this pattern
will become evident as we invent additional container classes and investigate abstract data
types. For now, it suffices to understand what these methods do.
A few more operators will be added to the container class protocol in subsequent chapters.
Display methods
Display(os, ofc) sends the vector elements to the ostream os, with the char ofc
between each pair of elements (or nothing between, if ofc == '\0').
Dump(os) sends a structural picture of the object to os. Dump() is primarily
used in class testing, and not intended for client use. It may be protected or
removed entirely prior to public release of the class.
Protected data
As planned, we have three protected data items. Size and capacity denote the
current size and memory storage capacity of a vector object. Content is a
pointer to the element type, where an array of type T can be stored.
Static method NewArray()
This method meets our stated goal
of isolating memory allocation in one place. We will discuss its implementation
below. The only point to discuss here is the static keyword. Static
methods have several interesting properties:
We will return to these points again, and you can read up on static class
members on pages 441 (for data) and 485 (for methods) of Deitel.
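Two properties of static methods that matter here can be sketched briefly: a static method can be called through the class name without any object, and it has no this pointer, so it cannot use non-static members. The class Meter and its members are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class with one static method and two ordinary methods.
class Meter
{
public:
  Meter() : reading_(0) {}

  // Static: belongs to the class, has no this pointer, and therefore
  // cannot touch reading_ or any other non-static member.
  static std::size_t Clamp(std::size_t value, std::size_t limit)
  {
    return value > limit ? limit : value;
  }

  // Ordinary method: may call the static method like any other function.
  void        Set(std::size_t value) { reading_ = Clamp(value, 100); }
  std::size_t Get() const            { return reading_; }

private:
  std::size_t reading_;
};
```

Client code can write Meter::Clamp(250, 100) without creating a Meter object, while Set() and Get() require one.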
We will overload three operators for class Vector<T>. Often, overloaded operators require friend status, but in this case they do not, hence they are declared outside the class definition.
template < typename T >
std::ostream& operator << (std::ostream& os, const Vector<T>& a);

template < typename T >
bool operator == (const Vector<T>&, const Vector<T>&);

template < typename T >
bool operator != (const Vector<T>&, const Vector<T>&);
Be sure to review the concept and use of
friend status in your C++ references.
The class definition and its operator overload prototypes are contained in the file vector.h. The contents of this file follow a standard pattern. First, there is a file header, containing information about the file and its contents. The first line of the file is the name of the file. You might also expect to find a date and the name of the creator in the next two lines. Following this basic identification is a documentary statement about the file contents. (This is important information provided by the creator for the user.)
The rest of the file, the body, is code intended to be understood by either the preprocessor or compiler. The first non-comment line of the file is usually a preprocessor command designed to protect against multiple definitions of the same class or function. A typical trick is to use the name of the file in some mutated way as a preprocessor identifier, and conditionally enter the actual code. The pattern we would like to follow in this course is:
/*  vector.h

    author name
    creation date
    (optional modification dates)

    (general documentation for the code contained in the file body)
*/

#ifndef _VECTOR_H      // protect against multiple reads
#define _VECTOR_H

namespace fsu          // namespace for the fsu code library
{

  (code goes here)

#include <vector.cpp>  // include separate implementation file inside namespace

} // end namespace

#endif
The include statement just before #endif is used only if some or all of the implementation code is placed in a separate file.
Note that the effect of the last include statement is to make it appear that all
of the code (declaration, definition, and implementation) is in the header
file. This is the standard for template code, because template code cannot be
compiled to object code without first substituting a real type for the typename
parameter(s) in the template.
We have made the choice of putting the implementation of the Vector<> code in a separate file. This choice is made only for convenience. When vector.h is included in any other file, the implementation file is automatically also included. We say that the implementation is logically in the same file as the class definition. This is the accepted practice for template code: implementations (whether template functions or template classes) are logically part of the header file that contains the function prototypes and/or class definitions.
/*  vector.cpp

    author name
    creation date
    (optional modification dates)

    (general documentation for the code contained in the file body)

    "slave" file: this file is a logical extension of vector.h
*/

// protect against multiple reads provided by master file
// namespace provided by master file

static const size_t defaultCapacity = 10;

(implementation code goes here)

(end of file)
The .cpp file follows the same pattern as that described above for the
.h file. The file header usually contains information about the
implementation rather than client documentation. The body contains the code
implementing all of the methods and functions defined in the corresponding .h
file. We now proceed to implement the class Vector<T>.
This is a protected static method, which amounts to having a stand-alone function whose scope is limited to the two files vector.h and vector.cpp and which can only be accessed by members of the class. Using the static keyword seems an elegant way to accomplish this.
Before getting into the details of the implementation, it's worth looking at the form of the implementation. This is a pattern that will repeat itself often. The pattern is:
template < typename T >
ReturnType ClassName < T > :: MethodName (parameters)
{
  // function body
}
which is the same as the pattern for ordinary function templates, with scope resolution added for the method name. Scope resolution is necessary to identify properly the function name as a method in the class.
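As a concrete instance of this pattern, here is a small class template whose methods are all defined outside the class definition. The class Box is a hypothetical example introduced only to show the syntax.

```cpp
#include <cassert>

// Hypothetical class template used only to show the definition pattern.
template <typename T>
class Box
{
public:
  explicit Box(const T& t);
  const T& Get() const;
  void     Set(const T& t);
private:
  T value_;
};

// Pattern: template <typename T> ReturnType ClassName<T>::MethodName (parameters)
// (a constructor has no return type, but the scope resolution is the same)
template <typename T>
Box<T>::Box(const T& t) : value_(t)
{}

template <typename T>
const T& Box<T>::Get() const
{
  return value_;
}

template <typename T>
void Box<T>::Set(const T& t)
{
  value_ = t;
}
```

Each definition repeats the template header and uses Box<T>:: for scope resolution, exactly as the Vector<T> methods do in the sections that follow.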
The NewArray() method is a key to the entire implementation, hence we are discussing its implementation first. The task to be performed is to allocate new memory for the content array. The benefit of isolating this task in a protected method is mainly one of organization: we could repeat essentially the same code in various places scattered around the implementation file. The disadvantages of doing so are, first, we would have a gaggle of code fragments that should all operate the same way, so if one is upgraded we must search for all the others and upgrade them as well. Second, we might even get lazy or careless and implement one or two of these incorrectly, creating memory allocation bugs that would be hard to track down. Having the allocation isolated here will provide a place to upgrade uniformly, a place to change the memory allocator, and a place to bring in exceptions in future software releases. Here is the code:
template <typename T>
T* Vector<T>::NewArray(size_t newcapacity)
// safe memory allocator
{
  T* Tptr;
  if (newcapacity > 0)
  {
    Tptr = new(std::nothrow) T [newcapacity];
    if (Tptr == 0)
    {
      std::cerr << "** Vector error: unable to allocate memory for array!\n";
      exit (EXIT_FAILURE);
    }
  }
  else
  {
    Tptr = 0;
  }
  return Tptr;
}
The implementation is not complicated. First look at the incoming parameter to see if it is zero; if it is, no memory is needed and we return zero (a legitimate pointer value). Otherwise, we request a new allocation of memory and check to see if our request has been granted. If it has, we return a pointer to the new allocation. If it has not, this version writes an error message to cerr, "the error channel", and terminates the program by calling exit(EXIT_FAILURE), so an allocation failure cannot go unnoticed. This is a simple implementation that provides safety and can be upgraded easily in the future.
One point to note is the use of the argument std::nothrow for operator
new. This design choice prevents allocation failure from throwing an
exception, thus confining handling the problem to this location.
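For comparison, one possible alternative policy is to report failure to the caller instead of terminating. The sketch below assumes a hypothetical free function named TryNewArray; it is not part of the fsu library. Note that std::nothrow only suppresses the exception from the memory request itself; a constructor of T could still throw.

```cpp
#include <cassert>
#include <cstddef>
#include <new>      // std::nothrow

// Hypothetical variant of the allocator: reports failure by returning a
// null pointer and leaves the decision about what to do to the caller.
template <typename T>
T* TryNewArray(std::size_t n)
{
  if (n == 0)
    return 0;                       // nothing requested, nothing allocated
  return new(std::nothrow) T[n];    // null on allocation failure, no exception
}
```

A caller would test the returned pointer before use and release the array with delete [] when finished.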
The constructors provide the first uses of our memory allocator NewArray(), and it is a very good use: they become one-liners!
Before going further, do you remember that constant declared in file vector.cpp, the one called "defaultCapacity"? We didn't mention it at the time, just slipped it in. It is also declared using the keyword static. This is the legacy C use of "static", which means file scope. Note also that defaultCapacity is a constant, so even though it appears to be exposed, it is actually quite a safe device. Here are key properties of static const declarations (outside of classes):
So, we have this constant defaultCapacity. Guess what? It becomes the default capacity of a vector object when no other is specified! The default constructor (the one with no parameters) makes a vector object with capacity_ = defaultCapacity and size_ = 0. These values are established in the initialization list before the body of the method. The method body consists only of a call to NewArray() to establish the content_ array.
template <typename T>
Vector<T>::Vector() : size_(0), capacity_(defaultCapacity), content_(0)
// Construct a vector of zero size and default capacity
{
  content_ = NewArray(capacity_);
}
The constructor with two parameters requires an initial size and an initializing value for the vector elements. In this case we set the initial size and initial capacity to the client-supplied size argument, then initialize the vector elements one at a time:
template <typename T>
Vector<T>::Vector(size_t sz, const T& Tval) : size_(sz), capacity_(sz), content_(0)
// Construct a vector of size and capacity sz, each element initialized to Tval
{
  content_ = NewArray(capacity_);
  for (size_t i = 0; i < size_; ++i)
    content_[i] = Tval;
}
A constructor with one parameter for size has been added to the class in the code library. One addition that you will see in the distributed version of vector.h is that this constructor is preceded by the keyword explicit. This keyword prevents the constructor from being called implicitly, as might otherwise happen when the compiler is trying to make sense out of a statement like "v = n", where v is a vector and n is an integer. This would certainly represent a typing error, but the existence of a 1-parameter constructor would allow the compiler to actually make sense of the statement, building a temporary vector of size n and assigning it to v. The explicit keyword prevents such implicit type conversions.
template <typename T>
Vector<T>::Vector(size_t sz) : size_(sz), capacity_(sz), content_(0)
// Construct a vector of size and capacity sz; elements receive no explicit initial value
// (the keyword explicit appears only on the declaration inside the class definition)
{
  content_ = NewArray(capacity_);
}
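The effect of explicit can be sketched with two tiny classes that differ only in that keyword. LooseSized, StrictSized, and AccidentalAssignment are hypothetical names for this illustration, and the use of std::is_convertible assumes a C++11 or later compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>   // std::is_convertible (C++11)

// Two hypothetical classes that differ only in the explicit keyword.
struct LooseSized
{
  LooseSized(std::size_t sz) : size(sz) {}             // implicit conversion allowed
  std::size_t size;
};

struct StrictSized
{
  explicit StrictSized(std::size_t sz) : size(sz) {}   // must be called by name
  std::size_t size;
};

// Demonstrates the accidental conversion that explicit rules out.
std::size_t AccidentalAssignment()
{
  LooseSized v(2);
  std::size_t n = 5;
  v = n;              // compiles: a temporary LooseSized(5) is built and assigned
  return v.size;      // v now has size 5
}

// StrictSized s(2); s = n;   // would not compile: no implicit conversion exists
```

The commented-out line at the end is the case that matters: with explicit in place, the compiler rejects the assignment instead of silently replacing the object.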
The destructor merely de-allocates memory, a simple but vital task:
template <typename T>
Vector<T>::~Vector()
// destructor
{
  delete [] content_;
}
The copy constructor code is as follows:
template <typename T>
Vector<T>::Vector(const Vector<T>& source) : size_(source.size_), capacity_(source.capacity_)
// copy constructor
{
  content_ = NewArray(capacity_);
  for (size_t i = 0; i < size_; ++i)
  {
    content_[i] = source.content_[i];
  }
}
This code is repeated for the assignment operator, our next case.
The assignment operator header follows the general pattern for member operators
template < typename T >
ReturnType ClassName < T > :: operator symbol (parameter list)
{
  // operator function body
}
as well as the more specialized established pattern for assignment operators
typename& operator = (const typename&)
which contains two references to the same typename, one the return value and the other the parameter. The only place the pattern allows for change is in typename. This assignment operator pattern has developed in order to facilitate multiple calls to =(), as in
x = y = z;
Operator =() associates from right to left, meaning that parentheses are implied as
(x = (y = z));
This illustrates the utility of returning the typename reference as a value. After the first call (y = z) returns a reference to (the new) y, the second call is (x = y), which is effectively the same as x = z since y has already been made equal to z. The entire result is as we would expect: x and y are now equal to z.
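Chained assignment can be tried on any class that follows the pattern. The class Celsius and the function AssignAll below are hypothetical, chosen only to keep the sketch short.

```cpp
#include <cassert>

// Hypothetical class whose assignment operator follows the standard pattern.
class Celsius
{
public:
  explicit Celsius(double d) : degrees_(d) {}
  Celsius& operator=(const Celsius& source)
  {
    degrees_ = source.degrees_;   // plain member copy; harmless if &source == this
    return *this;                 // reference to the object just assigned
  }
  double Degrees() const { return degrees_; }
private:
  double degrees_;
};

// Chained assignment: evaluated as x = (y = z).
void AssignAll(Celsius& x, Celsius& y, const Celsius& z)
{
  x = y = z;
}
```

If operator=() returned void instead of a reference, the inner call (y = z) would produce nothing for the outer assignment to use, and the chained statement would not compile.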
Assignment is similar to copy with one huge caveat: The copy constructor is always building a new object, so it does not need to worry about proper care of an old object. Assignment, on the other hand, must always deal with an existing object that needs careful attention.
The first pitfall is the possibility of self-assignment. A client program may write something like
Vector <int> v, x;
// yada dada
v = v;
Now this may seem ridiculous, but it can happen. First, never underestimate the ability of a client (program) to do something unexpected, such as writing "v = v;" explicitly. But, second, it could happen in a more subtle way. The "yada dada" part of the code could have been very complex, with the result that the vector x has been significantly transformed, assigned to and from, and other things, and then we could have the statement "v = x;" without knowing whether x is v or not. The statement "v = x;" could very well be the equivalent of "v = v;" without the client's knowledge. We have to guard against that possibility, because self-assignment is often a disaster. Keeping all this advice in mind, here is an implementation for assignment in class Vector:
template <typename T>
Vector<T>& Vector<T>::operator = (const Vector<T>& source)
// assignment operator
{
  if (this != &source)
  {
    // the NULL case
    if (source.capacity_ == 0)
    {
      if (capacity_ > 0)
        delete [] content_;
      size_ = capacity_ = 0;
      content_ = 0;
      return *this;
    }

    // set capacity
    if (capacity_ != source.capacity_)
    {
      if (capacity_ > 0)
        delete [] content_;
      capacity_ = source.capacity_;
      content_ = NewArray(capacity_);
    }

    // set size_
    size_ = source.size_;

    // copy content
    for (size_t i = 0; i < size_; ++i)
    {
      content_[i] = source.content_[i];
    }
  }  // end if
  return *this;
}  // end assignment operator =
To assign, we first make sure we are not self-assigning by the test "(this != &source)". Recall that this is a keyword that, inside a member function, denotes a pointer to the current object, that is, the address of the object. Also recall that &x is the address of x. So this test is asking whether the calling object *this has the same address as the parameter object source. That really is the appropriate test. We don't care if the objects are equal, we only care if they are the same object. Plainly, if the two objects are the same, we do not need to do anything. Not so plainly, applying the copy code in this case would be a disaster. (Trace through the code, assuming (this == &source), and see what happens.)
Once past this test, assignment is straightforward. The null case requires only
releasing any memory held and setting size, capacity, and content to zero. The
main case is more interesting. First we dispose of
the memory allocated to *this, the receiving object, and then allocate
the right amount of new memory (a step that may be skipped in the case where
capacities are the same). Finally, now that the capacities and sizes are the
same, we copy all elements from the source to the target (*this
again). The return value is, in all cases, *this (or rather, a
reference to *this).
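Self-assignment in disguise usually arrives through references. The sketch below uses a hypothetical class IntArray (not fsu::Vector) that releases its storage before copying, together with a hypothetical function CopyInto whose two parameters may refer to the same object.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class (not fsu::Vector) that frees its storage before copying,
// so the self-assignment test is what keeps it safe.
class IntArray
{
public:
  explicit IntArray(std::size_t n) : size_(n), content_(new int[n]()) {}
  IntArray(const IntArray& source)
    : size_(source.size_), content_(new int[source.size_])
  {
    for (std::size_t i = 0; i < size_; ++i) content_[i] = source.content_[i];
  }
  ~IntArray() { delete [] content_; }

  IntArray& operator=(const IntArray& source)
  {
    if (this != &source)            // same object? then there is nothing to do
    {
      delete [] content_;           // without the test, this would destroy source too
      size_    = source.size_;
      content_ = new int[size_];
      for (std::size_t i = 0; i < size_; ++i) content_[i] = source.content_[i];
    }
    return *this;
  }

  std::size_t Size() const          { return size_; }
  int& operator[](std::size_t i)    { return content_[i]; }

private:
  std::size_t size_;
  int*        content_;
};

// Nothing in this function reveals whether dst and src are the same object.
void CopyInto(IntArray& dst, const IntArray& src)
{
  dst = src;
}
```

A call such as CopyInto(a, a) is legal and looks innocent at the call site; the address test inside operator=() is what leaves a unchanged.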
The bracket operator header follows the general pattern for member operators:
template < typename T >
ReturnType ClassName < T > :: operator symbol (parameter list)
{
  // operator function body
}
The bracket operator is interesting on two points: first, we can rig a safety mechanism for out-of-range indices; second, it is our first const method. The safety mechanism is straightforward to implement, except you may wonder why there is no check for indices less than zero, that is until you realize that index values are of type size_t, an unsigned integer type.
There are actually two versions of the bracket operator. One version is an ordinary member operator that returns a reference to the element at that index. The other is a const member operator that returns a const reference to the element at that index. The compiler selects the appropriate version for a client program automatically.
As to const: this keyword in a method declaration means that the method is guaranteed not to change the state of the calling object. The const attribute means that the operator can be called by const objects, and it means that if we inadvertently do something "non-const" in the implementation, the compiler will complain. Here are the implementations:
template <typename T>
T& Vector<T>::operator [] (size_t i)
{
  if (i >= size_)
  {
    std::cerr << "** Vector<T> Error: vector index out of range!\n";
    if (i >= capacity_)
      exit(EXIT_FAILURE);
  }
  return content_[i];
}

template <typename T>
const T& Vector<T>::operator [] (size_t i) const
{
  if (i >= size_)
  {
    std::cerr << "** Vector<T> Error: vector index out of range!\n";
    if (i >= capacity_)
      exit(EXIT_FAILURE);
  }
  return content_[i];
}
The two implementations have identical code.
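How the compiler chooses between the two versions, and why one comparison is enough for the range check, can be sketched as follows. The class Probe and the functions ReadOnly and InRange are hypothetical names for this illustration; the mutable flag exists only so the const version can record that it was called.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical container that records which bracket operator was selected.
class Probe
{
public:
  Probe() : lastCallWasConst(false)
  {
    data_[0] = data_[1] = data_[2] = 0;
  }
  int& operator[](std::size_t i)
  {
    lastCallWasConst = false;
    return data_[i];
  }
  const int& operator[](std::size_t i) const
  {
    lastCallWasConst = true;      // allowed only because the flag is mutable
    return data_[i];
  }
  mutable bool lastCallWasConst;  // bookkeeping for the demonstration only
private:
  int data_[3];
};

// Reads through a const reference, so the const version is chosen.
int ReadOnly(const Probe& p, std::size_t i) { return p[i]; }

// A negative index converted to size_t becomes a very large value, so the
// single comparison against size also rejects "negative" indices.
bool InRange(std::size_t i, std::size_t size) { return i < size; }
```

Writing p[1] = 42 on a non-const Probe selects the first version, while ReadOnly(p, 1) selects the const version because p is seen through a const reference.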
Three non-member operators are overloaded for Vector:
template <typename T>
std::ostream& operator << (std::ostream& os, const Vector<T>& v)
{
  v.Display(os);
  return os;
}

template <typename T>
bool operator == (const Vector<T>& v1, const Vector<T>& v2)
{
  if (v1.Size() != v2.Size())
    return 0;
  for (size_t i = 0; i < v1.Size(); ++i)
    if (v1[i] != v2[i])
      return 0;
  return 1;
}

template <typename T>
bool operator != (const Vector<T>& v1, const Vector<T>& v2)
{
  return !(v1 == v2);
}
These operators are not class members; therefore their headers follow the non-member template pattern
template < typename T >
ReturnType FunctionName (parameter list)
{
  // function body
}
Even friends, which are declared inside the class, follow this non-member pattern. We have overloaded the output operator using the output operator pattern
ostream& operator << (ostream& , typename&)
containing two references to ostream, one the return value and the other the first parameter. The only place the pattern allows for change is the typename in the second parameter slot. This output operator pattern has developed in order to facilitate multiple calls to <<(), as in
cout << x << y << z;
Operator <<() associates from left to right, meaning that parentheses are implied as
(((cout << x) << y) << z);
This illustrates the utility of returning the ostream reference as a value. After the first call (cout << x) returns a reference to cout, the second call is (cout << y), and so on.
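The chaining behavior can be tried with a small sketch that applies the output operator pattern to a client type. The type Point and the helper ChainTwo are hypothetical names used only for this illustration, and an ostringstream stands in for cout so that the result can be inspected.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical client type used only for illustration.
struct Point
{
  int x;
  int y;
};

// Follows the output operator pattern: ostream& in, the same ostream& out.
std::ostream& operator<<(std::ostream& os, const Point& p)
{
  os << '(' << p.x << ',' << p.y << ')';
  return os;   // returning the stream is what makes chaining possible
}

// Writes two points with one chained expression and returns the text.
std::string ChainTwo(const Point& a, const Point& b)
{
  std::ostringstream oss;
  oss << a << " -> " << b;   // parsed as ((oss << a) << " -> ") << b
  assert(oss.good());        // no output error is expected for a string stream
  return oss.str();
}
```

If the operator returned void instead of the stream reference, the first call oss << a would still compile, but the chained expression would not, because there would be no stream on the left of the second <<.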
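Operators ==() and !=() are declared to return bool; the integer literals 0 and 1 in the code are converted to false and true. The code for operator ==() has some efficiencies, stopping the test at any point when the answer is known: immediately if the sizes differ, and at the first pair of unequal elements otherwise. Ultimately, though, the test remains linear in the size of the vectors: when the two vectors are equal, every one of the Size() element pairs must be compared.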
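The effect of stopping early can be made concrete by counting element comparisons. The sketch below uses std::vector<int> in place of the course Vector<T> (an assumption made so the example is self-contained), and the function ComparisonsNeeded is a hypothetical name for a scan with the same structure as operator ==().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts the element comparisons made by a size-check-then-scan equality
// test. std::vector<int> stands in for the course Vector<T>; the counter
// exists only for illustration.
std::size_t ComparisonsNeeded(const std::vector<int>& v1,
                              const std::vector<int>& v2)
{
  std::size_t count = 0;
  if (v1.size() != v2.size())
    return count;                // answer known without examining any element
  for (std::size_t i = 0; i < v1.size(); ++i)
  {
    ++count;
    if (v1[i] != v2[i])
      return count;              // the first mismatch settles the answer
  }
  assert(count == v1.size());    // equal vectors: every pair was compared
  return count;
}
```

For two equal vectors of five elements the count is 5; if the sizes differ it is 0; and if the vectors first differ at index 1 it is 2.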
There are two methods in Vector designed to display vector data. The first, Display(), is a function that gives slightly more flexibility than is possible with the output operator via the second parameter char ofc. The value of this parameter is used as an output format character (hence "ofc") that is placed between the vector elements in the output stream. Common ofc instances are the null character '\0', which places nothing between elements, the blank character ' ', tab '\t', and end-of-line '\n'.
template <typename T>
void Vector<T>::Display(ostream& os, char ofc) const
{
  size_t i;
  if (ofc == '\0')
    for (i = 0; i < size_; ++i)
      os << content_[i];
  else
    for (i = 0; i < size_; ++i)
      os << content_[i] << ofc;
} // end Display()
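To see what the format character does to the output, here is a minimal sketch of the same logic as a free function. It is written against std::vector<int> rather than the course Vector<T> (an assumption for self-containment), and DisplayToString is a hypothetical name; it returns the generated text instead of writing to a caller-supplied stream.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Free-function analogue of Display(os, ofc), written against
// std::vector<int> and returning the text it produced.
std::string DisplayToString(const std::vector<int>& v, char ofc)
{
  std::ostringstream os;
  for (std::size_t i = 0; i < v.size(); ++i)
  {
    os << v[i];
    if (ofc != '\0')
      os << ofc;      // the separator follows every element, including the last
  }
  return os.str();
}
```

With elements 1, 2, 3 the result is "123" for '\0' and "1 2 3 " for ' ', with a trailing blank. The null character is treated specially because inserting '\0' into a stream would write an actual NUL byte rather than nothing.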
The second display method is Dump(), which is designed to help in testing and debugging the class.
template <typename T>
void Vector<T>::Dump(ostream& os) const
{
  size_t i;
  if (size_ == 0)
  {
    os << "()";
  }
  else
  {
    os << '(' << content_[0];
    for (i = 1; i < size_; ++i)
    {
      os << ',' << content_[i];
    }
    os << ')';
  }
} // end Dump()
Dump() should display as accurate a view as possible of the actual internal structure of the container. Dump() might well be removed, or made private, prior to public release of the software. Dump() will be a standard feature of our container classes in this course.
Setting capacity of a vector is under the direct control of a client program through the method SetCapacity(). This method would be used where exact control of allocated memory for the vector footprint is needed.
// Reserve more (or less) space for vector growth;
// this is where memory is allocated. Note that this is
// an expensive operation and should be used judiciously.
// SetCapacity() is called by SetSize() only when increased capacity
// is required. If the client needs to reduce capacity, a call must be
// made specifically to SetCapacity.
template <typename T>
bool Vector<T>::SetCapacity(size_t newcapacity)
{
  if (newcapacity == 0)
  {
    delete [] content_;
    content_ = 0;
    size_ = capacity_ = 0;
    return 1;
  }
  if (newcapacity != capacity_)
  {
    T* newcontent = NewArray(newcapacity);
    if (newcontent == 0)
      return 0;
    if (size_ > newcapacity)
      size_ = newcapacity;
    for (size_t i = 0; i < size_; ++i)
    {
      newcontent[i] = content_[i];
    }
    capacity_ = newcapacity;
    delete [] content_;
    content_ = newcontent;
  }
  return 1;
} // end SetCapacity()
This public method gives the client control over how much, or how little, memory is allocated for their vector objects. There are special circumstances, such as embedded systems applications, when such control is desirable and even essential. For most cases, though, such control is unnecessary and the client would be well advised not to bother using this method at all. SetCapacity() will be called when necessary by other methods, such as SetSize() and PushBack(), and these will do so with runtime efficiency taken into account. SetCapacity() is a costly method to invoke: both NewArray() and the content copying routine require runtime proportional to the size of the vector.
The implementation is straightforward. First check to see if work can be avoided
(it can, in the cases where newcapacity is zero or the same as old
capacity_); then get a new content array allocated, and copy data from
old to new space. Finally, delete old space.
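The steps just described, and in particular the way a reduced capacity truncates the size, can be exercised with a small stand-alone sketch. The class IntBuffer and its methods Append() and At() are hypothetical names for this illustration, and new (std::nothrow) is assumed here as a stand-in for the NewArray() helper, which is taken to return a null pointer when allocation fails.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical int-only buffer mirroring the SetCapacity() steps; it is a
// teaching sketch, not the course Vector<T>.
class IntBuffer
{
public:
  IntBuffer() : size_(0), capacity_(0), content_(0) {}
  ~IntBuffer() { delete [] content_; }

  bool SetCapacity(std::size_t newcapacity)
  {
    if (newcapacity == capacity_) return true;      // nothing to do
    if (newcapacity == 0)
    {
      delete [] content_;
      content_ = 0;
      size_ = capacity_ = 0;
      return true;
    }
    // Assumption: allocation failure is reported with a null pointer,
    // standing in for the course's NewArray() helper.
    int* newcontent = new (std::nothrow) int[newcapacity];
    if (newcontent == 0) return false;              // old state left intact
    if (size_ > newcapacity) size_ = newcapacity;   // shrinking truncates
    for (std::size_t i = 0; i < size_; ++i)
      newcontent[i] = content_[i];                  // copy surviving elements
    delete [] content_;                             // release the old block last
    content_ = newcontent;
    capacity_ = newcapacity;
    return true;
  }

  // Appends without growing; the caller must have reserved room.
  void Append(int x) { assert(size_ < capacity_); content_[size_++] = x; }

  int         At(std::size_t i) const { assert(i < size_); return content_[i]; }
  std::size_t Size()     const { return size_; }
  std::size_t Capacity() const { return capacity_; }

private:
  IntBuffer(const IntBuffer&);             // copying is disabled in this sketch
  IntBuffer& operator=(const IntBuffer&);

  std::size_t size_;
  std::size_t capacity_;
  int*        content_;
};
```

Reserving 4, appending three values, and then calling SetCapacity(2) leaves a buffer of size 2 and capacity 2 that still holds the first two values. Because the old block is deleted only after the copy, a failed allocation leaves the buffer unchanged.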
Setting size is a matter of changing the value of the data member size_ and checking to see that current capacity is not exceeded. A call to SetCapacity() is made if necessary. Note that SetSize() never lowers capacity. A second version of SetSize() initializes all new elements to a specified value:
template <typename T>
bool Vector<T>::SetSize(size_t newsize)
// (re)set size
{
  if (newsize > capacity_)
    if (!SetCapacity(newsize))
      return 0;
  size_ = newsize;
  return 1;
}

template <typename T>
bool Vector<T>::SetSize(size_t newsize, const T& Tval)
// (re)set size with extra elements initialized to the same value
{
  size_t i, oldsize = size_;
  if (!SetSize(newsize))
    return 0;
  for (i = oldsize; i < newsize; ++i)
  {
    content_[i] = Tval;
  }
  return 1;
}
The 2-parameter version of SetSize() goes nicely with the initializing constructor discussed earlier. The Clear() method is implemented by re-setting the size parameter to zero:
template <typename T> void Vector<T>::Clear() { size_ = 0; }
The simplicity of implementation of Clear() is startling. Remember, the
policy is not to reduce capacity unless explicitly commanded by the user. There
is nothing gained by overwriting the excess capacity uncovered by setting size
to zero.
These two methods are typical information suppliers.
template <typename T>
size_t Vector<T>::Size() const
{
  return size_;
}

template <typename T>
size_t Vector<T>::Capacity() const
{
  return capacity_;
}
They are declared as const because they should not, and do not, change the state of the vector.
Two more informational methods provide access to the content at the beginning and end of the vector via references. It is an error to use them on an empty vector.
template <typename T>
T& Vector<T>::Front()
{
  if (size_ == 0)
  {
    std::cerr << "** Vector error: invalid Front() called on empty vector\n";
    exit (EXIT_FAILURE);
  }
  return content_[0];
}

template <typename T>
T& Vector<T>::Back()
{
  if (size_ == 0)
  {
    std::cerr << "** Vector error: invalid Back() called on empty vector\n";
    exit (EXIT_FAILURE);
  }
  return content_[size_ - 1];
}

template <typename T>
const T& Vector<T>::Front() const
{
  if (size_ == 0)
  {
    std::cerr << "** Vector error: invalid Front() called on empty vector\n";
    exit (EXIT_FAILURE);
  }
  return content_[0];
}

template <typename T>
const T& Vector<T>::Back() const
{
  if (size_ == 0)
  {
    std::cerr << "** Vector error: invalid Back() called on empty vector\n";
    exit (EXIT_FAILURE);
  }
  return content_[size_ - 1];
}
Like the bracket operator, these come in both const and regular flavor.
Like Clear(), PopBack() has a surprisingly simple method body. (The body of a function is the implementation, brace to brace.) There is no need to overwrite the element being "popped", and we are not reducing capacity as a matter of policy, unless explicitly commanded to do so by the client using SetCapacity().
template <typename T>
bool Vector<T>::PopBack()
{
  if (size_ == 0)
    return 0;
  --size_;
  return 1;
}
On one level the task here is simple: increase size by one and insert (a copy of) the parameter value into the newly opened vector element. A problem arises when size cannot be increased without increasing capacity. In this case, size_ == capacity_, the following are issues of concern:
Writing an implementation for PushBack() is left as an exercise.
Since Vector has already been designed, implemented, and placed into service, it is a little late to be adding requirements! The complexity requirements shown in the slide (repeated in the table below) can be thought of in two ways: either as actual properties of Vector that we determine from an analysis of the implementation, or as requirements that were in place and adhered to during the implementation. Either way, these complexity statements are in fact all true statements about Vector.
| Vector Operation | Runtime Complexity: Actual | Runtime Complexity: Requirement |
|---|---|---|
| PopBack(), Clear(), Front(), Back(), Empty(), Size(), Capacity(), bracket operator [] | Θ(1) | O(1) |
| PushBack(t) | Amortized Θ(1) | Amortized O(1) |
| SetSize(n), SetCapacity(n) | O(n) | O(n) |
| assignment operator = | Θ(n), n = Capacity() | O(n), n = Capacity() |
| Constructors, Destructor | Θ(n), n = Capacity() | O(n), n = Capacity() |
| Display(os,ofc) | Θ(n), n = Size() | |
| Dump(os) | Θ(n), n = Capacity() | |

Note that Θ(1) and O(1) are equivalent.
The new ISO standard for C++ does in fact impose such complexity requirements on the language, particularly on the standards for the Standard Template Library. In this course we will follow the same practice: impose both functionality and efficiency requirements on our data structures and algorithms as they are designed and placed into service in the course code library.
Most of the requirements (or findings) listed in the slide are completely straightforward to verify. For example, the code implementing PopBack() consists of a simple decrementation of the size_ datum, obviously a constant runtime algorithm. Similarly, the bracket operator just checks a condition and returns a value, again clearly an O(1) process. There are a few subtleties that warrant extra attention, however.
Assertion: Vector<T>::PushBack(t) has amortized runtime complexity <= O(1).
First the definition of "amortized" in this context: amortized complexity of a function f(n) is the complexity of the average function

$$ A(n) = \frac{f(1) + f(2) + \cdots + f(n)}{n} $$
Having amortized complexity O(1) is almost as good, but not quite, as having complexity O(1). "Amortized" complexity of an operation is a statement about the average cost of the operation, leaving open the possibility that any given instance of the operation may be quite costly, as long as most of the others are low cost so that the average cost stays low. To evaluate the amortized complexity of Vector<T>::PushBack(t), we must evaluate the runtime for a sequence of operations and then take the average. Here, for reference, is the implementation code:
template <typename T>
bool Vector<T>::PushBack(const T& Tval)
// grow by doubling capacity
{
  if (size_ >= capacity_)
  {
    if (capacity_ == 0)
    {
      if (!SetCapacity(1))
        return 0;
    }
    else if (!SetCapacity(2 * capacity_))
      return 0;
  }
  content_[size_] = Tval;
  ++size_;
  return 1;
}
Let c(n) denote the runtime of the portion of this code that must always be executed, that is:
{
  if (size_ >= capacity_)
  {
  }
  content_[size_] = Tval;
  ++size_;
  return 1;
}
and let d(n) be the cost of the body of the if statement, that is:
{
  if (capacity_ == 0)
  {
    if (!SetCapacity(1))
      return 0;
  }
  else if (!SetCapacity(2 * capacity_))
    return 0;
}
where n is the size of the vector. Observe that c(n) is actually independent of size, so that c(n) is really a constant c. Also observe that
$$ d(n) = \begin{cases} n, & \text{if } \texttt{size\_} = \texttt{capacity\_} \\ 0, & \text{otherwise} \end{cases} $$
If we start with an empty vector and issue n PushBack() calls, then the condition (size_ >= capacity_) is met exactly at the powers of 2, because of the doubling of capacity that takes place in the body of the if statement. Hence

$$ d(n) = \begin{cases} n, & \text{if } n \text{ is a power of } 2 \\ 0, & \text{otherwise} \end{cases} $$
The total complexity of a given call to PushBack() is then

$$ f(n) = c + d(n) $$
Now compute the average of a sequence of n calls, assuming for simplicity that $n = 2^k$, starting with an empty vector:

$$
\begin{aligned}
A(n) &= \frac{f(1) + f(2) + \cdots + f(n)}{n} \\
     &= \frac{(c + d(1)) + (c + d(2)) + \cdots + (c + d(n))}{n} \\
     &= c + \frac{d(1) + d(2) + \cdots + d(n)}{n} \\
     &= c + \frac{1 + 2 + 0 + 4 + 0 + 0 + 0 + 8 + 0 + \cdots + 2^k}{2^k} \\
     &= c + \frac{1 + 2 + 4 + \cdots + 2^k}{2^k} \\
     &= c + \frac{2^{k+1} - 1}{2^k} \\
     &< c + \frac{2^{k+1}}{2^k} \\
     &= c + 2
\end{aligned}
$$
Therefore A(n) <= O(1), which proves the assertion.
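The bound can also be checked empirically by simulating the doubling policy and counting element copies. The sketch below tracks only the size and capacity values (no memory is allocated), and the function name CopiesForPushBacks is a hypothetical one chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Simulates n PushBack() calls on an initially empty vector that grows by
// doubling, and returns the total number of element copies caused by
// reallocation. Only sizes are tracked; no memory is allocated.
std::size_t CopiesForPushBacks(std::size_t n)
{
  std::size_t size = 0, capacity = 0, copies = 0;
  for (std::size_t call = 0; call < n; ++call)
  {
    if (size >= capacity)
    {
      capacity = (capacity == 0) ? 1 : 2 * capacity;
      copies += size;        // SetCapacity() copies the existing elements
    }
    ++size;                  // the constant-cost part of every call
  }
  assert(size == n);
  return copies;
}
```

For n = 1024 the simulation reports 1023 copies, and in general the total stays below 2n, so the average number of copies per call is below 2. The count here is indexed by the size before each call, so it stops one doubling earlier than the sum in the derivation; the constant bound is unaffected.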
This is really about the runtime of the more primitive new T [n] operation. It may be surprising that this call is Θ(n), because we think of it as allocating a block of memory of size n * sizeof(T), which should not be an iterative process. After raw memory is allocated, however, each individual "T footprint" must be prepared. The preparation consists of calling the type T default constructor for each of the n T objects in the memory block. Therefore, whenever the default constructor of T does any work, at least n atomic computations are made, so the runtime of a call to new T [n] is bounded below by Ω(n).
The reason behind this assertion is similar to that above. In every case, either a constructor or a destructor must be called for each object in the container.
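The per-element constructor and destructor calls can be observed directly with an element type that counts them. The type Counted and the function ArrayLifecycleIsLinear are hypothetical names introduced for this sketch only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical element type that counts its own constructor and destructor
// calls; used only to observe what new [] and delete [] do.
struct Counted
{
  static std::size_t constructed;
  static std::size_t destroyed;
  Counted()  { ++constructed; }
  ~Counted() { ++destroyed; }
};

std::size_t Counted::constructed = 0;
std::size_t Counted::destroyed   = 0;

// Allocates and frees an array of n Counted objects and reports whether
// exactly n constructor calls and n destructor calls were observed.
bool ArrayLifecycleIsLinear(std::size_t n)
{
  Counted::constructed = 0;
  Counted::destroyed   = 0;
  Counted* block = new Counted[n];   // default constructor runs once per element
  const bool built = (Counted::constructed == n);
  delete [] block;                   // destructor runs once per element
  return built && (Counted::destroyed == n);
}
```

A call such as ArrayLifecycleIsLinear(100) returns true, which is the behavior the argument above relies on: allocation and deallocation of an array of class objects each perform one call per element.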
A summary of all the Vector runtime requirements is shown in the slide.