A little note on C++

I just did a little rehearsal, as part of a bit-by-bit effort to revive the old plan of (re)implementing DBConan (this time with a GUI). I plan to implement it using the Qt framework, and that also means the programming language will be C++.

Why Qt? It looks neat, and I need that because I will be very much concerned with interactivity, ease of use, graphical presentation of data, maybe some (cool) animations, and all that stuff. Qt seems to be a great choice.

And why C++? I want to take advantage of Oracle OCCI, because this tool (in its first version) will be specific to the Oracle database. I haven't had a chance to check whether other RDBMSs have something similar to the Oracle Data Dictionary (which is vital for DBConan), let alone study them.

Before anything else, here's the code, and below is the screenshot from running it. I don't have the time or space right now to explain it. In fact, I don't think any explanation is needed: it's just some test code I used to confirm or dispel my worries and doubts.

Designing and writing programs in C++ is one thing I don't feel very confident about. The last time I programmed in C++ was in 2006, using wxWidgets and the Skype API for the early prototype of peerant's client application; it was not a lot of code, and it lasted only six months. So it's not enough.

I've always been a defensive programmer; I always try (up front) to avoid making silly mistakes that can be hard to spot once the codebase gets big. To me, the keyword is "consistency". I remember my lecturer at university teaching me to be consistent in my choice of words, structure, and style when writing papers and technical documentation. I think the exact same principle holds true for software code. First I need to know the field, understand the rules of the game, and know the potential issues; then make up some rules for avoiding those issues, and follow / apply them consistently.

Straight to the core of the matter: I worry about object-lifetime management in C++. I want to minimize the likelihood of (me) making mistakes in that area. I need to clearly describe my worries, organize / structure them, and hopefully come up with some simple rules for avoiding the pitfalls..., or at least a kind of lookup table for spotting such mistakes during code reviews.

In Java / C# I don't remember ever worrying (too much) about this. That doesn't mean memory-related problems cannot exist in applications written in those languages. Memory leaks can still occur, especially when some part of our code caches objects (in containers like a map or a list) and forgets to remove a cached object once we no longer use it. The container still holds a reference to that object, so the object stays reachable, and the GC is of no help in that situation: it only reclaims objects that are no longer reachable.

But, in my understanding, it's relatively easier to detect (and avoid) that kind of problem in those languages. Besides, those languages have been designed in such a way that there are fewer paths to getting into trouble with object-lifecycle management.

"Fewer ways to get ourselves in trouble"? How come? One of the reasons, I think: in those languages we only deal with one thing, the (object) reference. When we pass an object as a function argument or return value, we always pass the reference to the object..., and that's the default behavior. Whenever we want to pass a copy of the object instead, we have to make that copy explicitly.

It's quite different in C++, where we have both choices. We can declare the function this way:

(a) void laFuncion(Buck buck) //pass-by-value. The arg will be a new instance, whose internal values will be initialized through the copy-constructor.

or this way:

(b) void laFuncion(Buck& buck) //no copying here, it's pass-by-reference.

How does it lead to problems? Hmm..., well, it lies in the assumptions the client programmer might make. Let's suppose we go the (a) way.

If the client programmer does not read the method signature carefully enough, she might assume that the buck being used inside laFuncion is exactly the same buck she has in her own code (outside laFuncion). This assumption (mis)leads her to think that a change made to the buck on one side (outside laFuncion, or inside it) will be seen on the other side. Bla bla bla, error.

Of course it's her mistake for being such a sloppy programmer. But let's not be so quick to judge (and punish). Maybe she's actually a decent programmer; it's just that she's coming from Javaland or C#land (which is exactly my case). So, with the intention of helping her (or myself), I thought: "Why don't we simply go the (b) way? That's the Java way she's familiar with. That would eliminate the issue, right?" Could be. But before we get to that, I'd like to review why we would consider the (a) way in the first place.

Apart from presenting the issue above, it is wasteful: it copies, unnecessarily. That buck in the function argument is an instance whose lifetime is limited to the scope of the function; it will be automatically destroyed when the function returns. What's the point of creating a new instance that will be destroyed so soon anyway? It makes me think of flies (I just learned from watching the obamaninja video that a fly lives only for a day). Why did She create flies? They're so useless.

To make the point clearer, let's get down to the code.
---------------
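// Buck: any class with a default constructor, copy-constructor and assignment operator
class ObjectA
{
private:
    Buck buck;
public:
    // (a) pass-by-value: the argument is a copy of the caller's buck
    ObjectA(Buck buck) {
        this->buck = buck;   // member was default-constructed; now assigned
    }
};

class ObjectB
{
private:
    Buck buck;
public:
    // (b) pass-by-reference: no copy is made for the argument
    ObjectB(Buck& buck) {
        this->buck = buck;   // member was default-constructed; now assigned
    }
};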
---------------

When we create an instance of ObjectA, passing along an instance of Buck, the following things will take place:

(1) An instance of Buck, that corresponds to the argument "buck", will be created. The copy-constructor will be used for the creation.
(2) An instance of Buck, that corresponds to the instance variable "buck" of ObjectA, will be created. The default (no-arg) constructor will be used.
(3) When the assignment occurs, the assignment operator of the instance variable "buck" will be called. That's how the internal values of the argument "buck" get copied to the instance variable "buck".
(4) When the constructor returns, the buck that corresponds to the argument "buck" will be destroyed.

On the other hand, when we create an instance of ObjectB, passing along an instance of Buck, the following things will take place:

(1) An instance of Buck, that corresponds to the instance variable "buck" of ObjectB, will be created. The default (no-arg) constructor will be used.
(2) When the assignment occurs, the assignment operator of the instance variable "buck" will be called. That's how the internal values of the argument "buck" get copied to the instance variable "buck".

See, an (extra) invocation of the copy-constructor (plus the matching destructor call) is involved in the pass-by-value case, and the final effect is the same in both cases: both ObjectA and ObjectB will have their own instance of Buck, whose internal values are the same as those of the instance of Buck passed in by the client programmer. In Java speak: value equality, but not reference equality. So I declare the (b) way the winner: same effect, fewer steps.

But wait, I was supposed to come up with an answer to "why would we consider the (a) way in the first place?". Hmm, I just mentioned the key idea of the answer: owning. My initial goal was to ensure that the object receiving the passed-in value (buck) will _own_ the buck. In other words, that object will be the one responsible for the lifecycle of the buck. Keyword: composition (instead of aggregation).

Well, we just saw the (b) way accomplishes that goal, as well. So I think we can completely discard the (a) way now.

Going a little further: if that object is supposed to be responsible for the lifecycle of the buck, then (ideally) it should be responsible for it from the beginning (right?); that is to say, wouldn't it be better if that object instantiated the buck itself (instead of being fed with it)? A factory method comes to mind; the code below uses its simplest form, creating the buck lazily inside a getter. I think I will adopt this as one of the rules for my code.

---
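class ObjectB   // revised: creates and owns its own Buck
{
private:
    Buck* pBuck;                          // owned; created on first use
    ObjectB(const ObjectB&);              // not copyable: two copies would
    ObjectB& operator=(const ObjectB&);   // delete the same Buck twice
public:
    ObjectB() : pBuck(0) { }              // must start null for the check below
    ~ObjectB() { delete pBuck; }          // the owner cleans up
    Buck& getBuck() {
        if (pBuck == 0) {
            pBuck = new Buck();
        }
        return *pBuck;
    }
};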
---

On the flip side: in cases where we want to state that the object will not / is not supposed to be responsible for the lifecycle of the buck, we can use this form:

----
class ObjectC
{
private:
    Buck& buck;   // refers to a buck owned by someone else
public:
    ObjectC(Buck& buck) : buck(buck) {   // binds the reference; no copy
    }
};
----

The steps involved:
(0) None of the four things listed above for the (a) way: no copy, no default construction, no assignment, no destruction.

Now the problem: this code assumes that the buck passed into an ObjectC instance won't be destroyed before the ObjectC itself is destroyed. In more concrete terms, for a function like the one below, which returns an ObjectC that outlives the function, the buck must be allocated with the new operator (or otherwise live at least as long as the ObjectC), like this:

---
ObjectC* otroFuncion() {
    // ...
    Buck* pBuck = new Buck();               // outlives this function
    ObjectC* pObjC = new ObjectC(*pBuck);
    return pObjC;   // someone must later delete pObjC, and then pBuck
}
---

Instead of...
---
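ObjectC* otroFuncion() {
    // ...
    Buck buck;       // not "Buck buck();", which would declare a function
    ObjectC* pObjC = new ObjectC(buck);
    return pObjC;    // WRONG: buck dies here; pObjC keeps a dangling reference
}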
---

Hmm..., I don't think that's a safe assumption. How can it be enforced? But wait..., if the object (ObjectC) is "conscious" that it is _not_ responsible for the lifecycle of the buck (and thus has no control over it), then it is its own responsibility to check whether the buck is still alive before doing anything with it. Hmm..., I guess that's a fair rule. One catch, though: a plain reference (or raw pointer) cannot be asked whether its object is still alive, so the check needs some help, such as a weak pointer.

Argh.... :D. I think that's all. I'm writing down this train of thought here just so I won't have to go through it all again next time. If you think there's something inaccurate / misleading / just plain nonsense here, please let me know. Thank you in advance.