Monday, September 18, 2017

Brief Appearance at CppCon

This year's CppCon includes two panel discussions devoted to technical training, and I'll be on the one on Monday, September 25. Other members of the panel will be Giuseppe D'Angelo, Stephen Dewhurst, Kate Gregory, and Anthony Williams. The moderator will be Jon Kalb, who's also an experienced trainer. Together, we've probably indoctrinated many thousands of developers in the ways we believe to be right and just in the battle between programmer and machine.

Most people would probably date my work with C++ to the initial publication of Effective C++ in late 1991, but I'd been training professional programmers for several years before that, and since retiring from C++ involvement at the end of 2015, I've given a few more presentations on non-C++ technical topics (most recently a couple of weeks ago). All told, I have close to 30 years' experience training professional software developers, so I'd like to think I know a thing or two about it. To find out if I do, I encourage you to attend the panel session.

Monday will be the only day I'll be at the conference, so if you want to hunt me down to say hello, that'll be the day to do it.

Scott

Wednesday, July 5, 2017

Sales Data for EMC++: Print Books, Digital Books, and Online Access

O'Reilly President Laura Baldwin's recent blog post explaining O'Reilly's decision to discontinue selling individual books and videos through their web site (while continuing to publish books and videos for sale through other channels) inspired me to take a look at the sales data I have for Effective Modern C++. I wrote that book with both print and electronic publication in mind, assuming that by the time it came out, demand for digital formats would be at least as strong as demand for print products.

That has not proven to be the case. I have data for the first 35 months of the book's existence (through May 2017), and since initial publication, sales of digital editions make up only about 41% of the over 50,000 units (i.e., copies of the book) sold. Here's a chart of print sales versus ebook sales by month:
Because it takes more time to print books than to make them available on the Internet, the digital versions were downloadable four months before the print books came out. That's apparent at the left side of the chart. Since then, print sales have beaten ebook sales almost every month. Most of the time, it hasn't been much of a contest.

These data exclude sales of foreign language translations of the book. My royalty statements don't break down sales of translations into print and digital formats.

It's clear that buyers of EMC++ have a pretty strong preference for the paper version. This is consistent with sales data for my other books (Effective C++, Effective STL, More Effective C++), but those books were initially published before digital books took off, and they were never designed for digital consumption. The fact that print sales dominate for them is not a surprise.

O'Reilly is getting out of the retail book and video sales business in order to focus on its online subscription service, Safari. Baldwin states that that side of the business has the most customers and is growing the fastest. I don't doubt her. But what does that mean for me?

Here are the royalty source data for Effective Modern C++, broken down into "Online" sources (which include Safari) and "Other." Included in "Other" are all sales of complete books, regardless of format. Ebook sales are thus "Other", not "Online".
As you can see, the online component of my royalties (including Safari) is generally under 10% each month. Summed over the course of the book's existence, the online contribution to my total royalties is only 5.7%. There appears to be a slight upward trend over time, but it's hardly something that sets an author's heart aflutter. From a royalty point of view, sales of complete books are at least ten times as important to me as online access.

What do the data for Effective Modern C++ have to say about the trends in publishing Baldwin describes in her post? Very little. A key observation in her post is that "digital enabled new learning modalities such as video and interactive content," and my book is an example of neither. She refers to how O'Reilly has long recognized that they aren't really in the book-publishing business; they're in the knowledge-spreading business. Books are one way to spread knowledge, but they aren't the only way, and from the perspective of a publisher, they are a way that's less and less important.

The charts above demonstrate that regardless of the general movement in the information-dissemination business towards digital, non-book-like, subscription-based models, complete books--especially print books--are, at least in the case of my readership, very much alive and kicking.

Friday, June 30, 2017

O'Reilly's Decision and its DRM Implication

On Wednesday, I got mail from Laura Baldwin, President of O'Reilly, announcing that "as of today, we are discontinuing fulfillment of individual book and video purchases on shop.oreilly.com. Books (both ebook and print) will still be available for sale via other digital and bricks-and-mortar retail channels...[and] of course, we will continue to publish books and videos..." So O'Reilly's not getting out of the book and video publishing business, it's just getting out of the business of selling them at retail. For details, check out Laura's blog entry, this story at Publishers Weekly, or these discussions at Slashdot or Hacker News.

To me, the most interesting implication of this announcement is that O'Reilly's no-DRM policy apparently resonated little with the market. Other technical publishers I'm familiar with (e.g., Addison-Wesley, the Pragmatic Programmer, Artima) attempt to discourage illegal dissemination of copyrighted material (e.g., books in digital form) by at least stamping the buyer's name on each page. O'Reilly went the other way, trusting people who bought its goods not to give them to their friends or colleagues or to make them available on the Internet.

I don't know what motivated that policy. Perhaps it was a belief that trusting buyers was the right thing to do. But I can't help but think they took into account the effect it would likely have on sales. After all, publishing is a business.

Piracy is a double-edged sword. On the one hand, it means you receive no compensation for the benefit readers get from the work you put in. On the other hand, pirated books act as implicit marketing, expanding awareness of you and your book(s). They can also reach buyers who want to see the full product before making a purchasing decision or who wouldn't become aware of your book through conventional marketing efforts.

My feeling is that most people who choose pirated books are unlikely to pay for them, even if that's the only way to get them. As such, I'm inclined to think the marketing effect of illegal copies exceeds the lost revenue. I have no data to back me up. Maybe it's just a rationalization to help me live with the knowledge that no matter what you do, there's no way you can prevent bootleg copies of your books from showing up on the Net.

My guess is that a component of O'Reilly's no-DRM policy was a hope that it would distinguish O'Reilly from other publishers and would attract buyers who felt strongly about DRM. Whether it did that, I don't know, but O'Reilly's decision to stop selling individual products at its web site suggests that DRM (or the lack thereof) is not an important differentiator for most buyers of technical books and videos.

Wednesday, May 17, 2017

Interview with Me (in Hungarian)

Last month, I was invited to give a presentation at NNG in Budapest. During my visit to NNG, I was asked to talk with some people from HWSW, and the resulting interview has now been published. If you're comfortable with Hungarian (or with the results of a translation from Hungarian into whatever language you prefer), I encourage you to take a look.

In reading the interview, it may be helpful to know that the talk I gave at NNG was a shorter version of the presentation I gave at DConf earlier this month, "Things that Matter."

Enjoy!

Scott

Tuesday, March 14, 2017

Keynote at DConf in Berlin on May 5

The folks behind the annual conference for the D programming language offered me a soapbox for my most fundamental beliefs about software and software development, so on Friday, 5 May, I'll be speaking in Berlin at DConf about

Things That Matter

In the 45+ years since Scott Meyers wrote his first program, he’s played many roles: programmer, user, educator, researcher, consultant. Different roles beget different perspectives on software development, and so many perspectives over so much time have led Scott to strong views about the things that really matter. In this presentation, he’ll share what he believes is especially important in software and software development, and he’ll try to convince you to embrace the same ideas he does.
Because this isn't a C++ talk, I sent the DConf organizers a more general bio than I usually use. It may include some things about me you don't know, so perhaps you'll find it interesting:
Scott Meyers started programming in 1971, and he started teaching programming in 1972. He’s best known for his Effective C++ books, but he’s also worked on constraint expression for programming languages, program representations in development environments, software simulations of bacteriophage lambda, general principles for improving software quality, and the effective presentation of technical information. In 2009, he received the Dr. Dobb’s Excellence in Programming Award, and in 2014, an online poll likened his hair style to that of the cartoon character, He-Man.
If you're working with or interested in D, I encourage you to consider attending the conference. If so, be sure to stop by and say hello after my talk!

Scott

Friday, February 3, 2017

By the Numbers: The Great Foreign Edition Book Giveaway

A couple of months ago, I offered to give away foreign editions of my books, asking recipients only that they reimburse me for the postage. Here are some numbers associated with the giveaway.
  • 112: Books I had to give away.
  • 70: Books I gave away. (There were no requests for the others.)
  • 65: People who requested books.
  • 37: People I sent books to. (It wasn't possible to satisfy all requests.)
  • 13: People whose requests overlooked the requirement to include a mailing address. (Such requests were moved to the bottom of the priority list. Some still got satisfied, because they were for books for which no higher-priority requests came through. In those cases, I pinged the requesters for mailing addresses.)
  • 21: Countries to which I was asked to send books.
  • 13: Countries to which I sent books. (It still wasn't possible to satisfy all requests.)
  • 26: Requests for Effective Modern C++ in Russian (the most frequently requested book).
  • 1: Copies of Effective Modern C++ in Russian I had to give away.
  • 5: Maximum number of books sent to any single requester. (These books were in Japanese, but the mailing address was in Sweden, and the request came from someone with an email provider in Italy, so it appears that an Italian in Sweden requested books in Japanese :-}.)
  • 905.65: Total cost of postage for books I sent (in US dollars).
  • 75.4: Percent of this cost I've so far been reimbursed.
Scott

Tuesday, January 31, 2017

Updated Versions of EC++/3E and EMC++

New printings of Effective C++, Third Edition and Effective Modern C++ have recently been published by Addison-Wesley and O'Reilly, respectively. Both printings include fixes for all the errata that had been reported through December, though a couple of bug reports for EMC++ have since trickled in, sigh. For EC++/3E, the new printing is number 17. For EMC++, it's 10.

If you purchased digital copies of these books from the publisher, you should be able to log in to your account and download the latest versions. (O'Reilly customers should have received a notification to this effect. AW doesn't seem to tell people when new printings are available for download.)

If you purchase print copies of these books, I encourage you to make sure you're getting the latest versions. I have copies of the latest printings, so I know they exist in print form.

I hope you enjoy the latest revisions of these books. They should be the best versions yet.

Scott

Wednesday, December 28, 2016

New ESDS Book: Effective SQL

SQL finally gets the effective treatment. That's an accomplishment, because despite an official ISO standard for SQL, there's enough variation among common offerings that the authors of Effective SQL felt obliged to test their code (e.g., schemas, queries, etc.) on six different implementations. They also point out syntactic and semantic differences between "official" SQL and the SQL you're probably using. 

Pulling off that kind of feat calls for lots of experience, both with SQL and with explaining it to others. Authors John Viescas, Doug Steele, and Ben Clothier have it in spades. They're pushing a century of IT experience (!), and they've published more than a half-dozen books on databases, SQL, or both. It's hard to get better than that.

If you work with SQL, you owe it to yourself to take a look at Effective SQL.

Scott

Tuesday, December 27, 2016

New ESDS Book: Effective C#, Third Edition

The third incarnation of Bill Wagner's best-selling Effective C# has flown off the presses, and a copy has landed on my desk. Apparently it's flying off the shelves, too, because it's currently Amazon's #1 new release in the category of Microsoft C and C++ Windows Programming. If you'd like the book to land on your desk as well as mine, you might want to place your order quickly.

This revision of Effective C# is part one of a two-part comprehensive update Bill is undertaking for both his C# titles (the other being More Effective C#). For details on the motivation for the updates and his thinking about them, check out Bill's recent blog post.

Happy C#ing!

Scott

Effective Modern C++ in Portuguese!

The latest addition to the Effective Modern C++ family goes by C++ Moderno e Eficaz and targets readers of Portuguese. My understanding is that the book's been out for a few months, but my copy arrived only a few days ago.

Like most foreign translations of EMC++, this one uses just one ink color, so if you're comfortable with technical English, I recommend the four-color English (American) edition. However, if Portuguese descriptions of C++11 and C++14 features are your preferred cup of tea, this is the brew for you!

Scott

Sunday, November 27, 2016

The Great Foreign Edition Book Giveaway

One of the nicer author perks is seeing your books appear in translation. In my 2003 Advice to Prospective Book Authors, I wrote:
Few things evoke quite the level of giddiness as seeing a copy of your book in a foreign script. I, for one, cherished my books in Chinese, and I continued to cherish them even after I found out that they were actually in Korean.
My publishers generally send me at least one copy of each translation they authorize. I often receive several copies, however, and over the years, I've amassed more copies of my books in foreign languages than I have use for. Look!—these are the extra copies I currently have:

Instead of letting these books gather more dust, I've decided to give them away. Want one? Just ask. I'll autograph it for you and throw it in the mail, and all I'll request in return is that you cover the cost of postage.

I'll describe the details of how the giveaway works in a moment, but first let me show you the available inventory. Most books are in a language other than English, but what I'm technically giving away are foreign editions, so a few have the same text as the US book (i.e., they're in English). Such editions are generally printed on cheaper paper than their US counterparts, and like almost all the books I'm giving away, they use only one ink color, even if the US version uses multiple colors.

Here's what I've got:

Things to bear in mind:

  • For books with two ISBN lines, each line represents a distinct ISBN for the book. The upper one is the older ISBN-10. The lower one is the newer ISBN-13. (ISBN-10 vs ISBN-13 is the publishing equivalent of IPv4 vs. IPv6.)
  • Sometimes there are multiple versions of the same translation, e.g., there are two entries for German and for Japanese translations of Effective C++, Third Edition. In such cases, the only difference is typically the cover design. As far as I know, the substance of all translations of a particular book into a particular language is the same.
  • In the table, "Chinese" is ambiguous, because there are two versions of printed Chinese: traditional and simplified. To find out which Chinese is meant, use your favorite search engine to look up a book's ISBN.
  • I've tried to list accurate languages for the books, but, not being able to read most of them, I may have made a mistake here and there. If so, I apologize, and I hope you'll bring the errors to my attention.
  • The first two editions of Effective C++ are either old or really old. Both are out of date. They might be suitable for a C++ museum, or maybe you could employ them as research material for that Scott Meyers biography you've been working on (ahem), but the programming advice in these editions is not to be trusted. I'll send them to you if you ask me to, but before you make a request, think carefully about why you're doing it. It shouldn't be to improve your C++.

How the giveaway works:

  • If you'd like a book, send me email letting me know what you want and the address to which I should send it. If you'd like more than one book, that's fine, just list the books in priority order. (I'll ignore book requests posted as comments to this blog, sorry.)
  • I'll let the requests roll in for about two weeks (until about December 9), then I'll decide who gets what on whatever basis I want. My general plan is to assign higher priority to earlier requests and to issue everybody one book before issuing anybody more than one (i.e., to use a pseudo-FIFO pseudo-round-robin algorithm), but my plan might change. If your request includes an unusually good reason to satisfy it, I'll increase your priority. (An example of an unusually good reason would be that you'd like books to stock a library, thus making them available to many people.)
  • At some point (by December 16, I hope), I'll let you know whether I can satisfy your request. If I can, I'll put your book(s) in the mail, let you know how much the postage is, and request that you send me that much by Paypal. As it happens, I've gone down this road a couple of times in the past, and some of the promised payments never materialized. Nevertheless, my faith in the basic honesty of C++ software developers endures. I'd appreciate it if you wouldn't do anything to change that.
Soooo...who wants a book that I can't read, that's out of date, or both?

Scott

Monday, November 21, 2016

Help me sort out the meaning of "{}" as a constructor argument

In Effective Modern C++, one of the explanations I have in Item 7 ("Distinguish between () and {} when creating objects") is this:
If you want to call a std::initializer_list constructor with an empty std::initializer_list, you do it by making the empty braces a constructor argument—by putting the empty braces inside the parentheses or braces demarcating what you’re passing:
  
class Widget {
public:
  Widget();                                   // default ctor
  Widget(std::initializer_list<int> il);      // std::initializer_list ctor
  …                                           // no implicit conversion funcs
}; 

Widget w1;          // calls default ctor
Widget w2{};        // also calls default ctor
Widget w3();        // most vexing parse! declares a function!    

Widget w4({});      // calls std::initializer_list ctor with empty list
Widget w5{{}};      // ditto  
I recently got a bug report from Calum Laing saying that in his experience, the initializations of w4 and w5 aren't equivalent, because while w4 behaves as my comment indicates, the initialization of w5 takes place with a std::initializer_list with one element, not zero.

A little playing around showed that he was right, but further playing around showed that changing the example in small ways changed its behavior. In my pre-retirement-from-C++ days, that'd have been my cue to dive into the Standard to figure out what behavior was correct and, more importantly, why, but now that I'm supposed to be kicking back on tropical islands and downing piña coladas by the bucket (a scenario that would be more plausible if I lay around on beaches...or drank), I decided to stop my research at the point where things got complicated. "Use the force of the Internet!," I told myself. In that spirit, let me show you what I've got in the hope that you can tell me why I'm getting it. (Maybe it's obvious. I really haven't thought a lot about C++ since the end of last year.)

My experiments showed that one factor affecting whether "{{}}" as an argument list yields a zero-length std::initializer_list<T> was whether T had a default constructor, so I threw together some test code involving three classes, two of which could not be default-constructed. I then used both "({})" (note the outer parentheses) and "{{}}" as argument lists to a constructor taking a std::initializer_list for a template class imaginatively named X. When the constructor runs, it displays the number of elements in its std::initializer_list parameter.

Here's the code, where the comments in main show the results I got under all of gcc, clang, and vc++ at rextester.com.  Only one set of results is shown, because all three compilers produced the same output.
  
#include <iostream>
#include <initializer_list>

class DefCtor {
public:
  DefCtor(){}
};

class DeletedDefCtor {
public:
  DeletedDefCtor() = delete;
};

class NoDefCtor {
public:
  NoDefCtor(int){}
};

template<typename T>
class X {
public:
  X() { std::cout << "Def Ctor\n"; }
    
  X(std::initializer_list<T> il)
  {
    std::cout << "il.size() = " << il.size() << '\n';
  }
};

int main()
{
  X<DefCtor> a0({});           // il.size = 0
  X<DefCtor> b0{{}};           // il.size = 1
    
  X<DeletedDefCtor> a2({});    // il.size = 0
  X<DeletedDefCtor> b2{{}};    // il.size = 1

  X<NoDefCtor> a1({});         // il.size = 0
  X<NoDefCtor> b1{{}};         // il.size = 0
}
These results raise two questions:
  1. Why does the argument list syntax "{{}}" yield a one-element std::initializer_list for a type with a default constructor, but a zero-element std::initializer_list for a type with no default constructor?
  2. Why does a type with a deleted default constructor behave like a type with a default constructor instead of like a type with no default constructor?
If I change the example to declare DefCtor's constructor explicit, clang and vc++ produce code that yields a zero-length std::initializer_list, regardless of which argument list syntax is used:
class DefCtor {
public:
  explicit DefCtor(){}             // now explicit
};

...

X<DefCtor> a0({});           // il.size = 0
X<DefCtor> b0{{}};           // il.size = 0 (for clang and vc++)  
However, gcc rejects the code:
source_file.cpp:35:19: error: converting to ‘DefCtor’ from initializer list would use explicit constructor ‘DefCtor::DefCtor()’
   X<DefCtor> b0{{}};
                   ^
gcc's error message suggests that it may be trying to construct a DefCtor from an empty std::initializer_list in order to move-construct the resulting temporary into b0. If that's what it's trying to do, and if that's what compilers are supposed to do, the example would become more complicated, because it would mean that what I meant to be a series of single constructor calls may in fact include calls that create temporaries that are then used for move-constructions.

We thus have two new questions:
  1. Is the code valid if DefCtor's constructor is explicit?
  2. If so (i.e., if clang and vc++ are correct and gcc is incorrect), why does an explicit constructor behave differently from a non-explicit constructor in this example? The constructor we're dealing with doesn't take any arguments.
The natural next step would be to see what happens when we declare the constructors in DeletedDefCtor and/or NoDefCtor explicit, but my guess is that once we understand the answers to questions 1-4, we'll know enough to be able to anticipate (and verify) what would happen. I hereby open the floor to explanations of what's happening such that we can answer the questions I've posed. Please post your explanations in the comments!

---------- UPDATE ----------

As several commenters pointed out, in my code above, DeletedDefCtor is an aggregate, which is not what I intended. Here's revised code that eliminates that. With this revised code, all three compilers yield the same behavior, which, as noted in the comment in main below, includes failing to compile the initialization for b2. (Incidentally, I apologize for the 0-2-1 ordering of the variable names. They were originally in a different order, but I moved them around to make the example clearer, then forgot to rename them, thus rendering the example probably more confusing, sigh.)
  
#include <iostream>
#include <initializer_list>
 
class DefCtor {
  int x;
public:
  DefCtor(){}
};
 
class DeletedDefCtor {
  int x;
public:
  DeletedDefCtor() = delete;
};
 
class NoDefCtor {
  int x;    
public:
  NoDefCtor(int){}
};
 
template<typename T>
class X {
public:
  X() { std::cout << "Def Ctor\n"; }
     
  X(std::initializer_list<T> il)
  {
    std::cout << "il.size() = " << il.size() << '\n';
  }
};
 
int main()
{
  X<DefCtor> a0({});           // il.size = 0
  X<DefCtor> b0{{}};           // il.size = 1
     
  X<DeletedDefCtor> a2({});    // il.size = 0
  // X<DeletedDefCtor> b2{{}};    // error! attempt to use deleted constructor
 
  X<NoDefCtor> a1({});         // il.size = 0
  X<NoDefCtor> b1{{}};         // il.size = 0
}
This revised code renders question 2 moot.

The revised code exhibits the same behavior as the original code when DefCtor's constructor is declared explicit: gcc rejects the initialization of b0, but clang and vc++ accept it and, when the code is run, il.size() produces 0 (instead of the 1 that's produced when the constructor is not explicit).

---------- RESOLUTION ----------

Francisco Lopes, the first person to post comments on this blog post, described exactly what was happening as regards questions 1 and 2 about the original code I posted. The only thing he didn't do was cite sections of the Standard, which I can hardly fault him for. From my perspective, the key provisions in the C++14 Standard are
  • 13.3.1.7 ([over.match.list]), which says that when you have a braced initializer for an object, you first try to treat the entire initializer as an argument to a constructor taking a std::initializer_list. If that doesn't yield a valid call, you fall back on viewing the contents of the braced initializer as constructor arguments and perform overload resolution again.
and
  • 8.5.4/5 ([dcl.init.list]/5), which says that if you're initializing a std::initializer_list from a braced initializer, you copy-initialize each element of the std::initializer_list from the corresponding element of the braced initializer. The relevance of this part of the Standard was brought to my attention by Marco Alesiani in his comment below.
The behavior of the initializations of a0 and b0, then, can be explained as follows:
  
X<DefCtor> a0({});  // The arg list uses parens, not braces, so the only ctor argument is
                    // "{}", which, per 13.3.3.1.5/2 ([over.ics.list]/2) becomes an empty
                    // std::initializer_list. (Thanks to tcanens at reddit for the 
                    // reference to 13.3.3.1.5.)

X<DefCtor> b0{{}};  // The arg list uses braces, so the ctor argument is "{{}}", which is
                    // an initializer list with one element, "{}". DefCtor can be
                    // copy-initialized from "{}", so the ctor's std::initializer_list
                    // param contains a single default-constructed DefCtor object.
I thus understand the error in Effective Modern C++ that Calum Laing brought to my attention. The information in the comments (and in this reddit subthread) regarding how explicit constructors affect things is just a bonus.

Thanks to everybody for helping me understand what was going on. All I have to do now is figure out how to use this newfound understanding to fix the problem in the book...

Monday, August 8, 2016

Interview with Me (in Korean)

My keynote address at NDC in Seoul got the Korean tech press interested in talking to me, and the interview Jihyun Lee conducted has now been published at Bloter.

As a rule, I read through my interviews before blogging about their existence, because, hey?!, who knows what I said? But since the interview is published in Korean, I skipped that step. If you read Korean as easily as you read C++, I hope you enjoy the interview. If you enjoy it enough to translate it into English (or if you find a translation floating around the Internet somewhere), please let me know.

Scott

Tuesday, June 21, 2016

Effective Modern C++ in Traditional Chinese!

Yesterday I received an interesting-looking box in the mail. The contents were even more interesting: the translation of Effective Modern C++ into Traditional Chinese!

This translation uses only one ink color (black), so if you're comfortable with technical English, you're probably better off with the English (American) edition.  If you prefer your C++ with a traditional Chinese flair, however, this new edition is the one for you.

EMC++ has now been translated into the following languages:
  • German
  • Italian
  • Polish
  • Japanese
  • Korean
  • Russian
  • French
  • Traditional Chinese
My understanding is that translations into Portuguese and Simplified Chinese are also in the works. If you're aware of other translations, please let me know.

In the meantime, enjoy the new Chinese translation of EMC++.

Scott

Monday, April 25, 2016

Thursday's NDC Presentation will be live, but remote

Recent developments have conspired to prevent me from attending this week's Nexon Developers Conference in Seoul, but I'll still be making my keynote presentation, "Modern C++ Beyond the Headlines." The talk will be live, but I'll be at home instead of in the conference hall. The heavy lifting on the communications front will be handled by Skype.

The keynote will take place at 5:05PM local time at the conference, which will be 1:05AM local time for me. It should be interesting to see who suffers more: the conference attendees at the end of a long day or me at the end of a longer one :-)

Scott

Monday, April 4, 2016

Presentation at Nexon Developers Conference in Seoul on April 28

In my "retirement from active involvement in C++" post at the end of last year, I wrote:
I may even give one more talk. (A potential conference appearance has been in the works for a while. If it gets scheduled, I'll let you know.)
Well, it's been scheduled, and I'm letting you know: I'll be giving a presentation at the Nexon Developers Conference in Seoul on April 28. The topic is "Modern C++ Beyond the Headlines," and I plan to talk about how some features in C++11/14 are better than they appear at first glance (e.g., constexpr), while others are likely to be less attractive than they initially seem (e.g., emplacement).

There are no talks in the pipeline after this one, and I've been holding fast on my decision not to accept new engagements, so in all likelihood, this is the last C++ presentation I'll make. If you want to be there to see if I botch the landing, the Nexon Developers Conference at the end of the month is the place to be!

Scott

Monday, March 28, 2016

Effective Modern C++ in French!

Et Voilà! The French edition of Effective Modern C++ has just arrived at my desk, so it should be available for you, too.

This version of the book uses only one ink color (black), so if you're comfortable with technical English, I suspect you'll prefer the four-color English (American) edition. But if you like your C++ in French (including the code comments!), this new edition is your ami.

Scott

Thursday, December 31, 2015

} // good to go

Okay, let's see what we've got. Two sets of annotated training materials. Six books. Over four dozen online videos. Some 80 articles, interviews, and academic papers. A slew of blog entries, and more posts to Usenet and StackOverflow than you can shake a stick at. A couple of contributions to the C++ vernacular. A poll equating my hair with that of a cartoon character.

I think that's enough; we're good to go. So consider me gone. 25 years after publication of my first academic papers involving C++, I'm retiring from active involvement with the language.

It's a good time for it. My job is explaining C++ and how to use it, but the C++ explanation biz is bustling. The conference scene is richer and more accessible than ever before, user group meetings take place worldwide, the C++ blogosphere grows increasingly populous, technical videos cover everything from atomics to zero initialization, audio podcasts turn commute-time into learn-time, and livecoding makes it possible to approach C++ as a spectator sport. StackOverflow provides quick, detailed answers to programming questions, and the C++ Core Guidelines aim to codify best practices. My voice is dropping out, but a great chorus will continue.

Anyway, I'm only mostly retiring from C++. I'll continue to address errata in my books, and I'll remain consulting editor for the Effective Software Development Series. I may even give one more talk. (A potential conference appearance has been in the works for a while. If it gets scheduled, I'll let you know.)

"What's next?," you may wonder. I get that a lot. I've spent the last quarter century focusing almost exclusively on C++, and that's caused me to push a lot of other things to the sidelines. Those things now get a chance to get off the bench. 25 years of deferred activities begets a pretty long to-do list. The topmost entry? Stop trying to monitor everything in the world of C++ :-)

Scott


Friday, December 4, 2015

Effective Modern C++ in Russian!

I haven't yet received a copy, but I have received word that there's now a Russian translation of Effective Modern C++. For details, please consult this page.

C++ in Cyrillic! What could be finer?

Scott

Tuesday, November 17, 2015

The Brick Wall of C++ Source Code Transformation

In 1992, I was responsible for organizing the Advanced Topics Workshop that accompanied the USENIX C++ Technical Conference. The call for workshop participation said:
The focus of this year's workshop will be support for C++ software development tools. Many people are beginning to experiment with the idea of having such tools work off a data structure that represents parsed C++, leaving the parsing task to a single specialized tool that generates the data structure. 
As the workshop approached, I envisioned great progress in source code analysis and transformation tools for C++. Better lints, deep architectural analysis tools, automatic code improvement utilities--all these things would soon be reality! I was very excited.

By the end of the day, my mood was different. Regardless of how we approached the problem of automated code comprehension, we ran into the same problem: the preprocessor. For tools to understand the semantics of source code, they had to examine the code after preprocessing, but to produce acceptable transformed source code, they had to modify what programmers work on: files with macros unexpanded and preprocessor directives intact. That meant tools had to map from preprocessed source files back to unpreprocessed source files. That's challenging even at first glance, but when you look closer, the problem gets harder. I found out that some systems #include a header file, modify preprocessor symbols it uses, then #include the header again--possibly multiple times. Imagine back-mapping from preprocessed source files to unpreprocessed source files in such systems!
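In case the pattern isn't familiar, here's a reconstruction of the kind of thing I mean (my sketch, not code from any specific system):

// colors.h -- deliberately lacks an include guard; it's designed to be
// #included repeatedly, with COLOR defined differently each time
COLOR(Red)
COLOR(Green)
COLOR(Blue)

// client.cpp -- the same header, expanded two different ways
#define COLOR(name) name,
enum Color {
#include "colors.h"
};
#undef COLOR

#define COLOR(name) #name,
const char* colorNames[] = {
#include "colors.h"
};
#undef COLOR

A tool that sees only the preprocessed translation unit sees two unrelated code fragments. Mapping a change to either fragment back to the single colors.h that programmers actually edit is anything but straightforward.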

Dealing with real C++ source code means dealing with real uses of the preprocessor, and at that workshop nearly a quarter century ago, I learned that real uses of the preprocessor doomed most tools before they got off the drawing board. It was a sobering experience.

In the ensuing 23 years, little has changed. Tools that transform C++ source code still have to deal with the realities of the preprocessor, and that's still difficult. In my last blog post, I proposed that the C++ Standardization Committee take into account how source-to-source transformation tools could reduce the cost of migrating old code to new standards, thus permitting the Committee to be more aggressive about adopting breaking changes to the language. In this post, I simply want to acknowledge that preprocessor macros make the development of such tools harder than my last post implied.

Consider this very simple C++:
#define ZERO 0

auto x = ZERO;
int *p = ZERO;
In the initialization of x, ZERO means the int 0. In the initialization of p, ZERO means the null pointer. What should a source code transformation tool do with this code if its job is to replace all uses of 0 as the null pointer with nullptr? It can't change the definition of ZERO to nullptr, because that would change the semantics of the initialization of x. It could, I suppose, get rid of the macro ZERO and replace all uses with either the int 0 or nullptr, depending on context, but (1) that's really outside its purview (programmers should be the ones to determine if macros should be part of the source code, not tools whose job it is to nullptr-ify a code base), and (2) ZERO could be used inside other macros that are used inside other macros that are used inside other macros..., and especially in such cases, reducing the macro nesting could fill the transformed source code with redundancies and make it harder to maintain. (It'd be the moral equivalent of replacing all calls to inline functions with the bodies of those functions.)

I don't recall a lot of talk about templates at the workshop in 1992. At that time, few people had experience with them. (The first compiler to support them, cfront 3.0, was released in 1991.) Nevertheless, templates can give rise to the same kinds of problems as the preprocessor:
template<typename T>
void setToZero(T& obj) { obj = 0; }

int x;
setToZero(x);    // "0" in setToZero means the int

int *p;
setToZero(p);    // "0" in setToZero means the null pointer
I was curious about what clang-tidy did in these situations (one of its checks is modernize-use-nullptr), but I was unable to find a way to enable that check in the version of clang-tidy I downloaded (LLVM version 3.7.0svn-r234109). Not that it matters. The way that clang-tidy approaches the problem isn't the only way, and one of the reasons I propose a decade-long time frame to go from putting a language feature on a hit list to actually getting rid of it is that it's likely to take significant time to develop source-to-source translation tools that can handle production C++ code, macros and templates and all.

The fact that the problem is hard doesn't mean it's insurmountable. The existence of refactoring tools like clang-tidy (far from the only example of such tools) demonstrates that industrial-strength C++ source transformation tools can be developed. It's nonetheless worth noting that such tools have to take the existence of templates and the preprocessor into account, and those are noteworthy complicating factors.

-- UPDATE --

A number of comments on this post include references to tools that chip away at the problems I describe here. I encourage you to pursue those references. As I said, the problem is hard, not insurmountable.

Friday, November 13, 2015

Breaking all the Eggs in C++

If you want to make an omelet, so the saying goes, you have to break a few eggs. Think of the omelet you could make if you broke not just a few eggs, but all of them! Then think of what it'd be like to not just break them, but to replace them with newer, better eggs. That's what this post is about: breaking all the eggs in C++, yet ending up with better eggs than you started with.

NULL, 0, and nullptr

NULL came from C. It interfered with type-safety (it depends on an implicit conversion from void* to typed pointers), so C++ introduced 0 as a better way to express null pointers. That led to problems of its own, because 0 isn't a pointer, it's an int. C++11 introduced nullptr, which embodies the idea of a null pointer better than NULL or 0. Yet NULL and 0-as-a-null-pointer remain valid. Why? If nullptr is better than both of them, why keep the inferior ways around?

Backward-compatibility, that's why. Eliminating NULL and 0-as-a-null-pointer would break existing programs. In fact, it would probably break every egg in C++'s basket. Nevertheless, I'm suggesting we get rid of NULL and 0-as-a-null-pointer, thus eliminating the confusion and redundancy inherent in having three ways to say the same thing (two of which we discourage people from using).
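If you need a reminder of that confusion, here's a minimal sketch of the classic overloading problem (the kind of case Item 8 of Effective Modern C++ examines):

#include <cstddef>    // for NULL

void f(int)   {}      // hypothetical overloads for illustration
void f(void*) {}

int main()
{
  f(0);          // calls f(int): 0 is an int, never a pointer
  f(NULL);       // calls f(int) or fails to compile, depending on how
                 // NULL is defined; it never calls f(void*)
  f(nullptr);    // calls f(void*): nullptr can't be taken for an int
}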

But read on.

Uninitialized Memory

If I declare a variable of a built-in type and I don't provide an initializer, the variable is sometimes automatically set to zero (null for pointers). The rules for when "zero initialization" takes place are well defined, but they're a pain to remember. Why not just zero-initialize all built-in types that aren't explicitly initialized, thus eliminating not only the pain of remembering the rules, but also the suffering associated with debugging problems stemming from uninitialized variables?

Because it can lead to unnecessary work at runtime. There's no reason to set a variable to zero if, for example, the first thing you do is pass it to a routine that assigns it a value.
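As a refresher, here's a small sample of what the current rules yield (illustrative only, not a complete statement of the rules):

int g;                 // static storage duration: zero-initialized

void f()
{
  int x;               // automatic storage duration: indeterminate value
  static int y;        // static storage duration: zero-initialized
  int* p = new int;    // dynamic storage duration: *p is indeterminate
  delete p;
}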

So let's take a page out of D's book (in particular, page 30 of The D Programming Language) and zero-initialize built-ins by default, but specify that void as an initial value prevents initialization:
int x;              // always zero-initialized
int x = void;       // never zero-initialized
The only effect such a language extension would have on existing code would be to change the initial value of some variables from indeterminate (in cases where they currently would not be zero-initialized) to specified (they would be zero-initialized). That doesn't lead to any backward-compatibility problems in the traditional sense, but I can assure you that some people will still object. Default zero initialization could lead to a few more instructions being executed at runtime (even taking into account compilers' ability to optimize away dead stores), and who wants to tell developers of a finely-tuned safety-critical realtime embedded system (e.g., a pacemaker) that their code might now execute some instructions they didn't plan on?

I do. Break those eggs!

This does not make me a crazy man. Keep reading.

std::list::remove and std::forward_list::remove

Ten standard containers offer a member function that eliminates all elements with a specified value (or, for map containers, a specified key): list, forward_list, set, multiset, map, multimap, unordered_set, unordered_multiset, unordered_map, unordered_multimap. In eight of these ten containers, the member function is named erase. In list and forward_list, it's named remove. This is inconsistent in two ways. First, different containers use different member function names to accomplish the same thing. Second, the meaning of "remove" as an algorithm is different from that as a container member function: the remove algorithm can't eliminate any container elements, but the remove member functions can.
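In code, the full inconsistency looks like this (a minimal illustration):

#include <algorithm>
#include <list>
#include <set>
#include <vector>

int main()
{
  std::set<int>    s { 1, 2, 3 };
  std::list<int>   l { 1, 2, 3, 2 };
  std::vector<int> v { 1, 2, 3, 2 };

  s.erase(2);       // eliminates the element with key 2
  l.remove(2);      // eliminates the elements with value 2: same job,
                    // different name

  auto newEnd = std::remove(v.begin(), v.end(), 2);  // the remove algorithm
                                                     // eliminates nothing...
  v.erase(newEnd, v.end());                          // ...the container's
                                                     // erase does
}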

Why do we put up with this inconsistency? Because getting rid of it would break code. Adding a new erase member function to list and forward_list would be easy enough, and it would eliminate the first form of inconsistency, but getting rid of the remove member functions would render code calling them invalid. I say scramble those eggs!

Hold your fire. I'm not done yet.

override

C++11's override specifier enables derived classes to make explicit which functions are meant to override virtual functions inherited from base classes. Using override makes it possible for compilers to diagnose a host of overriding-related errors, and it makes derived classes easier for programmers to understand. I cover this in my trademark scintillating fashion (ahem) in Item 12 of Effective Modern C++, but in a blog post such as this, it seems tacky to refer to something not available online for free, and that Item isn't available for free--at least not legally. So kindly allow me to refer you to this article as well as this StackOverflow entry for details on how using override improves your code.
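Still, here's the flavor of it in a minimal example:

class Base {
public:
  virtual void doWork(int);
};

class Derived: public Base {
public:
  void doWork(long);           // legal, but hides Base::doWork instead of
                               // overriding it--a silent bug
  void doWork(int) override;   // overrides Base::doWork; had the parameter
                               // types not matched, the code would not compile
};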

Given the plusses that override brings to C++, why do we allow overriding functions to be declared without it? Making it possible for compilers to check for overriding errors is nice, but why not require that they do it? It's not like we make type checking optional, n'est-ce pas?

You know where this is going. Requiring that overriding functions be declared override would cause umpty-gazillion lines of legacy C++ to stop compiling, even though all that code is perfectly correct. If it ain't broke, don't fix it, right? Wrong!, say I. Those old functions may work fine, but they aren't as clear to class maintainers as they could be, and they'll cause inconsistency in code bases as newer classes embrace the override lifestyle. I advocate cracking those eggs wide open.

Backward Compatibility 

Don't get me wrong. I'm on board with the importance of backward compatibility. Producing software that works is difficult and expensive, and changing it is time-consuming and error-prone. It can also be dangerous. There's a reason I mentioned pacemakers above: I've worked with companies who use C++ as part of pacemaker systems. Errors in that kind of code can kill people. If the Standardization Committee is going to make decisions that outlaw currently valid code (and that's what I'd like to see it do), it has to have a very good reason.

Or maybe not. Maybe a reason that's merely decent suffices as long as existing code can be brought into conformance with a revised C++ specification in a way that's automatic, fast, cheap, and reliable. If I have a magic wand that allows me to instantly and flawlessly take all code that uses NULL and 0 to specify null pointers and revises the code to use nullptr instead, where's the downside to getting rid of NULL and 0-as-a-null-pointer and revising C++ such that the only way to specify a null pointer is nullptr? Legacy code is easily updated (the magic wand works instantly and flawlessly), and we don't have to explain to new users why there are three ways to say the same thing, but they shouldn't use two of them. Similarly, why allow overriding functions without override if the magic wand can instantly and flawlessly add override to existing code that lacks it?

The eggs in C++ that I want to break are the old ways of doing things--the ones the community now acknowledges should be avoided. NULL and 0-as-a-null-pointer are eggs that should be broken. So should variables with implicit indeterminate values. list::remove and forward_list::remove need to go, as do overriding functions lacking override. The newer, better eggs are nullptr, variables with indeterminate values only when expressly requested, list::erase and forward_list::erase, and override. 

All we need is a magic wand that works instantly and flawlessly.

In general, that's a tall order, but I'm willing to settle for a wand with limited abilities. The flawless part is not up for negotiation. If the wand could break valid code, people could die. Under such conditions, it'd be irresponsible of the Standardization Committee to consider changing C++ without the above-mentioned very good reason. I want a wand that's so reliable, the Committee could responsibly consider changing the language for reasons that are merely decent.

I'm willing to give ground on instantaneousness. The flawless wand must certainly run quickly enough to be practical for industrial-sized code bases (hundreds of millions of lines or more), but as long as it's practical for such code bases, I'm a happy guy. When it comes to speed, faster is better, but for the speed of the magic wand, good enough is good enough.

The big concession I'm willing to make regards the wand's expressive power. It need not perform arbitrary changes to C++ code bases. For Wand 1.0, I'm willing to settle for the ability to make localized source code modifications that are easy to algorithmically specify. All the examples I discussed above satisfy this constraint:
  • The wand should replace all uses of NULL and of 0 as a null pointer with nullptr. (This alone won't make it possible to remove NULL from C++, because experience has shown that some code bases exhibit "creative" uses of NULL, e.g., "char c = (char) NULL;". Such code typically depends on undefined behavior, so it's hard to feel too sympathetic towards it, but that doesn't mean it doesn't exist.)
  • The wand should replace all variable definitions that lack explicit initializers and that are currently not zero-initialized with an explicit initializer of void. 
  • The wand should replace uses of list::remove and forward_list::remove with uses of list::erase and forward_list::erase. (Updating the container classes to support the new erase member functions would be done by humans, i.e., by STL implementers. That's not the wand's responsibility.)
  • The wand should add override to all overriding functions.
Each of the transformations above is semantics-preserving: the revised code would have exactly the same behavior under C++ with the revisions I've suggested as it currently does under C++11 and C++14.
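To make that concrete, here's a before-and-after sketch of the wand's work. The names are hypothetical, and the "after" code uses the proposed = void initializer and the proposed value-based erase member functions, so it's deliberately not valid C++ today:

// before the wand...
int* p = NULL;                  // NULL as null pointer
int counter;                    // local variable: indeterminate value
widgets.remove(w);              // widgets is a std::list<Widget>
virtual void draw();            // overrides a base class virtual

// ...and after: same semantics, new-style code
int* p = nullptr;
int counter = void;             // proposed syntax: explicitly uninitialized
widgets.erase(w);               // proposed value-based erase member function
virtual void draw() override;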

Clang

The magic wand exists--or at least the tool needed to make it does. It's called Clang. All hail Clang! Clang parses and performs semantic analysis on C++ source code, thus making it possible to write tools that modify C++ programs. Two of the transformations I discussed above appear to be part of clang-tidy (the successor to clang-modernize): replacing NULL and 0 as null pointers with nullptr and adding override to overriding functions. That makes clang-tidy, if nothing else, a proof of concept. That has enormous consequences.

Revisiting Backward Compatibility 

In recent years, the Standardization Committee's approach to backward compatibility has been to preserve it at all costs unless (1) it could be demonstrated that only very little code would be broken and (2) the cost of the break was vastly overcompensated for by a feature enabled by the break. Hence the Committee's willingness to eliminate auto's traditional meaning in C and C++98 (thus making it possible to give it new meaning in C++11) and its C++11 adoption of the new keywords alignas, alignof, char16_t, char32_t, constexpr, decltype, noexcept, nullptr, static_assert, and thread_local.

Contrast this with the perpetual deprecation of setting bool variables to true by applying ++ to them. When C++14 was adopted, that construct had been deprecated for some 17 years, yet it remains part of C++. Given its lengthy stint on death row, it's hard to imagine that a lot of code still depends on it, but my guess is that the Committee sees nothing to be gained by actually getting rid of the "feature," so, failing part (2) of the break-backward-compatibility test, they leave it in.

Incidentally, code using ++ to set a bool to true is another example of the kind of thing that a tool like clang-tidy should be able to easily perform. (Just replace the use of ++ with an assignment from true.)
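The transformation could hardly be simpler:

void f()
{
  bool done = false;

  done++;        // deprecated way of setting done to true
  done = true;   // the replacement: same effect, nothing deprecated
}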

Clang makes it possible for the Standardization Committee to retain its understandable reluctance to break existing code without being quite so conservative about how they do it. Currently, the way to avoid breaking legacy software is to ensure that language revisions don't affect it. The sole tool in the backward-compatibility toolbox is stasis: change nothing that could affect old code. It's a tool that works, and make no mistake about it, that's important. The fact that old C++ code continues to be valid in modern C++ is a feature of great importance to many users. It's not just the pacemaker programmers who care about it.

Clang's contribution is to give the Committee another way to ensure backward compatibility: by recognizing that tools can be written to automatically modify old code to conform to revised language specifications without any change in semantics. Such tools, provided they can be shown to operate flawlessly (i.e., they never produce transformed programs that behave any differently from the code they're applied to) and at acceptable speed for industrial-sized code bases, give the Standardization Committee more room to get rid of the parts of C++ where there's consensus that we'd rather not have them in the language.

A Ten-Year Process

Here's how I envision this working:
  • Stage 1a: The Standardization Committee identifies features of the language and/or standard library that they'd like to get rid of and whose use they believe can be algorithmically transformed into valid and semantically equivalent code in the current version or a soon-to-be-adopted version of C++. They publish a list of these features somewhere. The Standard is probably not the place for this list. Perhaps a technical report would be a suitable avenue for this kind of thing. 
  • Stage 1b: Time passes, during which the community has the opportunity to develop tools like clang-tidy for the features identified in Stage 1a and to get experience with them on nontrivial code bases. As is the case with compilers and libraries, the community is responsible for implementing the tools, not the Committee.
  • Stage 2a: The Committee looks at the results of Stage 1b and reevaluates the desirability and feasibility of eliminating the features in question. For the features where they like what they see, they deprecate them in the next Standard.
  • Stage 2b: More time passes. The community gets more experience with the source code transformation tools needed to automatically convert bad eggs (old constructs) to good ones (the semantically equivalent new ones).
  • Stage 3: The Committee looks at the results of Stage 2b and again evaluates the desirability and feasibility of eliminating the features they deprecated in Stage 2a. Ideally, one of the things they find is that virtually all code that used to employ the old constructs has already been converted to use the new ones. If they deem it appropriate, they remove the deprecated features from C++. If they don't, they either keep them in a deprecated state (executing the moral equivalent of a goto to Stage 2b) or they eliminate their deprecated status. 
I figure that the process of getting rid of a feature will take about 10 years, where each stage takes about three years. That's based on the assumption that the Committee will continue releasing a new Standard about every three years.

Ten years may seem like a long time, but I'm not trying to optimize for speed. I'm simply trying to expand the leeway the Standardization Committee has in how they approach backward compatibility. Such compatibility has been an important factor in C++'s success, and it will continue to be so.

One Little Problem

The notion of algorithmically replacing one C++ construct with a different, but semantically equivalent, construct seems relatively straightforward, but that's only because I haven't considered the biggest, baddest, ruins-everythingest aspect of the C++-verse: macros. That's a subject for a post of its own, and I'll devote one to it in the coming days. [The post now exists here.] For now, I'm interested in your thoughts on the ideas above.

What do you think?

Saturday, October 31, 2015

Effective Modern C++ in Korean!

The latest translation to reach my door is another two-color version, this time in Korean. Knowing no Korean, I can't assess the quality of the translation, but I can say that during the translation process, the Korean publisher found an error in the index. That's a rare event—one that indicates that the translator and publisher were paying very close attention. I take that as a good sign.

I hope you enjoy EMC++ in Korean.

Scott

PS - O'Reilly and I fixed the indexing error in the latest release of the English edition of the book, so it's not just Korean readers who will benefit from the book's newest translation.

Friday, October 23, 2015

Effective Modern C++ in Japanese!

Another day, another translation of Effective Modern C++, this time in Japanese.

Unlike the other translations I've seen, the Japanese edition uses two colors, so it's closer in appearance to the four-color American edition. That's a nice feature. A notable difference between the Japanese translation and the English original, however, is that the Japanese version uses a lot more Kanji :-)

I hope my Japanese readers enjoy this new translation. I certainly enjoy having a copy on my bookshelf.

Scott

Tuesday, September 15, 2015

Effective Modern C++ in Polish!

The family of Effective Modern C++ translations continues to grow. The latest member is in Polish.

As with all the translations I've seen so far, the Polish edition uses only one ink color (black). I therefore believe that if you're comfortable with technical English, you'll probably prefer the English (American) edition. If you prefer your C++ in Polish (including code comments!), however, I'm pleased to report that you now have that option.

Scott

Should you be using something instead of what you should use instead?

The April 2000 C++ Report included an important article by Matt Austern: "Why You Shouldn't Use set—and What to Use Instead." It explained why lookup-heavy applications typically get better performance from applying binary search to a sorted std::vector than they'd get from the binary search tree implementation of a std::set. Austern reported that in a test he performed on a Pentium III using containers of a million doubles, a million lookups in a std::set took nearly twice as long as in a sorted std::vector.

Austern also reported that the std::set used nearly three times as much memory as the std::vector. On Pentium (a 32-bit architecture), doubles are 8 bytes in size, so I'd expect a std::vector storing a million of them to require about 7.6MB. Storing the same million doubles in a std::set would call for allocation of a million nodes in the underlying binary tree, and assuming three 4-byte pointers per node (pointer to left child, pointer to right child, pointer to parent), each node would take 20 bytes. A million of them would thus require about 19MB. That's only 2.5 times the memory required for the std::vector, but in 2000, I believe it was fairly common for each dynamic memory allocation to tack on a word indicating the size of the allocated storage, and if that was the case in Austern's test, each tree node would require 24 bytes—precisely three times the memory needed to store a single double in a std::vector.
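
To make the arithmetic concrete, here's the per-node layout I'm assuming. The actual layout is up to each library implementation, and I'm ignoring padding:
struct SetNode {             // hypothetical node in std::set<double>'s tree
  SetNode *left;             // 4 bytes on a 32-bit architecture
  SetNode *right;            // 4 bytes
  SetNode *parent;           // 4 bytes
  double value;              // 8 bytes
};                           // 20 bytes per node; 24 if each allocation is
                             // preceded by a 4-byte size field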

The difference in memory utilization explains why searching a sorted std::vector can be faster than searching a std::set holding the same data. In a std::vector, the per-element data structure overhead present in a search tree is missing, so more container elements fit onto a memory page or into a cache line. We'd thus expect fewer page faults and/or cache misses when looking things up in a std::vector, and faster memory access means faster lookups.

Many people were influenced by Austern's article and by independent experiments bolstering his conclusions. I was among them. Item 23 of Effective STL is "Consider replacing associative containers with sorted vectors." Boost was convinced, too: Boost.Container offers the flat_(multi)map/set associative containers, citing as inspiration both Austern's article and the discussion of the sorted std::vector-based AssocVector in Andrei Alexandrescu's Modern C++ Design. (In his book, Alexandrescu references the C++ Report article. All roads in this area lead to Matt Austern.)
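
The flat containers offer the same interface as their tree-based namesakes, so switching between them is generally painless. A minimal sketch:
#include <boost/container/flat_set.hpp>

bool flatSetDemo()
{
  boost::container::flat_set<int> s;   // std::set's interface, but backed by
  s.insert(3);                         // a sorted vector: elements are kept
  s.insert(1);                         // sorted in contiguous storage
  s.insert(2);
  return s.find(2) != s.end();         // lookup is a binary search, O(lg n)
}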

There's nothing in the article about containers based on hash tables (i.e., the standard unordered containers), presumably because there were no hash-table-based containers in the C++ standard library in 2000. Nevertheless, the same basic reasoning would seem to apply. Hash tables are based on nodes, and node-based containers incur overhead for pointers and dynamic allocations that std::vectors avoid.

On the other hand, hash tables offer O(1) lookup complexity, while sorted std::vectors offer only O(lg n). This suggests that there should be a crossover point: a container size beyond which the superior computational complexity of an unordered container outweighs the more conservative memory usage of a std::vector.
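
In code, the two lookup strategies being compared look like this (a sketch for containers of ints; the function names are mine):
#include <algorithm>
#include <unordered_set>
#include <vector>

bool inSortedVector(const std::vector<int>& v, int key)
{
  return std::binary_search(v.begin(), v.end(), key);  // O(lg n); v must
}                                                      // already be sorted

bool inHashTable(const std::unordered_set<int>& s, int key)
{
  return s.find(key) != s.end();                       // O(1) expected
}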

After a recent presentation I gave discussing these issues, Paul Beerkens showed me data he'd collected showing that on simple tests, the unordered containers (i.e., hash tables) essentially always outperformed sorted std::vectors for container sizes beyond about 50. I was surprised that the crossover point was so low, so I did some testing of my own. My results were consistent with his. Here's the data I got for containers between size 20 and 1000; the X-axis is container size, the Y-axis is average lookup time:
The lines snaking across the bottom of the graph are for the unordered containers. The other lines are for the tree-based containers (std::set and std::map) and for Boost's flat containers (i.e., sorted std::vectors). For containers in this size range, the superiority of the hash tables is clear, but the advantage of the flat containers over their tree-based counterparts isn't. (They actually tend to be a bit slower.) If we bump the maximum container size up to 20,000, that changes:
Here it's easy to see that for containers with about 5000 elements or more, lookups in the flat containers are faster than those in the binary trees (though still slower than those in the hash tables), and that remains true for containers up to 10 million elements (the largest I tested):
For very small containers (no more than 100 elements), Beerkens added a different strategy to the mix: linear search through an unsorted std::vector. He found that this O(n) approach performed better than everything else for container sizes up to about 35 elements, a result that's consistent with the conventional wisdom that, for small data sets, linear search runs faster than more complicated hash-based and binary-search-based algorithms.
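
In the same sketchy style as above (and reusing the same headers), the linear-search strategy is simply:
bool inUnsortedVector(const std::vector<int>& v, int key)
{
  return std::find(v.begin(), v.end(), key) != v.end();  // O(n), but no sort,
}                                                        // no hash, no nodes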

The graphs above correspond to tests I ran on an Intel Core i7-820 using GCC 5.1.0 and MinGW under 64-bit Windows 7. Optimization was set to -O3. I ran the same tests using Visual C++ 2015's compiler, but for reasons I have yet to investigate, all the timings for the hash tables were zero under that compiler. I've therefore omitted data for VC++. Interestingly, the code to perform linear searches through unsorted std::vectors took zero time under GCC, though non-zero time under VC++. This is why I show no data comparing lookup speeds for all of binary search trees, hash tables, sorted std::vectors, and unsorted std::vectors: neither GCC nor VC++ generated non-zero lookup times for all approaches.

Maybe GCC optimized out the loop doing the lookups in the unsorted std::vectors, while VC++ optimized away the loops doing the lookups in the hash tables. Or maybe my test code is flawed. I don't know. Perhaps you can figure it out: here's the test code. (It's based on code Paul Beerkens shared with me, but I made substantial changes, so if there's something wrong with the test code, it's my fault.) Feel free to play around with it. Let me know what you find out, either about the code or about the results of running it.

If Paul Beerkens' and my results are valid and generalize across hardware, compilers, standard library implementations, and types of data being looked up, I'd conclude that the unordered standard associative containers (i.e., hash tables) should typically be the ones to reach for when you need high-performance lookups in an associative container and element ordering is not important. I'd also conclude that for very small containers, unsorted std::vectors are likely to be the way to go.

As for the title of this blog post, it looks like what you should use instead of using std::set (and std::map and their multi cousins) these days is probably a truly "unordered" container: a hash table or, for small containers, an unsorted std::vector.

Scott

UPDATE ON 17 SEPTEMBER

Jonathan Wakely posted source code showing how to replace my Windows-specific timing code with portable, standards-conformant C++, and Tomasz Kamiński took my code incorporating Wakely's approach and sent me a revised program that (1) uses the modify-a-volatile trick to prevent compilers from optimizing loops away and (2) checks the result of each lookup against the container's end iterator (because, in practice, you'd always need to do that). When I run Kamiński's code, I get non-zero lookup times for all containers under both GCC and VC++.
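
For readers unfamiliar with the volatile trick, here's a minimal sketch of the idea (the names are mine, not Kamiński's, and I'm assuming an associative container with a member find):
#include <chrono>
#include <vector>

volatile bool sink;                        // writes to a volatile object
                                           // can't be optimized away

template<typename Container>
auto timeLookups(const Container& c, const std::vector<int>& keys)
{
  auto start = std::chrono::steady_clock::now();
  for (int k : keys) {
    sink = (c.find(k) != c.end());         // each lookup result feeds the
  }                                        // volatile, so compilers must
                                           // actually perform the loop
  return std::chrono::steady_clock::now() - start;
}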

Here are the results I got for containers of up to 100 elements with GCC:
And here they are for VC++:
With both compilers, linear search through an unsorted std::vector is fastest for very small containers, but it doesn't take more than about 20-50 elements for the hash tables to take the lead. Whether that remains the case across hardware, compilers, standard library implementations, and types of data being looked up, I don't know. (That caveat is present in the original post above, but some commenters below seem to have overlooked it.)

Thursday, September 10, 2015

Interview with me on CppCast

I've been a loyal listener to CppCast since its launch earlier this year, so I was pleased to be asked to be a guest on the show. The result is now live. Among other things, hosts Rob Irving and Jason Turner asked me about my recent blog post on inconsistencies in C++ initialization syntax (their idea, not mine), my role as Consulting Editor for the Effective Software Development Series, common misconceptions C++ developers have about the workings of the language, Items from my books I consider especially noteworthy, advice to would-be authors and presenters, aspects of C++ I'd prefer didn't exist, and how I found myself lecturing about C++ from a nightclub stage normally used for belly dancing.

It was a fun interview for me, and I hope you enjoy listening to it.

Scott

Monday, September 7, 2015

Thoughts on the Vagaries of C++ Initialization

If I want to define a local int variable, there are four ways to do it:
int x1 = 0;
int x2(0);
int x3 = {0};
int x4{0};
Each syntactic form has an official name:
int x1 = 0;              // copy initialization
int x2(0);               // direct initialization
int x3 = {0};            // copy list initialization
int x4{0};               // direct list initialization
Don't be misled by the word "copy" in the official nomenclature. Copy forms might perform moves (for types more complicated than int), and in practice, implementations often elide both copy and move operations in initializations using the "copy" syntactic forms.
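
Here's a quick illustration using a move-only type:
std::unique_ptr<int> p = std::make_unique<int>(0);  // "copy" syntax, but
                                                    // std::unique_ptr can't be
                                                    // copied; a move is
                                                    // performed (or elided)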

(If you engage in written communication with a language lawyer about these matters and said lawyer has its pedantic bit set, you'll be reprimanded for hyphen elision. I speak from experience. The official terms are "copy-initialization," "direct-initialization," "copy-list-initialization," and "direct-list-initialization." When dealing with language lawyers in pedantic mode, it's wise to don a hazmat suit or to switch to oral communication.)

But my interest here isn't terminology, it's language design.

Question #1: Is it good language design to have four ways to say the same thing?

Let's suppose that instead of wanting to define an int, we want to define a std::atomic<int>. std::atomics don't support copy initialization (the copy constructor is deleted), so that syntactic form becomes invalid. Copy list initialization continues to succeed, however, because for std::atomic, it's treated more or less like direct initialization, which remains acceptable. So:
std::atomic<int> x5 = 0;    // error!
std::atomic<int> x6(0);     // fine
std::atomic<int> x7 = {0};  // fine
std::atomic<int> x8{0};     // fine
(I frankly expected copy list initialization to be treated like copy initialization, but GCC and Clang thought otherwise, and 13.3.1.7 [over.match.list] in C++14 backs them up. Live and learn.)

Question #2: Is it good language design to have one of the four syntaxes for defining an int be invalid for defining a std::atomic<int>?

Now let's suppose we prefer to use auto for our variable instead of specifying the type explicitly. All four initialization syntaxes compile, but two yield std::initializer_list<int> variables instead of ints:
auto x9 = 0;                // x9's type is int
auto x10(0);                // x10's type is int
auto x11 = {0};             // x11's type is std::initializer_list<int>
auto x12{0};                // x12's type is std::initializer_list<int>
This would be the logical place for me to pose a third question, namely, whether these type deductions represent good language design. The question is moot; it's widely agreed that they don't. Since C++11's introduction of auto variables and "uniform" braced initialization syntax, it's been a common error for people to accidentally define a std::initializer_list when they meant to define, e.g., an int.

The Standardization Committee acknowledged the problem by adopting N3922 into draft C++17. N3922 specifies that an auto variable, when coupled with direct list initialization syntax and exactly one value inside the braces, no longer yields a std::initializer_list. Instead, it does what essentially every programmer originally expected it to do: define a variable with the type of the value inside the braces. However, N3922 leaves the auto type deduction rules unchanged when copy list initialization is used. Hence, under N3922:
auto x9 = 0;                // x9's type is int
auto x10(0);                // x10's type is int
auto x11 = {0};             // x11's type is std::initializer_list<int>
auto x12{0};                // x12's type is int
Several compilers have implemented N3922. In fact, it can be hard (maybe even impossible) to get such compilers to adhere to the C++14 standard, even if you want them to. GCC 5.1 follows the N3922 rule even when expressly in C++11 or C++14 modes, i.e., when compiled with -std=c++11 or -std=c++14. Visual C++ 2015 is similar: type deduction is performed in accord with N3922, even when /Za ("disable language extensions") is used.

Question #3: Is it good language design for copy list initialization (i.e., braces plus "=") to be treated differently from direct list initialization (i.e., braces without "=") when deducing the type of auto variables?

Note that these questions are not about why C++ has the rules it has. They're about whether the rules represent good programming language design. If we were designing C++ from scratch, would we come up with the following?
int x1 = 0;                 // fine
int x2(0);                  // fine
int x3 = {0};               // fine
int x4{0};                  // fine
std::atomic<int> x5 = 0;    // error!
std::atomic<int> x6(0);     // fine
std::atomic<int> x7 = {0};  // fine
std::atomic<int> x8{0};     // fine
auto x9 = 0;                // x9's type is int
auto x10(0);                // x10's type is int
auto x11 = {0};             // x11's type is std::initializer_list<int>
auto x12{0};                // x12's type is int
Here's my view:
  • Question #1: Having four ways to say one thing constitutes bad design. I understand why C++ is the way it is (primarily backward-compatibility considerations with respect to C or C++98), but four ways to express one idea leads to confusion and, as we've seen, inconsistency.
  • Question #2: Removing copy initialization from the valid initialization syntaxes makes things worse, because it introduces a seemingly gratuitous inconsistency between ints and std::atomic<int>s.
  • Non-question #3: I thought the C++11 rule about deducing std::initializer_lists from braced initializers was crazy from the day I learned about it. The more times I got bitten by it in practice, the crazier I thought it was. I have a lot of bite marks.
  • Question #3: N3922 takes the craziness of C++11 and escalates it to insanity by eliminating only one of two syntaxes that nearly always flummox developers. It thus replaces one source of programmer confusion (auto + braces yields counterintuitive type deduction) with an even more confusing source (auto + braces sometimes yields counterintuitive type deduction). One of my earlier blog posts referred to N2640, where deducing a std::initializer_list for auto variables was deemed "desirable," but no explanation was offered as to why it's desirable. I think that much would be gained and little would be lost by abandoning the special treatment of braced initializers for auto variables. For example, doing that would reduce the number of sets of type deduction rules in C++ from five to four.
But maybe it's just me. What do you think about the vagaries of C++ initialization?

Scott