Demystifying the profiling process
Brad and the rest of the game team are busy hammering out the gold version of the game (which is going to be great, by the way - even I
like it, and I hardly ever play games nowadays), so I thought I'd take the chance to explain exactly how profiling works. If you've seen the raw notes
that Brad posted, you'll understand why they need explaining!
I think this is actually my first post here. I'm normally part of that four-fifths of the company you don't see - Stardock's desktop enhancement applications. Like most people at Stardock, I usually work on several things in any given day. My main areas are Windows Vista (including TweakVista), developer support, and recently Stardock's Desktop Pet.
Still, when it comes to the crunch, everyone has to dig in. The tricky bit is that, while I am
a programmer, that doesn't mean I can instantly understand hundreds of thousands of lines of code all at once. As a result, it's not really a good idea for me to be fixing bugs, as I'd probably put in more bugs than I fixed, and right now that's something we definitely don't need.
So what can
I do? Well, basically I take the bits that require programming knowledge, but which can be done separately and which everyone else is too busy to do. The most obvious of these are profiling and leak testing - measuring what portions of the code take a long time, and testing to see whether any memory is being "leaked" (left lying around with no references to it) or otherwise used incorrectly. The profiling is the most interesting part, so that's what I'm going to tell you about.
The classical way to do profiling is to run a piece of code and time it, then have it print out the times so you can compare them. You then look at the code that was run and try to figure out which lines are taking the time. That works for testing small sections of code, but it doesn't really help when you have one huge project that you need to test - it takes ages to add the profiling, and you get so much output that you don't know what to do with it. At this point you need tools specially designed for the job.
I started off using AMD's CodeAnalyst, which is a free download from AMD. The way it works is that it takes snapshots of where the game is working every few milliseconds and then matches those up against the functions that were executing. That's great as far as it goes, and often very useful for smaller programs, but the trouble was that it only showed time spent in a function itself, not how much time that function spent in the other functions it called. So, for example, I could see that we were spending a lot of time iterating through trees of data, but I couldn't easily see which part of the AI was taking that time, because the AI spent almost all of its time calling other functions.
(For those of you who are looking at the picture and wondering "what the heck's a deque?" - it's a structure for storing data, like an array/vector where it's fast to take things off or put things on at both the front and the back - think of a deck of cards. Here it's being used in graphics operations. The std:: things are standard library calls, which we use a lot of.)
What does it all mean?
As a result, I switched to Intel's VTune
(alas, not free), which does
provide this so-called "call graph" analysis (and which annoyingly required me to switch to an Intel box - my main work computers are both Athlon 64s). I'd not used it before, so it took me a few hours to figure out how things worked - and a few more to gather some test data by having the game run automatically by itself - but once I had, it became very easy to see where the hotspots were.
VTune shows what parts of the game are taking the most time to run.
Some of the problems were easy stuff. For example, it turned out that we were updating the mouse cursor texture every frame, which resulted in a small but noticeable delay all the time. There's a simple solution for this: move the mouse cursor without updating the texture. Another was related to saving games. Save games aren't big, but it was taking a long time to get them ready, because the system relied on a memory buffer that started out very small and grew very slowly. As any programmer will tell you, allocating memory is an expensive operation, and it doesn't get any faster when you do it thousands of times. CariElf upped the initial buffer size to 1MB, and suddenly saving games was a snap.
Can you spot the problem?
It wasn't always this obvious, though. For example, Brad had told me that he thought the AI could be slowing things down... and it was, but it wasn't exactly due to the AI code itself. Instead, when the AI tried to design ships, it had to figure out exactly what improvements were available to it - a process which required matching the different components up against the available technologies. It turned out that this took a huge amount of time - something that nobody really expected!
Some things get really expensive when you do them thousands of times.
Another odd problem was figuring out where to put a newly-built constructor. You'd think that would be easy, right? Just plonk it onto the nearest starbase, if there was one around. But think about it - what if that base was full? Should it build another one, or go on to the next closest? A better solution would be to figure out what the most appropriate upgrade was (if any) out of the whole fleet of starbases, and so that was what it was doing - but to do that, it had to figure out what modules were available, taking into account the technology level and moral standing of the race concerned, then check to see whether they already existed on the starbase. This sort of thing gets really slow when you have as wide a variety of upgrade modules as GalCiv II does!
This constructor was expensive to build, but it's even more expensive to figure out where it should be going.
One great part was when I found out that another quarter of the AI's time was being taken up figuring out which ships needed repair. This required figuring out exactly how many hit points a ship could have, which varied depending on whether the ship was in orbit over a planet with the Omega Defense System - which meant checking for its presence in the colony's list of improvements for every orbiting ship. This is another of those things that seems like it should be simple, but which required looking through a lot of data (the planet's improvements list) a lot of times (once per orbiting ship, once per turn), causing a lot of calls - and, ultimately, using up a lot of CPU. We could use that time elsewhere.
The problem isn't always the code itself - sometimes it's just called too often.
The cases above are representative of most of the slowdowns I found. The best part is that there aren't
all that many of them, they don't affect things too much, and none of them are insoluble. A few have very simple fixes, while others will probably need some refactoring to render them harmless (such as a flag that is set when a planet gains the Omega Defense System, or a change in the way data is stored). Either way, they will
be dealt with, and as a result you will be getting a better game that you can play without needing to upgrade your system.