Postcards from the Fridge

Tuesday, July 18, 2006

Startup Variability

I've figured out why the nautilus startup times are so variable.

I removed both the window manager AND gnome-panel from the startup sequence to do my testing. I figure if I can remove the variability when only launching nautilus, then I should be able to add back in the others and make sure that things don't break.

WHAT?

Even with gnome-session launching ONLY nautilus from the gnome-session file, nautilus still has highly variable startup times.

gnome-session starts a whole bunch of things internally, and then launches nautilus.

NOTE: I am currently running with ALL of the stracing turned on, so times may be slower than previous blog posts.

Example:



Here's the same thing sorted:


Look at how different the best and worst times are....

WHY?
gnome-session launches a whole bunch of things asynchronously. Nautilus starts a whole bunch of things asynchronously. These are all racing to finish, and some depend on each other.

Fastest run:




Slowest run:


Notice how a WHOLE bunch of processes start up between when nautilus first loads and when the first icon is painted on the screen. (icon_container expose_event)

If we could prevent nautilus from blocking, or at least let it show something on the screen before all of these other threads start, things would be much more consistent.

NOTE: This totally sucks. Blogger is shrinking my pictures. My Flickr account won't let me show the full-size version unless I go "pro". I'll probably do that anyway.

Until I figure out what to do, simply look for the "red" in the picture. It indicates where new daemons were launched (exec'd). When they happen early in the run, it means that the nautilus startup is interrupted. When they happen late, it means that nautilus completed before they start.
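The "red" in the picture can also be read straight out of the strace log. Here is a rough Python 3 sketch that lists every program exec'd between the Starting_Nautilus mark and the first icon_container expose_event. It assumes the line layout that "strace -f -ttt -o" writes (pid, timestamp, syscall). The function name execs_during_startup, the sample lines, and the paths in them are made up for illustration.

```python
import re

# Assumed line shape: PID TIMESTAMP execve("/path/to/prog", [...], [...]) = 0
EXEC_RE = re.compile(r'^\d+\s+(\d+\.\d+)\s+execve\("([^"]+)"')

def execs_during_startup(lines):
    """Return (seconds_after_nautilus_start, program) for each successful
    execve logged between the Starting_Nautilus mark and the first
    icon_container expose_event mark."""
    start = None
    found = []
    for line in lines:
        line = line.rstrip()
        fields = line.split()
        if len(fields) < 2:
            continue
        if start is None:
            # Skip everything until nautilus announces itself.
            if "Starting_Nautilus" in line:
                start = float(fields[1])
            continue
        if "icon_container expose_event" in line:
            break  # first icon painted, the window of interest is over
        m = EXEC_RE.match(line)
        # Failed execve attempts (e.g. PATH searches) end in "= -1 ...".
        if m and line.endswith("= 0"):
            found.append((float(m.group(1)) - start, m.group(2)))
    return found

# Made-up sample log lines (hypothetical pids, times, and paths).
SAMPLE_LOG = [
    '100 10.000000 execve("/usr/bin/nautilus", ["nautilus"], [/* 30 vars */]) = 0',
    '100 10.200000 access("MARK: nautilus: main Starting_Nautilus", F_OK) = -1 ENOENT (No such file or directory)',
    '101 10.500000 execve("/usr/libexec/gnome-vfs-daemon", ["gnome-vfs-daemon"], [/* 30 vars */]) = 0',
    '102 10.600000 execve("/usr/local/bin/foo", ["foo"], [/* 30 vars */]) = -1 ENOENT (No such file or directory)',
    '100 11.200000 access("MARK: nautilus: icon_container expose_event", F_OK) = -1 ENOENT (No such file or directory)',
    '103 11.500000 execve("/usr/bin/gnome-volume-manager", ["gnome-volume-manager"], [/* 30 vars */]) = 0',
]

if __name__ == "__main__":
    for delta, prog in execs_during_startup(SAMPLE_LOG):
        print("%.3fs after nautilus start: %s" % (delta, prog))
```

A run with a long list here should correspond to a picture with a lot of red early on. A run with an empty list means nautilus got to its first icon before anything else was launched.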

What to ask?

The questions that I really want to answer are:
  1. Why does nautilus block during startup? How can we stop it from blocking?
    (It looks as if it is waiting for other pieces of gnome to startup)

  2. What daemons have to be pre-started for nautilus to begin instantly? (Another way of looking at this: what dependencies does nautilus have on other pieces of gnome?)
    It looks like gnome-volume-manager and gnome-vfs-daemon need to be started early.

  3. How can the dependencies be started before everything else?

How do we answer these questions?
  1. Read through the raw strace logs.
    This is pretty helpful, but it is a lot of data. I have to study exactly what the nautilus threads were doing when they yielded the CPU. It would be really nice if we could record when a process/thread is switched off of the CPU.
  2. Extend Federico's visualization tool to show the interesting information
    I've currently extended it to show futex information, but it generates an enormous image, and it is very hard to see what is going on. However, I think it would be useful to somehow show the parent/child relationship for some of the processes. This would help determine the dependencies. Maybe this should be a different tool, though.
  3. The NPTL trace tool
    I just stumbled upon this, and I'm hopeful that this will give me some idea about how the various gnome threads interact.
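As a starting point for the parent/child idea, here is a rough Python 3 sketch that rebuilds the process/thread tree from the clone and execve lines in the strace log. It is a separate little script, not a patch to Federico's tool. It assumes the "strace -f -ttt -o" line layout, including the "<unfinished ...>" and "<... clone resumed>" pairs strace writes when calls interleave. The names build_tree and format_tree and the sample lines are made up for illustration.

```python
import re
from collections import defaultdict

# A clone that returned a child pid, either on one line or as a resumed call:
#   100 10.1 clone(child_stack=0, flags=...) = 101
#   100 10.1 <... clone resumed> child_stack=0, flags=...) = 101
CLONE_RE = re.compile(
    r'^(\d+)\s+\d+\.\d+\s+(?:clone\(|<\.\.\. clone resumed>).*\)\s+=\s+(\d+)$')
# A successful execve: PID TIMESTAMP execve("/path", ...) = 0
EXEC_RE = re.compile(r'^(\d+)\s+\d+\.\d+\s+execve\("([^"]+)".*=\s+0$')

def build_tree(lines):
    """Return (children, names): parent pid -> list of child pids, and
    pid -> last program that pid successfully exec'd."""
    children = defaultdict(list)
    names = {}
    for line in lines:
        line = line.rstrip()
        m = CLONE_RE.match(line)
        if m:
            children[int(m.group(1))].append(int(m.group(2)))
            continue
        m = EXEC_RE.match(line)
        if m:
            names[int(m.group(1))] = m.group(2)
    return children, names

def format_tree(children, names, pid, depth=0):
    """Return the subtree rooted at pid as a list of indented lines."""
    out = ["  " * depth + "%d %s" % (pid, names.get(pid, "(no exec seen)"))]
    for child in children.get(pid, []):
        out.extend(format_tree(children, names, child, depth + 1))
    return out

# Made-up sample log lines (hypothetical pids, times, and paths).
SAMPLE_LOG = [
    '100 10.000000 execve("/usr/bin/gnome-session", ["gnome-session"], [/* 30 vars */]) = 0',
    '100 10.100000 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7f0) = 101',
    '101 10.200000 execve("/usr/bin/nautilus", ["nautilus"], [/* 30 vars */]) = 0',
    '101 10.300000 clone( <unfinished ...>',
    '101 10.300100 <... clone resumed> child_stack=0xb7e0, flags=CLONE_VM|CLONE_THREAD) = 102',
]

if __name__ == "__main__":
    children, names = build_tree(SAMPLE_LOG)
    print("\n".join(format_tree(children, names, 100)))
```

Threads and forked children that never exec show up as "(no exec seen)". This only tells you who launched whom, not who is waiting on whom, so it is a first cut at the dependency question at best.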

Sunday, July 09, 2006

Cookbook

Reproducing

Alright, a few posts back, people asked me how to reproduce the work that I've been doing.

First, I've been relying heavily on Federico's code to do the tracing/visualization. This requires that gnome is run within strace.

0) Boot your machine to runlevel 3.

This will allow us to run startx, run through the gnome initialization cycle, and then end without adding stuff to gdm.

1) Sprinkle the gnome code with calls to program_log.


First, I've created a header that I can include with the following code:
gnome-profile.h


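#include <stdarg.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>
#include <glib.h>

/* Drop a marker into the strace log, but only when GNOME_PROFILING is set. */
static void program_log (const char *format, ...)
{
  va_list args;
  char *formatted, *str;
  struct timeval current_time;

  if (getenv ("GNOME_PROFILING"))
    {
      va_start (args, format);
      formatted = g_strdup_vprintf (format, args);
      va_end (args);

      /* current_time is filled in here but not used in the message. */
      gettimeofday (&current_time, NULL);
      str = g_strdup_printf ("MARK: %s: %s", g_get_prgname (), formatted);
      g_free (formatted);

      /* access() on a file that does not exist fails harmlessly, but strace
         logs the call with a timestamp, so the message ends up in the trace. */
      access (str, F_OK);
      g_free (str);
    }
}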


Example:
In nautilus-main.c in the first non-declaration line of 'main()' I have added:
program_log("%s Starting_Nautilus",__FUNCTION__);
-and-
In nautilus-icon-container.c in the first non-declaration line of 'expose_event()', I have added:
program_log("icon_container expose_event");

This is enough to time the nautilus startup.

NOTE: This implementation is pretty hacky right now. It should probably be pushed into a common library (glib?) somewhere, but right now it is handy because I don't have to rely on every application that I want to profile including a particular library. I just add the "#include", drop in some calls to "program_log", and I am off to the races.
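The marks are easy to pull back out of the log, because each one appears as a failed access() call on a "file" whose name is the message. Here is a rough Python 3 sketch of that, assuming the line layout that "strace -f -ttt -o" writes (pid, timestamp, syscall). The function name parse_marks and the sample lines are made up for illustration.

```python
import re

# Assumed line shape:
#   PID TIMESTAMP access("MARK: prog: message", F_OK) = -1 ENOENT (...)
MARK_RE = re.compile(
    r'^(\d+)\s+(\d+\.\d+)\s+access\("MARK: ([^:]+): (.*?)", F_OK\)')

def parse_marks(lines):
    """Return (timestamp, pid, program, message) for every MARK line."""
    marks = []
    for line in lines:
        m = MARK_RE.match(line)
        if m:
            pid, ts, prog, msg = m.groups()
            marks.append((float(ts), int(pid), prog, msg))
    return marks

# Made-up sample log lines (hypothetical pids and times).
SAMPLE_LOG = [
    '4242 1153245600.100000 execve("/usr/bin/nautilus", ["nautilus"], [/* 30 vars */]) = 0',
    '4242 1153245600.250000 access("MARK: nautilus: main Starting_Nautilus", F_OK) = -1 ENOENT (No such file or directory)',
    '4242 1153245601.750000 access("MARK: nautilus: icon_container expose_event", F_OK) = -1 ENOENT (No such file or directory)',
]

if __name__ == "__main__":
    for ts, pid, prog, msg in parse_marks(SAMPLE_LOG):
        print("%.6f %d %s: %s" % (ts, pid, prog, msg))
```

Subtracting the timestamp of one mark from another gives the elapsed time between the two program_log calls.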

2) Add stracing to xinitrc.

When you launch gnome, you have to make sure strace is running, so I've added calls to strace into my xinit file (for reasons described in previous posts).

Add something similar to the following line to your ".xinitrc":
exec strace -e clone,execve,open,access -ttt -f -o /tmp/gnome.log /home/gnome/bin/jhbuild run gnome-session

3) Start GNOME with profiling turned on.

Now you can launch gnome with tracing turned on:
env GNOME_PROFILING=1 startx
After this command has completed, an strace of the session will be sitting in /tmp/gnome.log.

4) Add a command to the session startup which will automatically teardown the session.

I've added the following shell script to my gnome-session startup. (You may have to adjust the initial sleep if things take longer than 5 seconds to start up.)
tear_down.sh:
#!/bin/bash
sleep 5
gnome-session-save --kill
sleep 5
killall X

5) Turn off the "Is it OK to log out?" prompt.

Run "gnome-session-properties" and disable "Ask on logout".

Now, you should be able to run startx, have gnome start, and then exit back to the initial prompt.

6) Script the automatic timing and analysis of the startup/teardown.


First, I created a script to automatically determine the time to start the session and time to start nautilus and output that in a file called "summary":
(The script is called logs.py)
#!/usr/bin/python
import sys
import string
found_event=0

for line in sys.stdin:

  if line.find("Starting_Nautilus")!=-1:
    start_time = string.atof(line.split()[1])

  if line.find('execve("/home/gnome/bin/jhbuild')!=-1:
    session_start_time = string.atof(line.split()[1])

  if (line.find("icon_container expose_event")!=-1) and found_event==0:
    expose_time = string.atof(line.split()[1])
    found_event=1

print "Start->Icon_Expose (ses):", expose_time-session_start_time, "(Naut):", expose_time-start_time
...
Next, I created a script to run this AND Federico's graphical analysis tool. It will create a directory with the time of the run, a copy of the log, and the picture of the execution.

I call it "test2.sh":
#!/bin/bash
while /bin/true
do
DATE=`date +%Y%m%d%H%M%S`
mkdir $DATE
env GNOME_PROFILING=1 startx
cp /tmp/gnome.log $DATE/
~/plot-timeline.py $DATE/gnome.log -o $DATE/output-$DATE.png
rm -f /tmp/*.log
cat $DATE/gnome.log | ./logs.py | tee $DATE/summary
sleep 3
done

7) Run for infinity.
Now, you can run this for a long time, and just let gnome startup/teardown. After 40 or so runs, I stop the loop, and see what happened.

8) Analyze the results

Next I gather all of the results into a single file with:
"grep Nau */summary > results.csv"
I can then load this file into gnumeric and graph the results.
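If you just want the best, worst, and spread without opening a spreadsheet, here is a rough Python 3 sketch. It assumes each line of results.csv looks like what grep produces from the print statement in logs.py, i.e. "DIR/summary:Start->Icon_Expose (ses): X (Naut): Y". The function name summarize and the sample numbers are made up for illustration. They are not measurements.

```python
import re
import statistics

# Assumed line shape: DIR/summary:Start->Icon_Expose (ses): 3.5 (Naut): 1.5
LINE_RE = re.compile(r'\(ses\):\s*(\S+)\s+\(Naut\):\s*(\S+)')

def _stats(values):
    """Basic spread statistics for a list of times in seconds."""
    if not values:
        return None
    return {
        "runs": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        # stdev needs at least two samples
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }

def summarize(lines):
    """Collect session and nautilus startup times and summarize each."""
    ses, naut = [], []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            ses.append(float(m.group(1)))
            naut.append(float(m.group(2)))
    return {"session": _stats(ses), "nautilus": _stats(naut)}

# Made-up sample lines (hypothetical directories and times).
SAMPLE_RESULTS = [
    "20060718101500/summary:Start->Icon_Expose (ses): 3.5 (Naut): 1.5",
    "20060718101700/summary:Start->Icon_Expose (ses): 6.0 (Naut): 4.0",
    "20060718101900/summary:Start->Icon_Expose (ses): 4.0 (Naut): 2.0",
]

if __name__ == "__main__":
    for name, s in summarize(SAMPLE_RESULTS).items():
        print(name, s)
```

To run it on real data, pass open("results.csv") instead of the sample list.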

9) (Extra) Prune the startup of the session.

Currently, I am just starting nautilus (no WM, and no gnome-panel).

I have a ~/.gnome2/session file that looks as follows:
[Default]
num_clients=3
1,id=default1
1,Priority=10
1,RestartCommand=nautilus --no-default-window --sm-client-id default1
...
That's all for right now. I found something interesting, but I'll save that for the next post.

ps. HI WIFEZILLA!