Bazaar developers’ blog

May 17, 2011

bzr-2.4 faster large tree handling

Filed under: Uncategorized — jameinel @ 3:43 pm
Tags: , , , , ,

The last of my patches is queued up to land, so I figured I’d post an update about the performance improvements I’ve been working on. I’m also just excited about how well it has all come together.

There were essentially 3 changes that mattered for performance on large trees.

  1. Fixing iter_entries_by_dir() to preload the data in Repository- optimal ordering rather than by-request ordering. In large trees this was causing us to thrash and become pathologically slow. In the 70,000-file test tree, thrashing took about 3 minutes, the preloading version takes about 15s. This affected a lot of our commands, though I guess the next two fixes would actually reduce the number of commands affected by this.
  2. Fixing several code paths to use optimized iter_changes() rather than the generic iter_changes(). The generic path walks both inventories iter_entries_by_dir() and compares them. Our 2a format Repository can do iter_changes without loading the whole tree. (It internally uses a hash_trie to store the inventory, and so nodes with matching sub-trees can be skipped for comparison.) This generally shows up as something that was taking 15s (to load the whole inventory) dropping to <2s for the improved comparison. (bzr revert and bzr pull were both directly impacted here)
  3. Changing WT.set_parent_trees([one_tree]) to update itself using current_basis.iter_changes(one_tree), rather than setting the state from scratch. This basically adds another case where we can avoid reading the whole inventory state again, which is another 15s to <2s sort of change. This only showed up after fixing (2), because once the tree is loaded, the other actions are generally pretty quick. (bzr up, bzr pull)

This is the chart I put together for “whats-new-in-2.4.txt”. bzr-2.3.2 will have fix (1), but not (2) or (3), to give a feel for how much of an impact different fixes have had.

    bzr-2.3.1 bzr-2.3.2 bzr-2.4  action
    3m39s         1m08s   1m03s  bzr co --lightweight
      38s            8s      2s  bzr revert (in a clean tree)
    4m47s         3m56s     15s  bzr merge
    4m45s           20s      3s  bzr pull
    4m58s         3m00s      2s  bzr up
    9m33s           21s     19s  bzr uncommit (including a merge)
    4m44s           17s      2s  bzr uncommit (simple commit)

So yes, some operations that were taking almost 5 minutes have now dropped down to taking <3s.

You won’t see that dramatic of an improvement for smaller trees, though most cases will have a pleasant improvement. Here is a short list for the ‘Launchpad‘ tree (with ~8k items).

    bzr-2.3.1   bzr-2.4     action
    5.3s        5.2s        bzr co --lightweight
    0.9s        0.3s        bzr revert
    1.4s        0.4s        bzr pull
    3.9s        3.7s        bzr uncommit (with merge)
    0.9s        0.3s        bzr uncommit (without merge)

Anyway, I’m quite happy about how much better bzr-2.4 will be in large trees.

update:Add graphs…

About these ads

13 Comments

  1. This is awesome work; thanks for the effort being put into it. When can we pick up a beta with these changes from a PPA?

    Comment by Christian R Reis — May 17, 2011 @ 4:27 pm

  2. @kiko you should be able to get a snapshot tomorrow from ppa:bzr/daily, there will be a 2.4b3 beta next week, and then the final version of bzr 2.4 will be out in August, and going into Ubuntu Oneiric soon after.

    Comment by Martin Pool — May 17, 2011 @ 4:52 pm

  3. Note that the final changes are currently in PQM (3) from above. And since this is a sprint week, PQM is a bit backed up (currently showing 10 items in the Queue). So it is possible that this will miss tomorrow’s daily build. If so, then it should be available in 2 days.

    Comment by jameinel — May 17, 2011 @ 5:41 pm

  4. This is great work! What data set/source tree did you use for the first set of timings? How do these timings compare against git for the same tree?

    Comment by Elliot Murphy — May 17, 2011 @ 8:31 pm

    • I bet it was the linaro-gcc tree. But yes, it would be nice to include this in the post :)

      Comment by Michael Hudson-Doyle — May 18, 2011 @ 7:51 am

      • I think John’s “70k file tree” is a synthetic tree similar in size to gcc-linaro. It would be good to retry it there.

        Comment by Martin Pool — May 18, 2011 @ 8:10 am

  5. No, its the real gcc-linaro tree. I just wasn’t making public about a private branch.

    Comment by John — May 18, 2011 @ 8:11 am

    • I’m pretty sure gcc-linaro is public; it’s just some particular hardware support projects that are embargoed.

      Comment by Martin Pool — May 18, 2011 @ 10:25 am

      • Well, actually, the Linaro gcc branch is both completely public and guaranteed to contain no added chemicals or embargoed content. Linaro doesn’t actually do any vendor-specific GCC work; it’s only enablement work done within the landing teams, and that usually focuses on the kernel and hardware-dependent middleware.

        The gcc-linaro trees are available from https://code.launchpad.net/gcc-linaro

        Comment by Christian Reis — May 18, 2011 @ 7:55 pm

  6. [...] They have great improvements and features coming up, including finer-grained bugmail and dramatic speed improvements in bzr handling of large [...]

    Pingback by OpenStack @ Ubuntu Developer Summit « Seeing the fnords — May 18, 2011 @ 12:54 pm

  7. [...] They have great improvements and features coming up, including finer-grained bugmail and dramatic speed improvements in bzr handling of large [...]

    Pingback by SquareCows.com » OpenStack @ Ubuntu Developer Summit — May 18, 2011 @ 5:28 pm

  8. After another day dealing with handling edge-cases in the update-by-delta code, I’m happy to say it has officially landed as bzr.dev rev 5900.

    The current daily ppa shows: 2.4.0~bzr5894~ppa3932.3931~natty1 from 18 hours ago. So I have a strong hope that in 6 more hours we will build at least ~bzr5900 and you’ll be able to get your updated goodness sometime early tomorrow (depending on what Tomorrow is in your local timezone)

    Comment by John — May 19, 2011 @ 7:17 pm

  9. [...] and comes with a fairly straightforward GUI.  Unfortunately, it’s also the slowest (though 2.4 improves this by quite a bit), the least popular, and the most bug-prone (100% more internal errors than either [...]

    Pingback by » Teaching Version Control @ badocelot's gloomy coffin — June 24, 2011 @ 5:12 am


RSS feed for comments on this post.

The Rubric Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: