Thomas 的个人资料Thomas Kejsers Blog日志列表网络 工具 帮助

日志


2006/8/7

Factors to consider when performance tuning cubes and MDX

Before we move on the the wonders of subselects I would like to give you a few pointers on tuning MDX queries and cubes in general. During my time as a performance tuner on Analysis Services - these are the most important factors you should consider in a 2005 environment:
 
  • Use NON_EMPTY_BEHAVIOR whenver possible in your calculated members. This has a huge impact on your overall calculation performance. You should be aware of the semantic implications that NON_EMPTY_BEHAVIOR has on you calculation - so think carefully about applying this correctly. I should probably make a post on this later
  • Aggregation design. In most scenarioes in Analysis Services 2000 - this it not hard: Get you dimension counts right, avoid changing and parent/child dimensions, do as many aggregations as you feel comfortable storing and processing and extend as needed with usage based aggregation. In Analysis Services 2005 there are many more settings to consider and tune. I cannot stress enough how important it is to get yourattribute relationsships right. Use rigid relationships whenever possible and make sure you hierarchies are supported. Probably enough matter here for a seperate post
  • Watch our for CPU-hogs in your calculations. The MDX SUM function is the classical example - if you sum over too large a set you are essentially creating a "mega loop" in your calculations. I often hear people complain that Analysis Services is very bad at calculations. I think there are improvements to be made on the server side - even in 2005. But, most of the time, programmers just need to think harder about how they express their calculations before they complain about performance.
  • Minimize subcube operations. This is closely related to my previous post about multi-selects. I have some experiences from a big installation to share with you
2006/8/5

Doing Multi-Selects in Analysis Services 2000 and 2005

One of the major challenges for users of Analysis Services 2000 are queries that answer question of the form:
 
  • Return a hierarchized, drilled down, subset of dimension members
  • Limit the members of the above set to selected parts of the dimension

For example. Your interface has a product drop down list like this:

    • All Product
      • Accessories
        • Bike Racks
        • Bike Stands
        • Bottles and Cages
        • etc...
      • Bikes
      • Clothing
      • Components

You allow you user to select one or more subcategories at the same time. For the purposes of illustration, lets assume the users selects  "Bike Racks" and "Bike Stands". The user then asks for the order count of the hierarchized product dimension - but filtered by his selection.

Put differently, the users wants:

  • The Order Count for Bike Racks and Bike Stands
  • The Order Count of all parent members for the above two (ie: Accessories and All Product)

Naïvely you execute this MDX statement:

SELECT

[Measures].[Internet Order Count] ON 0,

HIERARCHIZE

(

GENERATE(

{[Product].[Product Categories].[Subcategory].&[26], [Product].[Product Categories].[Subcategory].&[27]}

,

ASCENDANTS([Product].[Product Categories])

)

)

ON 1

FROM

[Direct Sales]

(Try it out in Adventureworks)

What happens?... You get:

All Products  27,659
Accessories   18,208
Bike Racks       328
Bike Stands      249

The values for All Products and Accessories are not right! You want the value for accessories and All Products to be 328 + 249 (the sum of the selected descendants), not 18208 and 27659.

This is the essence of the visual totals problem: Analysis Services will happily scan the cube space for you and return the values of the parent. But you get the Order Count for the entire parent member - not the sum of the values you have selected. Why? Because this is the value that has been preaggregated in the cube.

There is something intuitively wrong with the MDX query above. You are essentially mixing up two different questións:

  • What is the shape of your desired return value? (The hierarchy of the product dimension)
  • Which subspace of the cube do you want the shape calculated over? (only calculate on Bike Rack and Bike Stands)

Aha!... You need to seperate shape and subspace - just like you seperate the columns and WHERE parts of an SQL statement. Applying our SQL knowledge to MDX (generally a bad idea) we try:

SELECT

[Measures].[Internet Order Count] ON 0,

DESCENDANTS

([Product].[Product Categories].[All Products], [Product].[Product Categories].[Subcategory], SELF_AND_BEFORE) ON 1

FROM

[Direct Sales]

WHERE

{[Product].[Product Categories].[Subcategory].&[26], [Product].[Product Categories].[Subcategory].&[27]}

This generates an error. Even if we reduced our WHERE critieria to a single select (simplifying the problem):

WHERE

([Product].[Product Categories].[Subcategory].&[26])

... we still get the infamous error:

The Product Categories hierarchy already appears in the Axis1 axis

This, deceptively simple, multi-select problem have caused many headaches for MDX programmers. There ARE workarounds in Analysis Services 2000 - but they take a syntactic form that is disgusting to behold and I will not quote them here.

Analysis Services 2005 to the rescue! Finally, we are able to seperate shape and subspace in our query. The above query can be written, correctly, like this in AS2005;

SELECT

[Measures].[Internet Order Count] ON 0,

DESCENDANTS

([Product].[Product Categories].[All Products], [Product].[Product Categories].[Subcategory], SELF_AND_BEFORE) ON 1

FROM

(

SELECT {[Product].[Product Categories].[Subcategory].&[26]

, [Product].[Product Categories].[Subcategory].&[27]} ON 0

FROM [Direct Sales])

Notice the user of the "subselect". The outer select specifies the shape of space to return - the inner select specifies the subspace that is the scope of the query. Syntactically elegant - and we get the desired result:

All Products  577
Accessories   577
Bike Racks    328
Bike Stands   249

Now, every coin has a back side. Allowing the user to specify any subspace in his query has som implications on how aggregations are being accessed. Programmers familiar with visual totals will probably have spotted this by now.

My next post will deal with the subspaces you can define and how they affect your queries.