Archive for the ‘Performance’ Category

WMI and .NET Performance Hiccups : Win32 – the Savior

March 26th, 2009

… or read the title as: how could you speed up your software by 90%? There is always a way to tune performance. This post is about one such instance, where I dumped WMI (Windows Management Instrumentation) and turned back to Win32 purely for the performance gains.

Some time back I was involved in a project with lots of hardware interfaces: interacting with huge SCSI devices, parallel ports, digital imaging, et al. Though I was a fan of Win32, I was thrilled by how WMI drastically reduces development time, primarily because of its mature APIs. I kept thinking how difficult it would be for beginner-to-intermediate programmers to do system-level programming in C or C++ against hardware, network devices, communication devices and so on, and how error-prone that code is. WMI is definitely the better choice in that case.

And many times WMI did save us development time. But there were moments during product testing when we ran into serious performance problems. When we probed for the causes, the results were surprising: sometimes it was an oversight by the development team, and sometimes poor WMI was the culprit.

Here are two instances that gave my product a considerable performance boost.

Handling System.Drawing.Image Performance using C#

The product I was working on had to load and edit digital-quality JPEG images, which means image sizes of 3-5 MB or more. My customer kept complaining about image loading speed. Everything was fine with a 1-2 MB image, but things changed drastically whenever an image of 3 MB or more was loaded: it took around 1-2 seconds. Two seconds is fine for a normal application, but not for people who work with thousands of images per day; the Windows Form would almost hang before the image showed up. After a considerable amount of research I found that the real culprit, the line causing the bottleneck, was the call to System.Drawing.Image.FromFile().

I happened to find a KB article that confirms this issue. It even came with a hotfix[!!], which updates:
1. System.Windows.Forms.dll
2. System.Design.dll
3. System.Drawing.dll

In fact, there is an interesting overload on the Image class in System.Drawing: Image.FromStream(Stream stream, bool useEmbeddedColorManagement, bool validateImageData). This validateImageData flag was the real cause of the slowdown: when it is set, the content of the image file is validated before it is loaded, so as the size of the image increased, the loading time increased sharply.
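For readers who want to try the flag directly, here is a minimal sketch that assumes a reference to System.Drawing.dll. The FastImageLoader class and the LoadWithoutValidation name are made up for illustration; only Image.FromStream(Stream, bool, bool) comes from the framework. GDI+ expects the stream to stay open for as long as the image is in use, so the sketch copies the file into a MemoryStream that it deliberately does not dispose.

```csharp
using System;
using System.Drawing;
using System.IO;

static class FastImageLoader   // hypothetical helper, not part of the original product
{
    // Loads an image while asking GDI+ to skip validation of the image data.
    public static Image LoadWithoutValidation(string path)
    {
        // Read the file up front so the file handle is released immediately.
        byte[] bytes = File.ReadAllBytes(path);

        // The stream must outlive the image, so it is intentionally
        // not wrapped in a using block.
        MemoryStream stream = new MemoryStream(bytes);

        // useEmbeddedColorManagement: false, validateImageData: false
        return Image.FromStream(stream, false, false);
    }

    static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: FastImageLoader <image file>");
            return;
        }
        using (Image img = LoadWithoutValidation(args[0]))
        {
            Console.WriteLine("{0} x {1}", img.Width, img.Height);
        }
    }
}
```

Skipping validation is probably only a sensible trade-off for files from a trusted source, since problems in a damaged file may then surface later, for example when the image is drawn.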

So I had to look out for an alternative. The obvious choice was Win32, and here is the method equivalent to Image.FromFile():

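// Requires: using System; using System.Drawing; using System.IO;
//           using System.Reflection; using System.Runtime.InteropServices;

// Assumed declarations (not shown in the original snippet):
[DllImport("gdiplus.dll", CharSet = CharSet.Unicode)]
private static extern int GdipLoadImageFromFile(string filename, out IntPtr image);

private static readonly Type imageType = typeof(Bitmap);

public static Image Win32ImageFromFile(string filename)
{
    filename = Path.GetFullPath(filename);
    IntPtr loadingImage = IntPtr.Zero;

    // The GDI+ flat API returns a status code; 0 means Ok.
    if (GdipLoadImageFromFile(filename, out loadingImage) != 0)
    {
        throw new Exception("Oops! GDI Exception.");
    }

    // Wrap the native handle through the non-public static FromGDIplus method.
    // Being non-public, it may change between framework versions.
    return (Bitmap)imageType.InvokeMember("FromGDIplus",
        BindingFlags.NonPublic | BindingFlags.Static | BindingFlags.InvokeMethod,
        null, null, new object[] { loadingImage });
}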
 

And now, when I used this new method… voilà! The images started loading at least 90% faster and took less than 10 milliseconds to load! Wow! That was really great and amazing.

Handling Performance issues in Win32_LogicalDisk using C#

Here is another instance where I dumped WMI and used Win32 instead.
Here is the simple WMI code that lists the removable drives of the computer.

#region "WMI Code to retrieve Drives"

// Requires a reference to System.Management.dll and:
// using System.Collections.Specialized; using System.Management;
StringCollection driveCollection = new StringCollection();
using (ManagementClass driveClass = new ManagementClass("Win32_LogicalDisk"))
using (ManagementObjectCollection drives = driveClass.GetInstances())
{
    foreach (ManagementObject drv in drives)
    {
        // Check whether the drive belongs to a removable storage device (DriveType 2)
        if ((drv["Description"].ToString() == "Removable Disk") && (drv["DriveType"].ToString() == "2"))
        {
            driveCollection.Add(drv["Caption"].ToString());
        }
    }
}
#endregion

This code would take a minimum of 4-5 seconds to enumerate my disk drives. Another problem is that the floppy drive is physically checked every time [but… why!!], which slows execution even further. None of our clients would accept this for a feature that is used frequently. There was no way to solve this issue except to look out for a Win32 method, and here is the alternative…

Using Win32

#region "WIN32 Code to retrieve Drives"
// Declare at class level. Requires: using System.Runtime.InteropServices;
[DllImport("kernel32.dll", SetLastError = true)]
static extern uint GetDriveType(string lpRootPathName);

/* Retrieves all the mounted drives on the computer. */
string[] _drives = System.Environment.GetLogicalDrives();

foreach (string _drive in _drives)
{
    /* Call Win32 GetDriveType to determine the drive type, based on the drive letter */
    uint _driveTypeLength = GetDriveType(_drive);

    // Check whether the drive is removable: 2 = DRIVE_REMOVABLE, 5 = DRIVE_CDROM
    if (_driveTypeLength == 2 || _driveTypeLength == 5)
    {
        driveCollection.Add(_drive);
    }
}
#endregion

This code executed in less than 100 milliseconds! That was an incredible performance boost.
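The same check can also be written without P/Invoke. The sketch below is a managed alternative, not the code used in the product: it relies on System.IO.DriveInfo (available since .NET 2.0), and the RemovableDrives class name is made up. It only reads the drive type, which as far as I know does not require the drive to be ready. No timing is claimed for it here.

```csharp
using System;
using System.Collections.Specialized;
using System.IO;

static class RemovableDrives   // hypothetical helper name
{
    // Returns the root names (for example "E:\") of removable and CD-ROM drives,
    // mirroring the 2 / 5 check in the Win32 version.
    public static StringCollection List()
    {
        StringCollection result = new StringCollection();
        foreach (DriveInfo drive in DriveInfo.GetDrives())
        {
            // Only DriveType is read; properties such as TotalSize are avoided
            // because they need the drive to be ready.
            if (drive.DriveType == DriveType.Removable || drive.DriveType == DriveType.CDRom)
            {
                result.Add(drive.Name);
            }
        }
        return result;
    }

    static void Main()
    {
        foreach (string name in List())
        {
            Console.WriteLine(name);
        }
    }
}
```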
Do you think using Win32 as an alternative is insane? Have you faced such real-world problems? Would you still use WMI? Talk back!

Logu Krishnan C#, Performance

ETL Design Pattern : E-LT-L

March 6th, 2009

I was ramping up on C# to create a BI framework when I hit upon the term “Design Patterns”. I decided to go through only a few patterns, as I did not have the patience to finish the book and assumed I would not need the rest. By the end of the day I was able to implement a couple of patterns in my BI framework application. Along the way, I wondered whether there are any predefined design patterns for data warehouses, ETL, cubes and universes. My quest begins… In this post, I will focus on ETL alone.

I started looking back at my projects to analyze whether I had used any design patterns, or whether I had hit recurring problems that could have been solved with alternate designs. What I found was quite interesting. I had been using the best practices defined for each component all along. Then why were we slogging all the time? What were the problems we were facing every day? Now was the right time! I decided to dig deep into each component to find out exactly what I could have done differently during design to make my life easier. And that is where I came up with an ETL design pattern: E-LT-L.

Before I explain what E-LT-L means, let’s look at ETL first. ETL stands for Extraction, Transformation and Loading. In ETL we extract data from multiple data sources, transform the incoming data into a format compatible with the data warehouse structure, and then load it into data marts. Companies usually use commercial tools like SSIS, BO Data Integrator Designer, etc. The developers then make full use of these tools and end up using most of the functionality they provide. To an extent this looks right, too, as the companies have made an investment for this purpose.

Finally, when companies evaluate their ROI, the results are surprising. Despite following industry standards, using commercial ETL tools and training their development teams, the results do not look good. A new set of problems has come up: performance problems, steep learning curves, fixing parts that are not broken, buying additional hardware, etc. From the ETL perspective, even though most companies have a designated job server, they do not get good performance. After a while in production, when their data marts grow in size or the volume of incoming data increases, the performance of the ETL jobs takes a hit. To resolve these issues, many things are done: increasing the configuration of the job server, changing the database structure, using bulk loading options (a favorite choice for techies), splitting the jobs, pushing a few things like summarization over to weekends, etc. This makes the system more complex than it should be, which has a direct impact on overall IT cost.

So exactly what went wrong?  

This is where a new pattern E-LT-L comes into existence. Most of the recurring problems can be resolved using this design pattern.  

E-LT-L stands for Extraction, Loading and Transformation, and Loading. The idea is that once the data is extracted, instead of applying transformations (T) in the staging area, we load the data into the data mart and apply the transformations there (LT). Since the incoming (raw) data is already in the data mart, it makes more sense to use database objects (stored procedures) for the transformations than the row-based transformations available in ETL tools. Using database-based transformations would resolve most of the problems.

Effectively, this pattern calls for bulk loading and for doing transformations in the right place, without moving huge amounts of data around.

This may sound strange, and many people would want to debate it. To prove the point, I will present a few scenarios and let you decide what is good for your implementation.

Scenario 1: Let’s assume you have 1 million incoming records that need a lookup for, say, customers. No matter what tool you use and how much you configure it, it has to run some SQL against the customer master, which is typically huge in many DW installations, to get the customer code. This becomes a bottleneck, as the SQL is executed many times. To add to the woes, the “customer code” column value has to travel from the database server to the job server, where it is stored in a placeholder (variable) in the incoming row. If you instead code the lookup transformation in a stored procedure, one SQL statement can update all the rows by simply joining the staging table with the customer master table. By all counts, the performance of this SQL is hard to beat.
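As a rough illustration of the single-statement lookup in Scenario 1, the sketch below holds such a statement in a C# constant and runs it over an existing connection. Everything named here is an assumption: the tables stg_sales and dim_customer, their columns, and the CustomerLookup class are invented, and the UPDATE … FROM … JOIN form is SQL Server (T-SQL) syntax. In a real job the statement would more likely live in a stored procedure called from the ETL tool.

```csharp
using System;
using System.Data;

static class CustomerLookup   // hypothetical names throughout
{
    // One set-based statement instead of one lookup query per incoming row.
    public const string UpdateSql = @"
UPDATE s
SET    s.customer_code = c.customer_code
FROM   stg_sales AS s
JOIN   dim_customer AS c
       ON c.source_customer_id = s.source_customer_id;";

    // Runs the statement on an already opened connection and returns the
    // number of staging rows that received a customer code.
    public static int Run(IDbConnection connection)
    {
        using (IDbCommand command = connection.CreateCommand())
        {
            command.CommandText = UpdateSql;
            return command.ExecuteNonQuery();
        }
    }

    static void Main()
    {
        // Executing the statement needs a real database, so only print it here.
        Console.WriteLine(UpdateSql);
    }
}
```

The point of the sketch is that the customer codes never leave the database server: the join and the update happen in one round trip.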

Scenario 2: We have an incoming product master for a large retail chain. We need to implement a slowly changing dimension type 2 here (insert new records and, where a record is already present and has changed, close the current version and insert a new one). An ETL tool would typically implement this transformation row by row, and it can get painfully slow. The transform can be developed in a stored procedure using the MERGE SQL statement. More experienced developers can make it metadata-driven.
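One way the MERGE-based type 2 load from Scenario 2 might look is sketched below, again as a statement held in a C# constant. It assumes SQL Server 2008 or later (the OUTPUT clause and the nested MERGE are T-SQL features; other databases would need a different formulation). The tables dim_product and stg_product, their columns, and the ProductDimensionLoad class are invented for the example, and only a change in the product name is tracked.

```csharp
using System;
using System.Data;

static class ProductDimensionLoad   // hypothetical names throughout
{
    // The inner MERGE inserts brand-new products and closes the current row of
    // changed products. The outer INSERT then adds a new current row for every
    // product whose old row was just closed.
    public const string MergeSql = @"
INSERT INTO dim_product (product_code, product_name, valid_from, valid_to, is_current)
SELECT product_code, product_name, GETDATE(), NULL, 1
FROM (
    MERGE dim_product AS tgt
    USING stg_product AS src
        ON tgt.product_code = src.product_code AND tgt.is_current = 1
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (product_code, product_name, valid_from, valid_to, is_current)
        VALUES (src.product_code, src.product_name, GETDATE(), NULL, 1)
    WHEN MATCHED AND tgt.product_name <> src.product_name THEN
        UPDATE SET tgt.is_current = 0, tgt.valid_to = GETDATE()
    OUTPUT $action AS merge_action, src.product_code, src.product_name
) AS changes
WHERE changes.merge_action = 'UPDATE';";

    // Runs the load on an already opened connection.
    public static int Run(IDbConnection connection)
    {
        using (IDbCommand command = connection.CreateCommand())
        {
            command.CommandText = MergeSql;
            return command.ExecuteNonQuery();
        }
    }

    static void Main()
    {
        // Executing the statement needs a real database, so only print it here.
        Console.WriteLine(MergeSql);
    }
}
```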

To summarize, the transformations done using stored procedures help as follows:

  • It does not move data between database servers and job servers. We get a good boost in performance.
  • The data is processed in bulk.
  • The database engine (Oracle, SQL Server) is designed to do this job more efficiently than any third-party ETL tool can.
  • It is usually easier to fix or debug a stored procedure than a transform in an ETL tool.
  • Developers face a smaller learning curve when ramping up on the ETL tools.
  • Deployment is simpler; deploying ETL packages is surely a problem in ETL tools and takes a lot of effort.

I am sure many of you are now wondering why we should use a commercial ETL tool at all. These tools have their own significance. We need a commercial ETL tool to achieve other goals of ETL, such as:

  • Defining a workflow: deciding which tasks execute, in what order, and what to do if a task fails.
  • Executing tasks in parallel.
  • Calling the stored procedures: even though a stored procedure is created to transform some data, it needs to be called from an appropriate place in the ETL.
  • Using the logging/notification capabilities of the ETL tool, which are usually very efficient and simple.
  • Using the scheduling and other features readily available in commercial ETL tools.

What the E-LT-L pattern suggests is a design for the ETL architecture. With a good ETL tool and a good ETL design, we can write efficient and manageable jobs.

If you have any questions or comments, you can reach me at jagdishm@aditi.com.

Jagdish Malani Architecture & Design, BI - Business Intelligence, Performance

Back to Basics: Performance Killer Code – Unaligned Memory in 32-Bit for C# Struct

March 5th, 2009

Recently I was analyzing a .NET application for performance. It had lots of structs defined in it, and I happened to hit a strange reality: the unaligned memory problem! I was running a profiler and found that the memory allocated for a few structs was larger than it should normally be (based on my own math). When I probed further, there was an interesting discovery. Read on…

Let’s get back to basics…
Alright here is a little head spinner… What is the difference between the following structures?


struct BadStructure
{
    char c1;
    int i;
    char c2;
}

struct GoodStructure
{
    int i;
    char c1;
    char c2;
}

Nothing much, except the jumbled type declarations… Huh?

Fine, Now let’s look at the size of these structures,

The size of BadStructure in:
.NET Framework 3.5: managed sizeof = 12 bytes, Marshal.SizeOf = 12 bytes
The size of GoodStructure in:
.NET Framework 3.5: managed sizeof = 8 bytes, Marshal.SizeOf = 8 bytes
[Note: size of int = 4, char = 2]
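The figures above can be reproduced with a few lines of code. This is a minimal sketch: the fields are made public and CharSet.Unicode is specified explicitly (both are my additions) so that the marshaled char is two bytes wide, matching the note above. The StructSizes class name is made up.

```csharp
using System;
using System.Runtime.InteropServices;

// Same field order as in the post.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
struct BadStructure
{
    public char c1;
    public int i;
    public char c2;
}

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
struct GoodStructure
{
    public int i;
    public char c1;
    public char c2;
}

static class StructSizes   // hypothetical demo class
{
    static void Main()
    {
        Console.WriteLine("BadStructure:  {0} bytes", Marshal.SizeOf(typeof(BadStructure)));
        Console.WriteLine("GoodStructure: {0} bytes", Marshal.SizeOf(typeof(GoodStructure)));
    }
}
```

Run on the .NET Framework, this should report 12 and 8 bytes, the sizes quoted above.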
The reason behind these differences is “BYTE ALIGNMENT”. As with the default packing in unmanaged C++, integers are laid out on four-byte boundaries. So while the first character uses two bytes (a char in managed code is a Unicode character, thus occupying two bytes), the integer moves up to the next 4-byte boundary, and the second character uses the subsequent 2 bytes, followed by two pad bytes so that the structure size stays a multiple of four. The resulting structure is 12 bytes when measured with Marshal.SizeOf.

32-bit microprocessors typically organize memory as shown below.
            Byte0   Byte1   Byte2   Byte3
0x1000
0x1004      A0      A1      A2      A3
0x1008
0x100C              B0      B1      B2
0x1010      B3

Some processor architectures cannot read a multi-byte value from a misaligned address at all. Others can, but they are less efficient when a 32-bit value starts at an address not divisible by four.

Memory is accessed by performing 32-bit bus cycles. 32-bit bus cycles can, however, only be performed at addresses that are divisible by 4. So, for efficiency, compilers add so-called pad bytes. The reasons for not permitting misaligned long word reads and writes are not difficult to see. For example, an aligned long word A would be stored as bytes A0, A1, A2 and A3 at address 0x1004.

Thus the microprocessor can read the complete long word in a single bus cycle. If the same microprocessor now attempts to access a long word at address 0x100D, it will have to read bytes B0, B1, B2 and B3. Notice that this read cannot be performed in a single 32-bit bus cycle. The microprocessor will have to issue two different reads, at addresses 0x100C and 0x1010, to read the complete long word. Thus it takes twice the time to read a misaligned long word.
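The arithmetic behind the two examples can be written down in a few lines. This sketch follows the simplified model described here (one bus cycle per aligned 32-bit word touched); the BusCycles class and the WordsTouched name are made up for illustration.

```csharp
using System;

static class BusCycles   // hypothetical helper for illustration
{
    // Number of aligned 32-bit words touched by a read of `size` bytes
    // starting at `address`. In the simplified model each word costs
    // one 32-bit bus cycle.
    public static int WordsTouched(uint address, uint size)
    {
        uint firstWord = address / 4;
        uint lastWord = (address + size - 1) / 4;
        return (int)(lastWord - firstWord + 1);
    }

    static void Main()
    {
        Console.WriteLine(WordsTouched(0x1004, 4)); // aligned long word A: prints 1
        Console.WriteLine(WordsTouched(0x100D, 4)); // misaligned long word B: prints 2
    }
}
```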

The following byte padding rules will generally work with most 32-bit processors:

a. Single-byte numbers can be aligned at any address
b. Two-byte numbers should be aligned to a two-byte boundary
c. Four-byte numbers should be aligned to a four-byte boundary
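Applied mechanically, the rules above reproduce the 12 and 8 byte sizes. The sketch below is only an illustration of that arithmetic, not how the CLR computes layouts. It adds one assumption that the rules do not state but that the sizes quoted earlier imply: the total is rounded up to the largest field alignment. The LayoutCalculator name is made up.

```csharp
using System;

static class LayoutCalculator   // hypothetical helper for illustration
{
    // Aligns each field to its own size (1, 2 or 4 bytes), stores every
    // field's offset in `offsets`, and returns the total size rounded up
    // to the largest alignment used (assumption, see the text above).
    public static int Compute(int[] fieldSizes, int[] offsets)
    {
        int offset = 0;
        int maxAlign = 1;
        for (int k = 0; k < fieldSizes.Length; k++)
        {
            int align = fieldSizes[k];
            offset = RoundUp(offset, align);   // insert pad bytes if needed
            offsets[k] = offset;
            offset += fieldSizes[k];
            maxAlign = Math.Max(maxAlign, align);
        }
        return RoundUp(offset, maxAlign);      // trailing pad bytes
    }

    static int RoundUp(int value, int multiple)
    {
        return (value + multiple - 1) / multiple * multiple;
    }

    static void Main()
    {
        int[] bad = { 2, 4, 2 };    // char, int, char
        int[] good = { 4, 2, 2 };   // int, char, char
        Console.WriteLine(Compute(bad, new int[3]));   // prints 12
        Console.WriteLine(Compute(good, new int[3]));  // prints 8
    }
}
```

For the char, int, char order the computed offsets are 0, 4 and 8; for int, char, char they are 0, 4 and 6, so no pad bytes are needed.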

This is the cause of the difference.

Fine…. How do we fix this ?

The .NET compilers apply a StructLayoutAttribute to structures, specifying a Sequential layout. This means that the fields are laid out in the type according to their order in the source file.

Here is the IL for Bad Structure.

.class nested private sequential ansi sealed beforefieldinit BadStructure
    extends [mscorlib]System.ValueType
{
    .field public char c1
    .field public int32 i
    .field public char c2
}
In the .NET Framework 3.5, the JIT does enforce a Sequential layout (if specified) for the managed layout of value types. We can use the StructLayoutAttribute class from the System.Runtime.InteropServices namespace to control the physical layout of the data fields. So one fix is to specify [StructLayout(LayoutKind.Sequential, Pack = 1)] for the struct, which removes the pad bytes. Keep in mind that packing BadStructure this way leaves the int at offset 2, which is not a four-byte boundary; reordering the fields as in GoodStructure gives the same 8 bytes while keeping every field aligned.

Watch out for structures the next time you create them, and think about what happens with ‘m’ structures of ‘n’ size… m x n = !!! You can definitely save a few kilobytes or, in the worst case, if you use structs heavily for data transformation, you might even save a few megabytes. Alright, time to refactor your code now :)
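To see what Pack = 1 does to the layout, here is a small sketch. The PackedStructure and PackedDemo names are made up, the fields are public, and CharSet.Unicode is specified so that the marshaled char is two bytes wide; none of that is from the original application.

```csharp
using System;
using System.Runtime.InteropServices;

// BadStructure's field order, packed on one-byte boundaries.
[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Unicode)]
struct PackedStructure
{
    public char c1;
    public int i;
    public char c2;
}

static class PackedDemo   // hypothetical demo class
{
    static void Main()
    {
        // No pad bytes remain: 2 + 4 + 2 bytes.
        Console.WriteLine("Size: {0} bytes", Marshal.SizeOf(typeof(PackedStructure)));

        // The int now starts right after the first char.
        Console.WriteLine("Offset of i: {0}", Marshal.OffsetOf(typeof(PackedStructure), "i"));
    }
}
```

The size drops from 12 to 8 bytes, but the int sits at offset 2, so the memory saving comes at the price of a field that is no longer on a four-byte boundary.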

Happy Coding!

Logu Krishnan C#, Performance