SaaS Application Database design approaches

June 3rd, 2009

One of the main concerns for any client (tenant) using a SaaS application is “data privacy”. A “data sharing” option can really help tenants, provided the application’s database design is fool-proof and builds confidence among the tenants.

SaaS (Software as a Service) is a software delivery model in which a provider licenses an application to customers (subscribers or tenants) for use as a service on demand. SaaS applications use a multi-tenant, not a multi-instance, application architecture.

“Data storage”, “data synchronization” and “database maintenance” are three key focus areas from the deployment perspective. I came up with the following three database design approaches and optimized them to get the best out of each.

1.  Individual database servers:
The best way to isolate a tenant’s data is to deploy it on a separate database server. An advantage of this approach is that it is easy to extend the existing database to meet a tenant’s customization requirements. The flip side is the high cost of hardware inventory, per-server licensing, and maintenance. The few tenants who require a high degree of (physical) data isolation and are willing to pay more can opt for this design approach.

2.  Shared database servers with individual schemas:
Tenant-specific schemas are created on the same database server, and access rights are set up so that a tenant can access only the resources (tables, stored procedures, etc.) in its own schema. The key advantages of this approach are lower cost and support for a virtually unlimited number of tenant databases. It can also improve application performance, because we can work out a way to use connection pooling effectively. Even though the tenants’ databases are in different schemas on the same server, we can use the following technique to get the most out of the DB connection. The code is in C# (Microsoft .NET platform), and I am sure the same can be done in other languages as well.

1) objConnection.ConnectionString = "Data Source=MSSQL1;Initial Catalog=FunnelDatabase;Integrated Security=true;";
2) objConnection.Open();
3) objConnection.ChangeDatabase("Tenant1");
4) objCommand.CommandType = CommandType.StoredProcedure;
   objCommand.CommandText = "storedprocedurename";
   objCommand.ExecuteNonQuery();

Point #3 in the above code is the important one. The initial connection to the database server is established against FunnelDatabase. However, before executing the stored procedure or SQL statement, line #3 changes the target database to Tenant1. By doing so, we still use the initial connection string and hence the same connection pool for all the target databases.

3. Same database with same schema:
All tenants’ data resides in the same schema and shares the same set of tables, with a tenant identifier added to the primary key (making it a composite key). The key challenges of this approach are that it is not customizable at tenant level and that rigorous testing is required to gain the tenants’ confidence in data security and data privacy. Barring these concerns, this is the best of the three approaches in terms of cost and maintenance. Even here, we can establish some degree of data isolation with advanced techniques such as “partitioning methods”, which allow each tenant’s data to be physically separated across physical devices while keeping maintenance simple thanks to the shared table definitions.
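The shared-schema approach can be illustrated with a minimal sketch. It is written in Python against an in-memory SQLite database purely for illustration; the `orders` table, its columns and the helper names are hypothetical and not part of any real application. The point it shows is the composite key (tenant identifier plus record id) and the habit of filtering every read by tenant.

```python
import sqlite3

def create_schema(conn):
    # tenant_id is part of the composite primary key, so two tenants
    # may reuse the same order_id without colliding.
    conn.execute(
        """
        CREATE TABLE orders (
            tenant_id TEXT NOT NULL,
            order_id  INTEGER NOT NULL,
            amount    REAL NOT NULL,
            PRIMARY KEY (tenant_id, order_id)
        )
        """
    )

def add_order(conn, tenant_id, order_id, amount):
    conn.execute(
        "INSERT INTO orders (tenant_id, order_id, amount) VALUES (?, ?, ?)",
        (tenant_id, order_id, amount),
    )

def orders_for_tenant(conn, tenant_id):
    # Every read is scoped by tenant_id; the filter is never optional.
    rows = conn.execute(
        "SELECT order_id, amount FROM orders WHERE tenant_id = ? ORDER BY order_id",
        (tenant_id,),
    )
    return rows.fetchall()

conn = sqlite3.connect(":memory:")
create_schema(conn)
add_order(conn, "tenant1", 1, 10.0)
add_order(conn, "tenant2", 1, 99.0)   # same order_id, different tenant
add_order(conn, "tenant1", 2, 25.5)
print(orders_for_tenant(conn, "tenant1"))
```

Keeping all data access behind functions like `orders_for_tenant` is one way to make the "rigorous testing" mentioned above tractable, since there are only a few places where a missing tenant filter could leak data.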

Let us talk about deployment strategies with networking options in our next blog.

Sreenivasa Rao Pilla Life at Aditi

IM on Group Chat client

April 15th, 2009

The Office Communicator client offers much more than just IM – it is a rich application that offers conferencing, voice, and desktop sharing among other features. For business users in the Financial Services space, these features can be very useful. However, chat (IM and topic-based) is the most important medium of collaboration for these user communities. Given that Group Chat is available on a client that’s different from Communicator, organizations will have to roll out two separate clients to users to offer the whole gamut of communication features.

When talking to MindAlign customers, a question we hear quite often is whether users have a choice about how they use these clients. When a user has both clients installed, by default, IM is not available on the Group Chat client. So a valid concern is the impact on productivity as users switch between the two clients for IM and Group Chat. One option for organizations is to roll out only the Group Chat client. IM is then available on this client, and users can maintain their contact/buddy list in one place. Users are likely to be using IM and Group Chat the majority of the time, so this might be a safe choice.

But users would definitely like access to conferencing, voice and other communication features – these are what have the potential to increase business productivity and provide return on investment from the new platform. So does this mean that business users will be stuck switching between two chat clients on a day-to-day basis? Fortunately, no. Microsoft allows users to decide on the end point for IM with a simple registry setting.

Here are the steps to enable IM in Group Chat when you have both clients installed (when you only have Group Chat, you don’t have anything to worry about – you can use IM within Group Chat by default):

1. Add an entry, DisableIM, under HKEY_CURRENT_USER\Software\Policies\Microsoft\GroupChatConsole\Permissions. It should be of type DWORD, with the value set to 0.

2. Restart Group Chat – on logging in, you should see your buddy list and be able to initiate a private conversation.

3. If you are logged into Group Chat only, and someone sends you an IM, a new Group Chat window is launched. If you are logged into both clients, then both clients are notified. The client you pick first will handle the conversation. A different IM conversation can be initiated from either client.

4.  If you’d like to restrict IM to Group Chat alone, you can disable IM on Communicator with the registry entry DisableIM (type DWORD, value 1) under HKEY_CURRENT_USER\Software\Policies\Microsoft\Communicator. You can continue to use voice, conferencing and other features on Communicator.
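For rolling these two settings out to several machines, the values can be written to a .reg file instead of being typed into the registry editor. The following is a small sketch in Python that only builds the text of such a file; the helper names are made up for this example, and how the file is saved and imported is left out. The key paths and values are the ones given in the steps above.

```python
def reg_dword_entry(key_path, name, value):
    """Render one DWORD value in .reg file syntax."""
    return '[{}]\n"{}"=dword:{:08x}\n'.format(key_path, name, value)

def build_reg_file(entries):
    # Standard first line of a .reg file, followed by one section per key.
    header = "Windows Registry Editor Version 5.00\n\n"
    return header + "\n".join(reg_dword_entry(*entry) for entry in entries)

GROUP_CHAT_KEY = r"HKEY_CURRENT_USER\Software\Policies\Microsoft\GroupChatConsole\Permissions"
COMMUNICATOR_KEY = r"HKEY_CURRENT_USER\Software\Policies\Microsoft\Communicator"

reg_text = build_reg_file([
    (GROUP_CHAT_KEY, "DisableIM", 0),    # enable IM in Group Chat
    (COMMUNICATOR_KEY, "DisableIM", 1),  # optional: disable IM in Communicator
])
print(reg_text)
```

Drop the second entry if users should be able to start IM conversations from either client.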

Sweta Jagirdar Unified Communications

What is Unified Messaging?

April 9th, 2009

Unified Messaging is an Exchange Server role that was introduced in Exchange Server 2007. It lets you access all your message types (e-mail, voicemail, fax, SMS text) from your Outlook e-mail client. The basic idea behind Unified Messaging is that users communicate in a variety of ways. Some prefer to send e-mail messages, others prefer the telephone, and some need a live discussion. Bundle these choices into one single entity and you get Exchange Server 2007. Add a flavour of instant messaging and you have Microsoft OCS/GCC :)

Now you may wonder what the term “Unified” is doing here. It refers to the basic message layer being common to the different message types. That is, data is captured in a single underlying body that can be presented in different formats, according to the choice of the end user, with subtle UI variations.

The term Unified Messaging (UM) is sometimes confused with Unified Communications (UC). UM systems cull messages from several sources (such as e-mail, voice mail and faxes) but hold those messages for retrieval at a later time. The means, or underlying protocol, that does the job of delivery forms your UC.

In a company, though, a user typically has two separate mailboxes: one for e-mail messages and another for voice mail messages (google/yahoo etc ..). Furthermore, voice mail has traditionally been tied to the telephone. Although it is common for voice mail to be remotely accessible, users often find themselves writing down names, numbers, or messages on pieces of paper, which often get lost! (at least it happened to me :))

Microsoft designed Exchange 2007 so that the Inbox allows users to store e-mail messages, voice mail messages, and faxes all in the same place. This frees the user from having to look for messages in multiple locations. It also gives users a way to make voice messages searchable, in the same way we search our mail in Outlook. With OCS 2007 R2 you can also check offline messages and post e-mails as chat to your internal chat environment.

So, in brief, Unified Messaging can provide you with:
- Voice mails/Phone calls/Missed calls/Call waiting/Forwards/Call forking
- IM chats/Group messages with emoticons :), :( etc.
- Create own filters/triggers
- Contact grouping
- Remote call controls and remote desktop sharing
- Web conferencing

Vishvesh Ram Raiter Unified Communications

ETL: How to handle bad data

April 7th, 2009

In any ETL design, we implement various functionalities such as validation, auditing, notification, job recovery, job logging, data cleansing and handling bad data. In this post I am going to talk about handling bad data. At a top level, ETL design allows bad data to be rejected and sent to the appropriate users in the form of files. But, in my opinion, there is more to this than meets the eye, and as ETL architects our responsibility does not end there. So before I get into the details of how we should handle bad data, let me tell you what makes incoming data bad enough to need handling. The following are a few of the reasons data is considered bad:

  1. Business rules: A set of business rules is laid out that defines whether the incoming data is good or bad. Consider a sales record where the cost of the product must be present. If the cost contains a null value or a negative number, the sales record is considered bad.
  2. Referential integrity: Any data that would not satisfy referential integrity in the data warehouse database. This usually happens when inter-dependent data is missing. If the incoming data contains references to other data that could not be loaded for some reason, the incoming data becomes bad and must not be loaded into the data warehouse database. A typical example is a retail chain that maintains its product master in a centralized database while sales data is generated across different POS terminals. If the corporate database is down for whatever reason during ETL, the ETL cannot load new products, yet sales have already been generated for these new products. With no matching product in the product dimension (master), the sales record is considered bad at that moment and is not loaded into the data warehouse database.
  3. Missing business keys: If mandatory data is missing from the incoming data, that data is considered bad. This usually does not happen when the data is sourced from other relational databases. When sourcing data from files, however, there is every possibility that data may go missing, even if the format of the incoming feed file has been agreed.
  4. Missing data: There may be many cases where data is missing from the incoming feed, which logically makes it bad. For example, if an incoming product feed file contains a record with no product code and no product description, that record is considered invalid.
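The reasons above can be turned into a small classification step. The sketch below is in Python and makes several assumptions for illustration: records are plain dictionaries, `product_code` is the business key, and the set of known product codes stands in for the product dimension. It shows one way of recording why a record was rejected, not a prescribed design.

```python
def rejection_reasons(record, known_product_codes):
    """Return the list of reasons a sales record is bad (empty list = good)."""
    reasons = []
    code = record.get("product_code")
    # Missing business key: without product_code the record cannot be matched.
    if not code:
        reasons.append("missing business key: product_code")
    # Business rule: cost must be present and non-negative.
    cost = record.get("cost")
    if cost is None:
        reasons.append("business rule: cost is null")
    elif cost < 0:
        reasons.append("business rule: cost is negative")
    # Referential integrity: the product must already exist in the dimension.
    if code and code not in known_product_codes:
        reasons.append("missing reference: product not in product dimension")
    return reasons

def split_good_and_bad(records, known_product_codes):
    good, bad = [], []
    for record in records:
        reasons = rejection_reasons(record, known_product_codes)
        if reasons:
            bad.append((record, reasons))
        else:
            good.append(record)
    return good, bad

known = {"P1", "P2"}
sales = [
    {"product_code": "P1", "cost": 12.5},
    {"product_code": "P1", "cost": None},
    {"product_code": "P9", "cost": 3.0},
    {"product_code": None, "cost": -1.0},
]
good, bad = split_good_and_bad(sales, known)
for record, reasons in bad:
    print(record, reasons)
```

Keeping the reason alongside the record matters later: a record rejected for a missing reference is a candidate for reprocessing, while one with a missing business key is not.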

Now that we have seen what generates bad data, we must understand that data cleansing does not make any data bad. Data cleansing is applied only to good data. So before we start data cleansing, a mechanism needs to be in place to separate the good data from the bad.

Once bad data is identified, it is usually stored in a separate area called the “Rejection Area”. The rejection area can be a separate schema in the same database that contains the staging schema, or it can be a separate database altogether. The structure of the rejection area (its table structure) is similar to that of the staging area, with a few additional columns. These additional columns are required to store metadata about the rejected data.

As ETL architects, we need to design our ETL to provide the following functionality:

  1. Ability to reprocess bad data whenever required: Data that could not be loaded because of missing references is usually reprocessed once the missing data has been loaded into the data warehouse database. Consider the case where sales data was rejected because of a missing product master. Later, when the latest product master is loaded, this bad data (which wasn’t really bad) needs to be reprocessed; otherwise the sales summary report will not be accurate. Another reason bad data may need reprocessing is a change in business rules. If a lot of data is rejected because of strict validation rules, the customer may decide to relax those rules so that less data is rejected.

    We can automate this functionality by adding a few columns to the rejection area table:

    1. Reprocess_flag (Y|N): When set, this flag indicates that the record needs to be reprocessed. Moving such rejected records from the rejection area to the staging area should be automated. This helps customers in various ways, such as reduced dependency on IT staff and lower maintenance cost.
    2. Reprocess_Job_Id: Metadata about each job run is usually maintained. To be able to audit when a rejected record was reprocessed, the job id is stamped into this column.
    3. Active_Flag: Once a record marked for reprocessing has been copied over to the staging area, it is made inactive, as it is no longer valid. The record may fail validation again and end up in the rejection area, but it is then treated as a new record. In essence, there is only one active instance of a rejected record in the rejection area. This also implies that inactive records cannot be chosen for reprocessing.
  2. Ability to reprocess incoming data: Often, for various reasons, data that has already been processed is fed into the ETL again. This requires us to identify the corresponding records in the rejection area and mark them as inactive, as they are no longer valid. The incoming records need to be validated against the current business rules. The records in the staging area are compared with the records in the rejection area on the business keys, and for matching records active_flag is set to ‘N’. This process of marking existing rejected records as inactive is usually automated.
  3. Ability to mark invalid data: Sometimes the business keys in the incoming records are null. These records eventually end up in the rejection area and are active. No matter how many times they are reprocessed, they will end up in the rejection area again. At the same time, incoming records cannot be matched with them, since there is no business key to compare. Such records should therefore be marked as invalid. For this, add the following column to the rejection area table:
    1. Valid_flag (Y|N): This flag must be set to ‘N’ for records with missing business keys. This implies that the record can never be reprocessed.

    The important point to note is that the responsibility of the ETL architect does not end here. Designing the ETL to mark rejected data as invalid does not by itself solve any business problem; the incoming data must still get loaded into the data warehouse database. So it is very important for the BI architect to talk to the end users and explain the impact. The end users may need to make some changes in the source systems, but if they need accurate reports they must send accurate data.
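The rejection-area columns described above can be sketched as follows, using Python with an in-memory SQLite database. The table names `stg_sales` and `rej_sales`, the `reject_reason` column and the helper functions are hypothetical, and a Valid_flag of 'N' is taken here to mean that the record can never be reprocessed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stg_sales (product_code TEXT, cost REAL);
    CREATE TABLE rej_sales (
        rej_id           INTEGER PRIMARY KEY,
        product_code     TEXT,
        cost             REAL,
        reject_reason    TEXT,
        reprocess_flag   TEXT DEFAULT 'N',
        reprocess_job_id INTEGER,
        active_flag      TEXT DEFAULT 'Y',
        valid_flag       TEXT DEFAULT 'Y'
    );
    """
)

def reject(conn, product_code, cost, reason):
    # A null business key can never be matched again, so mark it invalid.
    valid = "N" if product_code is None else "Y"
    conn.execute(
        "INSERT INTO rej_sales (product_code, cost, reject_reason, valid_flag) "
        "VALUES (?, ?, ?, ?)",
        (product_code, cost, reason, valid),
    )

def reprocess(conn, job_id):
    """Copy flagged, active, valid rejects back to staging and retire them."""
    where = "reprocess_flag = 'Y' AND active_flag = 'Y' AND valid_flag = 'Y'"
    conn.execute(
        "INSERT INTO stg_sales (product_code, cost) "
        "SELECT product_code, cost FROM rej_sales WHERE " + where
    )
    cur = conn.execute(
        "UPDATE rej_sales SET active_flag = 'N', reprocess_job_id = ? WHERE " + where,
        (job_id,),
    )
    return cur.rowcount  # number of rejected rows handed back to staging

reject(conn, "P9", 3.0, "product not in product dimension")
reject(conn, None, 5.0, "missing business key")
# A user marks every rejected row for reprocessing; only the valid one moves.
conn.execute("UPDATE rej_sales SET reprocess_flag = 'Y'")
moved = reprocess(conn, job_id=42)
print(moved, conn.execute("SELECT * FROM stg_sales").fetchall())
```

If the reprocessed row fails validation again, it would be inserted into `rej_sales` as a new row, which matches the "only one active instance" rule described for Active_Flag.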

What I have explained is just one way of designing a rejection area (the tables containing bad data); what matters is the concept of handling bad data, and there can be different ways of implementing this functionality. Once the ETL and database are designed appropriately, an interface must be provided that allows end users to do the following:

  1. Select any rejected table and mark rejected records to be reprocessed.
  2. Select any already processed data for reprocessing. This is simple if the incoming data is in the form of feed files. It gets a little more complex when the data is extracted from existing databases. In large ETL systems the staging area is typically archived, so depending on the needs of the customer this goal can be achieved.
  3. Look at the invalid data and analyze it to be able to fix the source systems accordingly.
  4. Execute the ETL job after selecting the rejected records or selected source data for reprocessing. This would depend on various other factors such as ETL time window, the need to reflect the correct data, the time of ETL run, etc.

Last but not least, as ETL architects our goal should not be just to implement some logic to handle bad data. Our main responsibility is to make the whole thing as automated as possible. Automation provides various benefits, such as reduced development time, fewer errors and hence higher quality, and finally reduced cost for the customer.

If you have any questions or comments, you can reach me at jagdishm@aditi.com.

Jagdish Malani Architecture & Design, BI - Business Intelligence

Developing Managed Event Sinks/Hooks for Exchange Server Store using C#

March 28th, 2009

One of my previous projects required me to create a Managed Event Sink for the Microsoft Exchange Server Store. As it was my first attempt at the topic, it took a while to grasp and crack event sinks, and surprisingly googling did not help either. When I finally cracked it, I thought I should share it with the world for the common good :-)

Hence, this Article…

So, what does an Event Sink Really mean?

An Event Sink is a piece of code that gets triggered on predetermined events. A more classic term I can give as an example is “hooks”: we hook into an event, and when the event occurs our custom code executes first, after which control is passed back to the original event if required. Similarly, we can hook into anyone’s mailbox on the Exchange server and have our hook execute even before the Exchange server events are fired. This lets us build a range of LOB applications.

Exchange Store Events

Some of the events that can be hooked on Exchange Server are:

1. Synchronous Events – Events that are triggered before an item [mail, appointments, documents, tasks etc.] is committed to the Exchange store. These events pause the Exchange store thread until the event sink finishes executing. No other process can access the item during the event sink’s execution, as the event sink has exclusive control over the item. The following events are classified as synchronous events.

a. OnSyncSave – fires when the item is saved to exchange, but before the changes are committed.
b. OnSyncDelete – fires when the item is deleted from exchange, but before the delete operation is committed.

2. Asynchronous Events – Events that are fired after an item is committed to the Exchange store. These async events do not block the Exchange store thread. The following are the asynchronous events.

a. OnSave – Fires after the item is saved to exchange and changes are committed
b. OnDelete – Fires after the item is deleted from the exchange and changes are committed.

3. System Events – Events that are fired by system-wide actions on the Exchange server. The following are the system events.

a. OnMDBStartUp – Fires when the Exchange database is started.
b. OnMDBShutdown – Fires when the Exchange database is shut down.
c. OnTimer – Executes a piece of code at predefined intervals. This is a very useful event, as it runs irrespective of specific item events.

Synchronous and Asynchronous events are tied to a specific item or folder in the exchange store.
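The difference between the synchronous and asynchronous events is mainly one of ordering around the commit. The toy model below, in Python, is only meant to make that ordering concrete; it is not the Exchange API, the class and function names are invented, and the ability of a synchronous hook to veto the save is an assumption of this sketch. In Exchange the asynchronous events also run without blocking the store thread, which this single-threaded toy does not model.

```python
class ToyStore:
    """Toy item store with save hooks; an illustration, not the Exchange API."""

    def __init__(self):
        self.items = {}
        self.sync_save_hooks = []   # called before the commit
        self.async_save_hooks = []  # called after the commit

    def save(self, key, value):
        for hook in self.sync_save_hooks:
            # Assumption: a synchronous hook may veto the save by returning False.
            if hook(key, value) is False:
                return False
        self.items[key] = value     # the commit itself
        for hook in self.async_save_hooks:
            hook(key, value)        # the item is already committed here
        return True

log = []

def on_sync_save(key, value):
    log.append(("OnSyncSave", key))
    return value != ""              # reject items with an empty body

def on_save(key, value):
    log.append(("OnSave", key))

store = ToyStore()
store.sync_save_hooks.append(on_sync_save)
store.async_save_hooks.append(on_save)
store.save("mail-1", "hello")
store.save("mail-2", "")            # vetoed before the commit
print(log)
```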

All these events are exposed in the Exchange CDOEX library [cdoex.dll] as interfaces. Fig 1.1 shows the object browser window of the CDOEX library.

So What? What Can I Build?

Some of the applications that can be developed using event sinks are:

  1. Notification Subsystems
  2. Global Timer applications
  3. Workflow based applications
  4. Automatic Categorization subsystems
  5. Store maintenance for administrators

Let’s Code Now…

Fire up Visual Studio .NET, choose a new C# Class Library project and name the project, hmm… let’s call it “MyEventSink”.

In Solution Explorer, right-click the project name and choose Properties. On the Project Properties page, choose Configuration Properties, then Build, and set Register for COM Interop to True.

Now, copy the files below to the MyEventSink bin directory:

  • exoledb.dll from exchange server bin directory (\program files\exchsrvr\bin)
  • cdoex.dll - \program files\common files\Microsoft Shared\CDO
  • msado15.dll - \Program Files\Common Files\System\ADO

Open the VS.NET Command Prompt, navigate to the MyEventSink bin folder, and create strong name keys for the above libraries by keying in the following commands:

> sn -k exoledb.key
> sn -k cdoex.key
> sn -k msado.key

We need to create interop assemblies for the above libraries; to do so we shall use the tlbimp tool. Key in the following commands to create the three interop assemblies.

tlbimp exoledb.dll /keyfile:exoledb.key /out:interop.exoledb.dll /namespace:CDO
tlbimp cdoex.dll /keyfile:cdoex.key /out:interop.cdoex.dll /namespace:CDO
tlbimp msado15.dll /keyfile:msado.key /out:interop.adodb.dll /namespace:ADODB

Copy these interop DLL files to the debug folder. Switch back to VS.NET and add references to the interop DLL files created above. Then modify the following attributes in AssemblyInfo.cs.

Under General Information section, modify

[assembly: AssemblyTitle("MyEventSink")]

[assembly: AssemblyDescription("My Event Sink - Logu")]

In the version information section, create a new GUID and add

[assembly: Guid("44E6847A-0012-42af-A317-1E1A9F0C853D")]

[Tip: You can create a new GUID by clicking Tools->Create GUID]

In the signing information section, modify

[assembly: AssemblyDelaySign(false)]

[assembly: AssemblyKeyFile("MyEventSink.key")]

[assembly: AssemblyKeyName("MyEventSink")]

Now, Choose Project Properties and set the “Wrapper assembly key file” to MyEventSink.key and “Wrapper assembly Key Name” to “My Event Sink”

Start the VS.NET Command Prompt, change directory to your project directory, and create a key by keying in the following:

> sn -k MyEventSink.key

Switch back to the VS.NET IDE, rename class1.cs to something like “ExchEventSink.cs”, and double-click the .cs file to open it.

Add,

Modify the class definition to resemble the following:

If you look at the above code, you will notice that we are implementing the IExStoreAsyncEvents interface, which declares the asynchronous event methods, namely OnSave and OnDelete. We shall implement them now; add the following to your code [check the attached zip file for more information]

In the above code, we process an Exchange item in the OnSave method and create a log file. This is a simple code example; modify it to suit your requirements.

Compile the class and you have your event sink component ready. Now open Component Services; under COM+ Applications create a new empty application and name it “MyEventSink”. Then expand Components under MyEventSink and click “Import components that are already registered”.

And choose “MyEventSink.ExchEventSink” from the populated list.

Now, the event sink component is registered to the server.

We are done with the development part. Now you can bind the component to any folder of the Exchange store. There are multiple ways to do this; I prefer the following:

RegEvent.vbs - I’ve attached the VBS file along with the download zip, this script creates the event registration for the specified folder. The following command binds the event sink to my inbox folder,

I’ve included the vbs file along with the zip file.

Exchange Explorer – this is a tool you get with Exchange SDK

Alternatively, you can build your own event registration [that’s a separate article by itself :-) ]

At last, we are done… We have created our own Managed Exchange Store Event Sink. You can implement the synchronous events and the system events in the same way as we implemented the asynchronous events.

Logu Krishnan C#, Exchange Server, Microsoft

Future of ETL - Metadata driven

March 26th, 2009

Some time back, I was called in to design the ETL for a mid-sized enterprise. I had to deal with various issues such as multiple data sources, not-so-clean data, various data validations, data cleansing needs, the ETL time window within which the ETL job must finish, changing requirements, and the list goes on. On top of that, I had very tight timelines. Drawing on my previous BI projects, in which I had done ETL with different tools (BO, SSIS), I came up with a list of goals that I thought should be part of any standard ETL implementation and must be met here too. Here are some of the main goals:

  • Performance
  • Flexible enough to handle changing requirements
  • Least maintenance
  • Automation
  • Modularizing processing
  • Data quality
  • Recoverability of ETL job
  • Auditing
  • Logging
  • Notification
  • Ability to reprocess rejected records, files, or any other data source
  • Remote administration of ETL job

While going through this list, I realized that all these features were there in my earlier ETL implementations too. This led me to questions like “How am I going to improve on my earlier designs?” and “How can we take ETL to a new level?”. Before I set out to find answers, I asked myself a simple question: “Why?” Why should I improve my ETL design? What is the need for this in the first place? I found the answer early: reduced development time together with increased quality results in reduced cost for customers. This can be a good selling point for the sales team too, which I will cover in my next blog post.

So, coming back to the original question of how to improve ETL design, I started by comparing the different ETL tools I have worked with in the past, looking at how these tools work, how they handle metadata, and how they eventually run the job. Interestingly, I found that these tools are good enough to meet standard ETL requirements, but they do not make good use of metadata. Even where they use metadata in a few places, there is no intuitive interface available. Thus I got my first lead, and I decided to make use of metadata as much as possible. Hence, the quest for “Metadata driven ETL” starts….

Before I tell you what metadata driven ETL is, let me tell you what it is not. It does not generate the ETL program on the fly, as you might expect. Instead, metadata driven ETL uses an existing program that executes various tasks by dynamically reading parameters from a metadata datastore. This datastore can be an XML file, a separate database repository, or a few tables in the DW database.

WHAT IS METADATA: Any data that is used to execute a transform or operation is called metadata. For example, from a transformation perspective, in a slowly changing dimension (SCD) type II transformation, the target columns that are treated as business keys are nothing but metadata. Similarly, from an operations perspective, the list of database tables that must be cleared before extraction starts is metadata. Whether we use an ETL tool’s standard transforms or hand-code the ETL, we use metadata in one way or another.

USING METADATA:
How shall we use this metadata to our advantage? The best approach to start with metadata driven ETL is to:

  1. Identify the operations in ETL that can be automated. Certain operations carried out during ETL can be automated. These operations may involve multiple tables; usually they are put into a script or SQL and executed as one operation. Let’s look at an operation where we clear the staging tables before extraction starts. Usually staging tables are designed in one of the following ways:
    1. Create all staging tables in a separate database
    2. If the staging area is in the same database as the data mart, create a specific schema such as “STAGING” or prefix the table names with something like “STG_”

Whichever way we go, this list of tables can be read from the database system tables (metadata), and using this metadata the operation can be automated. Along similar lines, we can create specific metadata for other ETL operations. This makes ETL development easier, faster and less error-prone. Here are other sample operations that can be made metadata driven:

  • Moving rejected records to rejected area
  • Reprocessing rejected records
  • Data cleansing: For all incoming data, duplicate records need to be deleted based on the unique keys identified. There isn’t a built-in transform for this in ETL tools, but it can be made metadata driven. The ETL reads the unique keys for each incoming business entity, then dynamically creates and executes the de-duplication statement. The developer just needs to specify the unique keys in the metadata datastore, which again brings down the development effort.
  • Notification: Upon completion of an ETL job, the job can read metadata (from address, recipient addresses, mail server, subject, body) and send the notification, so this process can be made metadata driven. We all do this already; what is different here is how we implement it.
  • Auditing and logging
  • Identify transformations that can be made metadata driven: Many transformations can be made metadata driven; we need to identify them and convert them. For example, take the slowly changing dimension type II, a transformation used heavily in ETL programs. In this transform, incoming records are compared on pre-defined key columns with the existing records in the data mart. If the records exist in the data mart they are updated; otherwise they are inserted. Normally the transform has to be coded for every SCD II. It needs the following metadata:
    • Source and target table names.
    • Business keys column names in source table.
    • Key columns in target table.

With tools like BO or SSIS, this transform is more or less hardcoded. To make it metadata driven, read the above-mentioned metadata and dynamically create a MERGE SQL statement (available in DB2 and SQL Server 2008) in a stored procedure. This stored procedure is called from the ETL at the appropriate places, for all SCD-II transforms. The metadata driven SCD-II transform should also give better performance. Consider the case where the staging area is within the data warehouse database or in a separate database on the same server. If an ETL tool’s transform is used, the data is processed in batches of some pre-defined size: the ETL engine applies the transform and fires the appropriate SQL for every incoming row in the batch, which is definitely slower. The metadata driven transform, being a single SQL operation, processes all the data in bulk, and we get increased performance.
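To make the idea concrete, here is a minimal sketch of generating such a MERGE statement from metadata. It is written in Python rather than as a stored procedure, and the metadata layout (a dictionary with table names, key columns and a list of non-key columns) as well as the table and column names are assumptions made for this example. It implements the update-or-insert behaviour described above in roughly SQL Server style and does not create history rows; the identifiers are assumed to come from trusted metadata, not from user input.

```python
def build_merge_sql(meta):
    """Build an update-or-insert MERGE statement from transform metadata."""
    on_clause = " AND ".join(
        "tgt.{} = src.{}".format(tgt_key, src_key)
        for src_key, tgt_key in zip(meta["source_keys"], meta["target_keys"])
    )
    set_clause = ", ".join("tgt.{0} = src.{0}".format(c) for c in meta["columns"])
    insert_cols = meta["target_keys"] + meta["columns"]
    insert_vals = ["src." + c for c in meta["source_keys"] + meta["columns"]]
    return (
        "MERGE INTO {target} AS tgt\n"
        "USING {source} AS src\n"
        "ON {on}\n"
        "WHEN MATCHED THEN UPDATE SET {sets}\n"
        "WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals});"
    ).format(
        target=meta["target_table"],
        source=meta["source_table"],
        on=on_clause,
        sets=set_clause,
        cols=", ".join(insert_cols),
        vals=", ".join(insert_vals),
    )

# Hypothetical metadata for one dimension; in practice it would be read
# from the metadata datastore rather than written inline.
product_meta = {
    "source_table": "stg_product",
    "target_table": "dim_product",
    "source_keys": ["product_code"],
    "target_keys": ["product_code"],
    "columns": ["description", "cost"],
}
merge_sql = build_merge_sql(product_meta)
print(merge_sql)
```

Adding a new dimension then means adding one more metadata entry, not writing another transform.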

IMPLEMENTING METADATA: When we use any ETL tool for transformations or for other purposes, we are in fact already using metadata. The difference lies in how the metadata is implemented and used. How and where is the metadata stored? How do we access it at run time? Is there a single interface for accessing it? When we use an ETL tool for a transformation, the metadata is stored in a proprietary format. We cannot simply go and change it directly; we need to use the ETL tool’s designer to access it. Thus, if a transform changes after the ETL program is deployed, we need to make the change through the designer and re-deploy the ETL program. If instead we design the ETL so that it uses our own metadata in our own way, we end up with a very good ETL architecture, one that eventually evolves into a framework. This framework can be reused across multiple ETL implementations, which brings down the estimates and thus the cost significantly, and gives us an added advantage over our competitors.

With this approach, you get the following advantages from metadata-driven ETL:

  • Increased performance
  • Reduced development time
  • Fewer errors
  • Easier, lower-effort maintenance, which translates into long-term cost savings for customers (added value)
  • Once the basic shell of the metadata-driven ETL is created, the learning curve for team members is reduced.

I started with this mindset and have been reasonably successful. I am now in the process of refining the ETL architecture.

AT LAST: Creating metadata-driven ETL in no way suggests that we move away from ETL tools. ETL tools have their own strengths. We would actually be architecting the ETL in a different way to make it metadata driven. Metadata-driven ETL also pays rich dividends during the maintenance phase, as it takes less effort and allows for quick deployment.

LOOKING FORWARD TO: Some of the above-mentioned metadata is already captured in the ETL tools available today, but they still need to evolve further. So until ETL tools support 100% metadata-driven development, we as architects should design our ETL in a way that fills the gap. I do hope commercial ETL tool vendors are working on this and that we will soon have the next generation of ETL tools.

Jagdish Malani Architecture & Design, BI - Business Intelligence

IT: Competency Building

March 26th, 2009

IT organizations were doing well until some time back. The subprime crisis led to the fall of many banks in the US and, before we knew it, we were in a recession. Nobody knows when this recession will end, but we all hope it ends soon. Until it does, IT organizations face the problems typical of a recession: a growing bench, increased costs, and lower revenues. To make matters worse, there is uncertainty about when the economy will turn around. Organizations should use this slowdown to their advantage and prepare themselves for the good times. The first thing organizations are trying to do is increase employee utilization by training employees on newer, in-demand technologies. They are reluctant to hire at any level, so they are focusing on building competency internally. In the light of cut-throat competition, competency building has to be aligned with the organization’s strategy, derived from sales planning and operations management.

Let’s see how organizations should go about competency building.

  1. IDENTIFY THE GOALS: A competency building exercise needs some retrospection before organizations take the first step, because the same exercise would have been done during good times as well. Organizations must examine those earlier efforts meticulously and find out their success rate and their impact on various projects; otherwise, following the same approach will yield the same results. If an organization has been struggling with competency building since the good times, that is an indicator that something wasn’t done right earlier. During a slowdown it is critically important to take the right step, or organizations risk heading in the wrong direction altogether. When the economy picks up, organizations that have been strategic will have the edge over their competitors. Organizations must approach competency building as follows:

    1. Identify the areas in which competency needs to be built: People who drive sales strategy (typically senior management) and people who are responsible for operations need to come together and align their goals. A mistake some organizations make is to run competency building in a silo. That way, they are never able to build the competency required for driving sales growth.
    2. Define the extent of competency building: They must work out approximate target numbers for each level in the pyramid. These numbers must be tied to the sales targets, in the short term as well as the long term.
    3. Expectation management: The next most important factor in the success of competency building is having the right perception. Training employees in a new technology does not make them experts in one go; instead, they get a quick and timely head start in the new area. Every organization, when hiring, looks for a specified number of years of experience in a particular stream; this is true across all verticals and at all levels. If it weren’t, you would see advertisements like “We need 5 smart people with or without prior experience for all levels”. That said, this does not mean competency cannot be built. It can be built if organizations define a good strategy and ensure that the strategy meets the overall organizational goals. For example, management must provide for specific follow-up training and live project experience (even an internal project would suffice) and see this exercise through.
       
  2. ACHIEVE THE GOALS: Once organizations have figured out their goals clearly as stated above, they must ACT to achieve them. Organizations can run different, specific programs to do so. One way is to run training programs in the areas in which they need to build up competency. Another is to use employees available on the bench to develop internal tools that improve the organization’s internal efficiency. Either way, they must remain focused and ensure that these programs are aligned with their overall goals. Organizations must work to ensure the following:

    1. Focused training: The training programs should be tightly focused. Organizations must do their due diligence with a gap analysis. They should consider the business lines they are in, and also explore the opportunity to expand their horizon into different areas. This gap analysis can be done by:
      1. Analyzing earlier projects: Organizations must spend time analyzing earlier projects from different perspectives. They must find out what went wrong and which specific areas they must improve upon.
      2. Analyzing earlier sales deals: Organizations must also analyze the pre-sales deals that did not materialize. Reasons a deal did not materialize could include no fitting resources, no prior experience, poor estimates, or high cost. Focused training would help organizations fill these gaps.
    2. Role-based training: Next, organizations must evaluate their employees and get buy-in from the employees being trained. Organizations must also ensure that the training is role-based. This means that if a lead-level employee is trained on a new technology, organizations must figure out whether the same employee would be able to play a similar lead role in that new technology. Playing a specific role in any technology needs some prior experience, and this is especially important for senior roles. This holds true not only during a recession but also during good times.
    3. Development of internal tools: Developing internal tools is good for any organization in various ways, such as:
      1. Building competency
      2. Managing operations effectively
      3. Evaluating a technology
      4. Helping with the sales pitch

    But before organizations jump into developing internal tools, they must take care of a few things to be effective:

    1. Selection of the right tools to develop: Organizations must ensure that they are investing in the right set of internal tools, ones that are really needed. When approached, every manager or head of department will have a long list of internal tools they would like developed for them; after all, they are not paying for it. If not managed properly, this leads to a bigger mess later (multiple applications using different technologies with no consolidated data). Organizations must take a holistic approach in deciding which tools to develop, which technology to use, and in what priority. They must not use a technology just for the sake of competency building. This would help them coordinate training programs effectively and with greater success.
    2. Execution model: After identifying the tool to be developed and the team that is going to work on it, organizations have to make sure the project is executed under the right model, as if it were a live project for a customer. Organizations must not take shortcuts here or swap roles and responsibilities within the development team executing the project. For instance, project managers must not be collecting requirements if they aren’t supposed to play that role in live projects. The right set of people should be given the right set of roles.
  3. PERIODIC EVALUATION: Competency building efforts should not go on for a long time without a mechanism to measure their success. Success must be measured periodically. This is good for both organizations and employees: organizations are able to deploy new people effectively, and employees gain conviction. If employees are not convinced within themselves, they will not give their 100%.

IT organizations with an effective strategy, strong leadership, and vision will be able to build the right competency.

Jagdish Malani Project Management

WMI and .NET Performance Hiccups : Win32 – the Savior

March 26th, 2009

… or read the title as “How could you speed up your software by 90%?” There is always a way to tune performance. This blog is about one such instance, where I dumped WMI (Windows Management Instrumentation) and turned back to Win32 purely for performance gains.

Some time back I was involved in a project with lots of hardware interfaces: interacting with huge SCSI devices, parallel ports, digital imaging, et al. Though I was a fan of Win32, I was thrilled with how WMI drastically reduces development time, primarily because of its mature APIs. I was thinking about how difficult it would be for beginner-to-intermediate programmers to do system-level programming in C or C++ to interact with hardware, network devices, communication devices, et al., and how error-prone that code is… WMI is definitely the better choice in this case.

WMI has saved us development time on many occasions. But there were moments when we ran into a lot of performance problems during product testing. While probing for the causes, it was really surprising to see what lay behind the hiccups. Sometimes it was the development team’s oversight; sometimes poor WMI was the culprit.

Here are two instances that gave my product a considerable performance boost.

Handling System.Drawing.Image Performance using C#

The product I was working on previously had to load and edit JPEG images of digital quality, typically larger than 3-5 MB. My customer was always complaining about image loading speed. Everything was fine while loading a 1-2 MB image, but things changed drastically whenever an image of 3 MB or more was loaded: it took around 1-2 seconds to load. Two seconds is fine for normal applications, but not for people who work with thousands of images per day; the Windows Form would almost hang before showing the image. After a considerable amount of research, I found that the real culprit causing the bottleneck was the call “System.Drawing.Image.FromFile()”.

I came across a KB article which confirms this issue and offers a hotfix[!!], which updates
1. System.Windows.Forms.dll
2. System.Design.dll
3. System.Drawing.dll

In fact, there was an interesting new overload on System.Drawing.Image: System.Drawing.Image.FromStream(Stream stream, bool useEmbeddedColorManagement, bool validateImageData). The validateImageData flag was the real cause of the slowdown: when it is true, the content of the image file is validated before loading. So as the size of the image increased, the loading time grew sharply.
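
For readers who can use that overload, a minimal sketch of calling it with validation switched off might look like the following. This is not the approach taken in this post (the Win32 route follows); FastImageLoader is a hypothetical helper name. Note two assumptions: the file is trusted, because skipping validation means a corrupt file may only fail later when the image is drawn, and the bytes are copied into memory because GDI+ needs the stream to stay open for the lifetime of the Image.

```csharp
using System.Drawing;
using System.IO;

public static class FastImageLoader
{
    // Hypothetical helper: loads an image without the up-front validation pass.
    public static Image Load(string path)
    {
        // Keep the data in a MemoryStream that lives as long as the Image;
        // closing the stream early can break later operations on the image.
        var buffer = new MemoryStream(File.ReadAllBytes(path));

        // Arguments: stream, useEmbeddedColorManagement = false, validateImageData = false
        return Image.FromStream(buffer, false, false);
    }
}
```

Whether this is fast enough for a given workload would need to be measured on representative images.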

So I had to look for an alternative. The obvious choice was Win32, and here is the method equivalent to Image.FromFile():

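// Requires: using System; using System.Drawing; using System.IO;
//           using System.Reflection; using System.Runtime.InteropServices;

// Flat GDI+ API; assumes GDI+ is already initialized in this process.
[DllImport("gdiplus.dll", CharSet = CharSet.Unicode)]
private static extern int GdipLoadImageFromFile(string filename, out IntPtr image);

// Bitmap has a non-public static FromGDIplus(IntPtr) factory, reached via reflection.
private static readonly Type imageType = typeof(Bitmap);

public static Image Win32ImageFromFile(string filename)
{
    filename = Path.GetFullPath(filename);
    IntPtr loadingImage = IntPtr.Zero;

    // A non-zero GDI+ status code signals failure.
    if (GdipLoadImageFromFile(filename, out loadingImage) != 0)
    {
        throw new Exception("Oops! GDI Exception.");
    }
    return (Bitmap)imageType.InvokeMember("FromGDIplus",
        BindingFlags.NonPublic | BindingFlags.Static | BindingFlags.InvokeMethod,
        null, null, new object[] { loadingImage });
}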
 

And now, when I used this new method… voilà! The images started loading at least 90% faster and took less than 10 milliseconds to load! Wow! That was really great and amazing.

Handling Performance issues in Win32_LogicalDisk using C#

Here is another instance where I dumped WMI and used Win32 instead.
Here is the simple WMI code which lists the removable drives of the computer.

#region "WMI Code to retrieve Drives"
// Requires: using System.Management; using System.Collections.Specialized;
// (add a reference to System.Management.dll)

ManagementClass driveClass = new ManagementClass("Win32_LogicalDisk");
ManagementObjectCollection drives = driveClass.GetInstances();
StringCollection driveCollection = new StringCollection();
try
{
    foreach (ManagementObject drv in drives)
    {
        // Check whether the drive is a removable storage device (DriveType 2)
        if ((drv["Description"].ToString() == "Removable Disk") && (drv["DriveType"].ToString() == "2"))
        {
            driveCollection.Add(drv["Caption"].ToString());
        }
    }
}
finally
{
    // Release the underlying WMI resources.
    drives.Dispose();
}
#endregion

This code takes a minimum of 4-5 seconds to enumerate my disk drives. Another problem is that the floppy drive is physically checked every time [but… why!!], which slows execution down further. None of our clients would accept this in a frequently used feature. There was no way to solve this issue except to look for a Win32 method, and here is the alternative…

Using Win32

#region "WIN32 Code to retrieve Drives"
// Requires: using System.Collections.Specialized;
// The DllImport is a class-level declaration; the rest belongs inside a method.
[System.Runtime.InteropServices.DllImport("kernel32.dll", SetLastError = true)]
static extern uint GetDriveType(string lpRootPathName);

/* Retrieves all the mounted drives on the computer. */
StringCollection driveCollection = new StringCollection();
string[] _drives = System.Environment.GetLogicalDrives();

foreach (string _drive in _drives)
{
    /* Call Win32 GetDriveType to determine the drive type, based on the drive letter */
    uint _driveType = GetDriveType(_drive);

    // 2 = DRIVE_REMOVABLE, 5 = DRIVE_CDROM
    if (_driveType == 2 || _driveType == 5)
    {
        driveCollection.Add(_drive);
    }
}
#endregion

This code executed in less than 100 milliseconds! That was an incredible performance boost.
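
For comparison, the same filter can be written without P/Invoke using System.IO.DriveInfo, which has been part of the framework since .NET 2.0. The sketch below is an alternative I have not timed against the Win32 version above, so treat any speed expectation as an assumption; RemovableDrives and GetNames are hypothetical names. It deliberately avoids properties such as IsReady or VolumeLabel, which may access the media.

```csharp
using System.Collections.Generic;
using System.IO;

public static class RemovableDrives
{
    // Returns the root names (for example "E:\") of removable and CD-ROM
    // drives, mirroring the GetDriveType == 2 || == 5 check.
    public static List<string> GetNames()
    {
        var result = new List<string>();
        foreach (DriveInfo drive in DriveInfo.GetDrives())
        {
            if (drive.DriveType == DriveType.Removable ||
                drive.DriveType == DriveType.CDRom)
            {
                result.Add(drive.Name);
            }
        }
        return result;
    }
}
```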
Do you think using Win32 as an alternative is insane? Have you faced such real-world problems? Would you still use WMI? Talk back!

Logu Krishnan C#, Performance

Can Project Manager be a Role Model?

March 22nd, 2009

Let me start this blog by asking a few questions:

  • Do you think the Project Manager is the role model for all the team members?
    • How often does this happen?
  • Is the PM really respected by his/her team members?

Before I turn to the opportunities a PM (Project Manager) has to become a role model, let’s quickly understand what this role truly demands. This is important because there is a myth among project team members that PMs are mere coordinators who are there to police the team and take action. There might be some instances where managers leave such an impression.

So, who is a Project Manager?
The Project Manager is the person who takes ultimate responsibility for ensuring that the desired result is achieved on time and within budget. A PM has overall responsibility for the successful planning, execution, monitoring, control and closure of a project.

In this blog, I want to highlight how easily a PM can confuse the process with the goal. In such a scenario, PMs usually lean towards quantifying things that do not add any value. Not sure what else to do, they tend to occupy their time with less important activities such as metrics, spreadsheets or reports. This makes team members believe that the PM is not adding any value, and team members can very easily carry this view of PMs into all of their future projects. By definition, a project involves “progressive elaboration”, and a PM stuck in such non-value-added activities widens the gap between the project and the PM. Such a PM is under the false belief that if the project team just pursues the processes to perfection and follows the checklists, it is bound to succeed.

A good Project Manager does not get carried away by such a stringent web of procedures; instead, they are flexible enough to tweak the processes according to the project’s needs. A PM should always keep an eye on the business goal, which is achieved by a group of people (the team members) accomplishing a set of tasks. To earn the confidence and respect of the team members, the PM has to educate them on the various roles in the project team, including the PM’s own. As the project progresses, the PM has to measure each role against its set objectives, and not only provide feedback but also show the direction to resolve the gray areas. It is important that managers spend enough time with each individual in the team and help them cultivate a desire for achievement. Getting the best possible performance from each team member is the responsibility of the manager.

In a nutshell, PM must get involved in the project by understanding not only the business goal but all the functionalities of the requirement, by understanding the high level design architecture and most importantly by expediting the overall process of quality development by timely removal of obstacles and by providing directions that lead to right solutions at right time.

Good PMs or leaders can thus earn a special kind of respect from their team members (developers, testers, architects…). A PM should be able to enable thinking, strategy and leadership that positively impact the team. I am sure all of you would now agree that a PM can easily be a role model, though this also depends on his/her personality.

Please feel free to post your comments….

Shrinath Inamdar Project Management

Mobile Everywhere

March 7th, 2009

Mobile Application Architecture Series – Design Considerations

Mobile applications inherit various challenges which need to be considered while designing on any platform (Microsoft / Symbian / OS X etc.). The following section tries to address various design guidelines for mobile application development. Technology stack considerations will be next in this series of blogs.

A mobile application will normally be a multi-layered application comprising user experience, business and data layers. When developing a mobile application, you may choose to develop a Thin-Client or Rich-Client. If you are building a rich client, the business and data layers will be located on the device itself. In Thin-Client, the business and data layers will be located on the server.

Design Considerations:

Follow these design guidelines to ensure that the application meets your requirements and performs efficiently in the mobile world.

1)      Thin-Client or Rich-Client or RIA: If your application requires local processing and must work in an occasionally connected scenario, consider designing a Rich (smart) client. Bear in mind, though, that a Rich-Client application will be more complex to install and maintain. If the application can depend on server processing and will always be fully connected, consider designing a Thin-Client. If your application has a rich user interface, needs only limited access to local resources, and must be portable to other platforms, then an RIA (Rich Internet Application) would be a good choice (e.g. ETrade).

2)      Device Types to Support: While choosing which device types to support, consider screen size, screen resolution (DPI), CPU performance characteristics, available memory and storage space (inbuilt & external) and Integrated Development Tool availability. On top of the above, we need to consider the User Requirements and Organizational Constraints / Guidelines. Any additional components such as GPS and Integrated Camera may influence your choice of application type and device type.

3)      Connectivity: The real time connectivity requirement between the mobile device and Gateway would highly influence the decision of whether to go for Thin-Client or Rich-Client or RIA. If an application requires an intermittent network connection, our design approach should properly handle caching, state management and data access mechanisms.

4)      UI design considerations: Mobile devices require a simple UI in order to work within the constraints imposed by the device hardware, such as memory, battery life, different screen sizes and orientations, and network bandwidth.

5)      Layered Architecture: Depending on the application type, multiple layers may be located on the device. Use the concept of Layers to maximize the separation of concerns, and to improve reuse and maintainability for your mobile application. However, aim to achieve the smallest footprint on the device by simplifying your design compared to a desktop or web application.

6)      Security: Designing an effective authentication and authorization strategy is important for the security and reliability of your application. Mobile devices are usually designed to be single-user devices and normally lack basic user profile and security tracking beyond just a single password. Mobile applications can also be especially challenging due to connectivity interruptions.

7)      Caching: Use caching to improve the performance and responsiveness of your application, and to support operation when there is no network connection. Use caching to optimize reference data lookups, and to avoid network round trips. When deciding what data to cache, consider the limited resources of the device; you will have less memory available than a PC.

8)       Communication: Device communication includes wireless communication (over the air) and wired communication with a host PC, as well as more specialized communication such as Bluetooth or Infrared Data Association (IrDA).  When communicating over the air, consider data security to protect sensitive data from theft or tampering.

9)      Performance Considerations:

a.       Design configurable options to allow the maximum use of device capabilities. Allow users to turn off features they do not require in order to save power.

b.      To optimize for device resource constraints, consider using lazy initialization.

c.       Optimize the application to use the minimum amount of memory. When memory is low, the system may release cached intermediate language (IL) code to reduce its own memory footprint, return to interpreted mode, and thus slow overall execution.

d.      Consider using programming shortcuts as opposed to following pure programming best practices that can inflate the code size and memory consumption. This decision should be a thoughtful one, as it may contradict the design principles of OOP and hurt maintainability.

e.      Consider power consumption when using the device CPU, wireless communication, or screen while on battery power. We should balance power consumption with performance.
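
Two of the guidelines above, caching within a bounded memory budget (item 7) and lazy initialization (item 9b), can be sketched in C# as follows. BoundedCache and ReferenceData are hypothetical names introduced only for illustration, the country codes are placeholder data, and Lazy<T> assumes .NET 4 or later. This is a minimal, single-threaded sketch rather than a production cache.

```csharp
using System;
using System.Collections.Generic;

// Least-recently-used cache that never holds more than `capacity` entries,
// so its memory cost on a constrained device stays bounded.
public sealed class BoundedCache<TKey, TValue>
{
    private readonly int capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    // Most recently used entry first, least recently used entry last.
    private readonly LinkedList<KeyValuePair<TKey, TValue>> order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    public BoundedCache(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException("capacity");
        this.capacity = capacity;
    }

    public int Count { get { return map.Count; } }

    public bool TryGet(TKey key, out TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (map.TryGetValue(key, out node))
        {
            // Move the entry to the front to mark it as recently used.
            order.Remove(node);
            order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }

    public void Put(TKey key, TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> existing;
        if (map.TryGetValue(key, out existing))
        {
            order.Remove(existing);
            map.Remove(key);
        }
        else if (map.Count >= capacity)
        {
            // Evict the least recently used entry to stay within budget.
            LinkedListNode<KeyValuePair<TKey, TValue>> oldest = order.Last;
            order.RemoveLast();
            map.Remove(oldest.Value.Key);
        }
        var node = new LinkedListNode<KeyValuePair<TKey, TValue>>(
            new KeyValuePair<TKey, TValue>(key, value));
        order.AddFirst(node);
        map[key] = node;
    }
}

// Lazy initialization: the (pretend) expensive load runs only on first access.
public sealed class ReferenceData
{
    private readonly Lazy<string[]> countries;
    public int LoadCount { get; private set; }

    public ReferenceData()
    {
        countries = new Lazy<string[]>(() =>
        {
            LoadCount++;                    // stands in for a costly lookup
            return new[] { "IN", "US" };    // placeholder data
        });
    }

    public string[] Countries { get { return countries.Value; } }
}
```

The capacity would be tuned per device class; a cache that is too generous defeats the purpose on a low-memory handset.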

In my view, the considerations listed above are only a subset of the complete list, but they address the key areas. One basic design approach for most mobile applications is to avoid BDUF (Big Design Up Front): let the design evolve over time and avoid making a large design effort prematurely.

Sreenivasa Rao Pilla Architecture & Design, Mobile Architectures