Archive for April, 2009

IM on Group Chat client

April 15th, 2009

The Office Communicator client offers much more than just IM: it is a rich application that offers conferencing, voice, and desktop sharing, among other features. For business users in the Financial Services space, these features can be very useful. However, chat (IM and topic-based) is the most important medium of collaboration for these user communities. Because Group Chat runs in a separate client from Communicator, organizations will have to roll out two clients to users to offer the whole gamut of communication features.

When talking to MindAlign customers, a question we hear quite often is whether users have a choice about how they use these clients. When a user has both clients installed, IM is not available in the Group Chat client by default. So a valid concern is the impact on productivity as users switch between the two clients for IM and Group Chat. One option for organizations is to roll out only the Group Chat client. IM is then available in this client, and users can maintain their contact/buddy list in one place. Since users are likely to spend the majority of their time in IM and Group Chat, this might be a safe choice.

But users would definitely like access to conferencing, voice and other communication features – these are what have the potential to increase business productivity and provide return on investment from the new platform. So does this mean that business users will be stuck switching between two chat clients on a day-to-day basis? Fortunately, no. Microsoft allows users to decide on the end point for IM with a simple registry setting.

Here are the steps to enable IM in Group Chat when you have both clients installed (when you only have Group Chat, you don’t have anything to worry about – you can use IM within Group Chat by default):

1. Add a DWORD value named DisableIM under HKEY_CURRENT_USER\Software\Policies\Microsoft\GroupChatConsole\Permissions and make sure it is set to 0.

2. Restart Group Chat – on logging in, you should see your buddy list and be able to initiate a private conversation.

3. If you are logged into Group Chat only, and someone sends you an IM, a new Group Chat window is launched. If you are logged into both clients, then both clients are notified. The client you pick first will handle the conversation. A different IM conversation can be initiated from either client.

4. If you’d like to restrict IM to Group Chat alone, you can disable IM in Communicator: add a DWORD value named DisableIM, set to 1, under HKEY_CURRENT_USER\Software\Policies\Microsoft\Communicator. You can continue to use voice, conferencing and other features in Communicator.
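
If you need to apply these settings to many machines, one option is to generate a .reg file and import it with regedit or `reg import`. The Python sketch below only builds the file's text; the key paths and the DisableIM value name come from the steps above, while the function names are hypothetical, and the output should be checked against your own Group Chat and Communicator versions before use.

```python
# Sketch: build .reg file text for the DisableIM policy values described above.
# Assumption: key paths and value names are exactly as given in the steps;
# verify them for your client versions before importing.

GROUP_CHAT_KEY = r"HKEY_CURRENT_USER\Software\Policies\Microsoft\GroupChatConsole\Permissions"
COMMUNICATOR_KEY = r"HKEY_CURRENT_USER\Software\Policies\Microsoft\Communicator"


def reg_dword(key: str, name: str, value: int) -> str:
    """Return one .reg section that sets a REG_DWORD value."""
    # .reg files write DWORD data as eight lowercase hex digits.
    return f'[{key}]\n"{name}"=dword:{value:08x}\n'


def build_reg(im_in_communicator: bool = True) -> str:
    """Enable IM in Group Chat (step 1); optionally disable it in Communicator (step 4)."""
    parts = [
        "Windows Registry Editor Version 5.00\n",
        reg_dword(GROUP_CHAT_KEY, "DisableIM", 0),
    ]
    if not im_in_communicator:
        parts.append(reg_dword(COMMUNICATOR_KEY, "DisableIM", 1))
    # Sections in a .reg file are separated by blank lines.
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_reg(im_in_communicator=False))
```

Calling `build_reg()` with the default keeps IM in both clients; passing `im_in_communicator=False` adds the step 4 setting so IM is handled by Group Chat alone.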

Sweta Jagirdar | Unified Communications

What is Unified Messaging?

April 9th, 2009

Unified Messaging is an Exchange Server role that was introduced in Exchange Server 2007. It enables you to access your message types (e-mail, voicemail, fax, SMS text) from your Outlook e-mail client. The basic idea behind Unified Messaging is that users communicate in a variety of ways: some users prefer to send e-mail messages, others prefer the telephone, and some need live discussion. Bundle these choices into one single entity and you get Exchange Server 2007. Add a flavour of instant messaging and you have Microsoft OCS/GCC :)

Now you may wonder what the term "Unified" is doing here. It refers to the basic message layer being common across the different message types. Data is captured in a single underlying body that can be presented in different formats, according to the end user's choice, with subtle UI variations.

The term Unified Messaging (UM) is sometimes confused with Unified Communications (UC). A UM system collects messages from several sources (such as e-mail, voice mail and faxes) and holds them for retrieval at a later time. The mechanisms and underlying protocols that do the job of delivery make up your UC.

Within a company, though, a user typically has two separate mailboxes: one for e-mail messages and another for voice mail messages (Google/Yahoo, etc.). Furthermore, voice mail has traditionally been tied to the telephone. Although voice mail is commonly accessible remotely, users often find themselves writing down names, numbers or messages on pieces of paper, which often get lost (at least, that has happened to me :)).

Microsoft designed Exchange 2007 so that the Inbox lets users store e-mail messages, voice mail messages and faxes all in the same place. This frees users from having to look for messages in multiple locations. It also gives users a way to make voice messages searchable, much as we search our mail in Outlook. With OCS 2007 R2, you can also check offline messages and post e-mails as chat to your internal chat environment.
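
To make the "one inbox" idea concrete, here is a minimal Python sketch of a unified store: every message type shares one record shape, lands in the same list, and is searched and rendered through the same code. The `Message` and `UnifiedInbox` classes are hypothetical illustrations, not Exchange APIs; in particular, the sketch assumes each voice mail carries some searchable text, such as a subject line.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Message:
    """One stored item; kind is 'email', 'voicemail' or 'fax' (hypothetical model)."""
    kind: str
    sender: str
    received: datetime
    subject: str
    body_text: str  # assumption: searchable text exists for every message type


@dataclass
class UnifiedInbox:
    messages: List[Message] = field(default_factory=list)

    def deliver(self, msg: Message) -> None:
        # Every message type lands in the same store: no second mailbox.
        self.messages.append(msg)

    def search(self, term: str) -> List[Message]:
        """Case-insensitive search across all message types, newest first."""
        term = term.lower()
        hits = [m for m in self.messages
                if term in m.subject.lower() or term in m.body_text.lower()]
        return sorted(hits, key=lambda m: m.received, reverse=True)

    def render(self, msg: Message) -> str:
        # Same underlying record, presented with a small per-type variation.
        icon = {"email": "[mail]", "voicemail": "[voice]", "fax": "[fax]"}.get(msg.kind, "[?]")
        return f"{icon} {msg.received:%Y-%m-%d %H:%M} {msg.sender}: {msg.subject}"
```

A single `search("budget")` call then returns matching e-mails and voice mails together, which is the practical payoff of storing everything in one place.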

In brief, Unified Messaging can provide:
- Voice mail, phone calls, missed calls, call waiting, call forwarding and call forking
- IM chats and group messages with emoticons :), :( etc.
- Custom filters/triggers
- Contact grouping
- Remote call controls and remote desktop sharing
- Web conferencing

Vishvesh Ram Raiter | Unified Communications

ETL: How to handle bad data

April 7th, 2009

Any ETL design implements various functionality, such as validation, auditing, notification, job recovery, job logging, data cleansing and handling of bad data. In this post I am going to talk about handling bad data. At a high level, an ETL design lets bad data be rejected and sent to the appropriate users in the form of files. In my opinion, though, there is more to this than meets the eye, and as ETL architects our responsibility does not end there. So before I get into how we should handle bad data, let me explain what makes incoming data bad enough to need handling. Here are a few common reasons:

  1. Business rules: a set of business rules defines whether incoming data is good or bad. Consider a sales record where the cost of the product must be present. If the cost is null or negative, the sales record is considered bad.
  2. Referential integrity: any data that does not satisfy referential integrity in the data warehouse database. This usually happens when inter-dependent data is missing. If the incoming data references other data that could not be loaded for some reason, the incoming data becomes bad and should not be loaded into the data warehouse database. A typical example is a retail chain that maintains its product master in a centralized database, while sales data is generated across different POS terminals. If the corporate database is down for whatever reason during the ETL run, the ETL cannot load new products, yet sales have been generated for those new products. With no matching product in the product dimension (master), those sales records are considered bad for now and are not loaded into the data warehouse database.
  3. Missing business keys: if mandatory data is missing from the incoming data, that data is considered bad. This rarely happens when data is sourced from other relational databases. When sourcing data from files, however, there is every possibility that data goes missing, even if the format of the incoming feed file has already been agreed.
  4. Missing data: there are many cases where data is missing from the incoming data, which logically makes it bad. For example, if a record in an incoming product feed file has neither a product code nor a product description, the record is considered invalid.

Now that we have seen what generates bad data, note that data cleansing does not make any data bad: cleansing is applied only to good data. So before data cleansing starts, a mechanism must be in place to separate good data from bad data.
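
The four causes above, and the rule that cleansing only touches good data, can be sketched in Python as a validation pass that splits each batch before any cleansing happens. The `SalesRecord` fields, the reason codes and the rounding step in `cleanse` are hypothetical placeholders; real rules would come from the business.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class SalesRecord:
    # Hypothetical feed layout; sale_id plays the role of the business key.
    sale_id: Optional[str]
    product_code: Optional[str]
    cost: Optional[float]


def rejection_reasons(rec: SalesRecord, known_products: Set[str]) -> List[str]:
    """Return every reason the record is bad; an empty list means it is good."""
    reasons = []
    if not rec.sale_id:                      # missing business key
        reasons.append("MISSING_BUSINESS_KEY")
    if not rec.product_code:                 # missing mandatory data
        reasons.append("MISSING_PRODUCT_CODE")
    if rec.cost is None or rec.cost < 0:     # business rule on cost
        reasons.append("INVALID_COST")
    # Referential integrity: the product must already be in the product dimension.
    if rec.product_code and rec.product_code not in known_products:
        reasons.append("UNKNOWN_PRODUCT")
    return reasons


def partition(
    records: List[SalesRecord], known_products: Set[str]
) -> Tuple[List[SalesRecord], List[Tuple[SalesRecord, List[str]]]]:
    """Split a batch into good records and (record, reasons) pairs for rejection."""
    good, bad = [], []
    for rec in records:
        reasons = rejection_reasons(rec, known_products)
        if reasons:
            bad.append((rec, reasons))
        else:
            good.append(rec)
    return good, bad


def cleanse(rec: SalesRecord) -> SalesRecord:
    """Example cleansing step; only called on good records, so cost is valid."""
    return replace(rec, cost=round(rec.cost, 2))
```

A typical run would call `partition` first, send the `bad` pairs to the rejection area, and apply `cleanse` only to the `good` list. Collecting all reasons, rather than stopping at the first, gives users more to work with when they fix the source.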

Once bad data is identified, it is usually stored in a separate area called the “Rejection Area”. The rejection area can be a separate schema in the same database that contains the staging schema, or a separate database altogether. The structure of the rejection area tables is similar to that of the staging area, with a few additional columns that store metadata about the rejected data.
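
As a sketch of "same structure as staging, plus a few metadata columns", the Python snippet below uses the standard-library `sqlite3` module to derive a rejection table from a staging table's column list. The metadata column names (`reject_reason`, `reject_job_id`, `rejected_at`) and the helper names are assumptions for illustration. Only column names and types are copied, deliberately dropping constraints such as NOT NULL, because rejected rows are exactly the ones likely to violate them.

```python
import sqlite3

# Hypothetical metadata columns appended to every rejection table.
REJECT_META = """
    reject_reason TEXT    NOT NULL,
    reject_job_id INTEGER NOT NULL,
    rejected_at   TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP"""


def create_rejection_table(conn: sqlite3.Connection, staging: str, rejection: str) -> None:
    """Create `rejection` with the staging columns plus rejection metadata."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    cols = conn.execute(f"PRAGMA table_info({staging})").fetchall()
    # Copy name and type only; NOT NULL and similar constraints are dropped on purpose.
    col_defs = ",\n    ".join(f"{name} {ctype}" for _, name, ctype, *_ in cols)
    conn.execute(
        f"CREATE TABLE {rejection} (\n"
        f"    reject_id INTEGER PRIMARY KEY,\n"
        f"    {col_defs},{REJECT_META}\n)"
    )


def reject(conn: sqlite3.Connection, rejection: str, row: dict, reason: str, job_id: int) -> None:
    """Store one bad row together with why, and by which job, it was rejected."""
    cols = [*row, "reject_reason", "reject_job_id"]
    marks = ", ".join("?" for _ in cols)
    conn.execute(
        f"INSERT INTO {rejection} ({', '.join(cols)}) VALUES ({marks})",
        [*row.values(), reason, job_id],
    )
```

Deriving the rejection table from the staging definition keeps the two in step when staging columns change; the table names are interpolated directly, so they should come from trusted configuration, never from user input.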

As ETL architects, we need to design our ETL to provide the following functionality:

  1. Ability to reprocess bad data whenever required. Data that could not be loaded because of missing references is usually reprocessed once the missing data has been loaded into the data warehouse database. Consider the case where sales data was rejected because of a missing product master. When the latest product master is later loaded, this bad data (which wasn’t really bad) needs to be reprocessed; otherwise the sales summary report will not be accurate. A change in business rules is another reason to reprocess bad data: if strict validation rules cause a lot of rejections, the customer may decide to change them so that large amounts of data are not rejected.

    We can automate this functionality by adding a few columns to the rejection area table:

    1. Reprocess_flag (Y|N): When set, this flag marks the record for reprocessing. Moving these rejected records from the rejection area to the staging area should be automated. This helps customers in various ways, such as reduced dependency on IT staff and lower maintenance cost.
    2. Reprocess_Job_Id: Metadata about each job run is usually maintained, so for auditing purposes the id of the job that reprocessed the rejected record is stamped into this column.
    3. Active_Flag: Once a record marked for reprocessing has been copied to the staging area, it is made inactive, since it is no longer valid. The record may fail validation again and end up back in the rejection area, but it is then treated as a new record. In essence, there is only one active instance of a rejected record in the rejection area, and inactive records cannot be chosen for reprocessing.
  2. Ability to reprocess incoming data: for various reasons, data that has already been processed is often fed into the ETL again. This requires us to identify the corresponding records in the rejection area and mark them inactive, as they are no longer valid, while the incoming records are validated against the current business rules. The records in the staging area are compared with the records in the rejection area on the business keys, and for matching records Active_Flag is set to ‘N’. This process of marking existing rejected records as inactive is usually automated.
  3. Ability to mark invalid data: sometimes the business keys in incoming records are null. These records eventually end up in the rejection area as active records, and no matter how many times they are reprocessed, they end up in the rejection area again. At the same time, incoming records cannot be matched with them, because the match is made on business keys. These records should therefore be marked as invalid. To do this, add the following column to the rejection area table:
    1. Valid_flag (Y|N): This flag is set to ‘N’ for records with missing business keys, meaning the record can never be reprocessed.

    The important point here is that the ETL architect’s responsibility does not end there. Designing the ETL to mark rejected data as invalid does not solve any business problem: the incoming data still has to be loaded into the data warehouse database. So it is very important for the BI architect to talk to end users and explain the impact. The end users may need to tweak their source systems, but if they want accurate reports, they must send accurate data.
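
The reprocessing, re-feed and invalid-marking behaviour described above can be sketched with `sqlite3` as three small operations on a rejection table carrying Reprocess_flag, Reprocess_Job_Id, Active_Flag and Valid_flag. Table names, the single business key `sale_id`, and leaving Reprocess_flag untouched after a run are assumptions; this sketch uses Valid_flag = ‘N’ to mean “invalid, never reprocess”. Note how a NULL business key never satisfies the `IN` comparison, which is why such rows have to be marked invalid explicitly.

```python
import sqlite3

# Hypothetical staging and rejection tables; sale_id is the business key.
SCHEMA = """
CREATE TABLE stg_sales (sale_id TEXT, product_code TEXT, cost REAL);
CREATE TABLE rej_sales (
    reject_id        INTEGER PRIMARY KEY,
    sale_id          TEXT,
    product_code     TEXT,
    cost             REAL,
    reject_reason    TEXT,
    reprocess_flag   TEXT NOT NULL DEFAULT 'N' CHECK (reprocess_flag IN ('Y', 'N')),
    reprocess_job_id INTEGER,
    active_flag      TEXT NOT NULL DEFAULT 'Y' CHECK (active_flag IN ('Y', 'N')),
    valid_flag       TEXT NOT NULL DEFAULT 'Y' CHECK (valid_flag IN ('Y', 'N'))
);
"""

ELIGIBLE = "reprocess_flag = 'Y' AND active_flag = 'Y' AND valid_flag = 'Y'"


def mark_invalid(conn: sqlite3.Connection) -> None:
    """Rows without a business key can never be matched or fixed by reprocessing."""
    conn.execute("UPDATE rej_sales SET valid_flag = 'N' WHERE sale_id IS NULL")


def reprocess(conn: sqlite3.Connection, job_id: int) -> int:
    """Copy flagged, active, valid rejects back to staging and retire them."""
    with conn:  # one transaction: copy and retire together, or not at all
        conn.execute(
            "INSERT INTO stg_sales (sale_id, product_code, cost) "
            f"SELECT sale_id, product_code, cost FROM rej_sales WHERE {ELIGIBLE}"
        )
        cur = conn.execute(
            f"UPDATE rej_sales SET active_flag = 'N', reprocess_job_id = ? WHERE {ELIGIBLE}",
            (job_id,),
        )
    return cur.rowcount


def deactivate_refed(conn: sqlite3.Connection) -> int:
    """A re-fed source record supersedes any active reject with the same business key."""
    cur = conn.execute(
        "UPDATE rej_sales SET active_flag = 'N' "
        "WHERE active_flag = 'Y' AND sale_id IN (SELECT sale_id FROM stg_sales)"
    )
    return cur.rowcount
```

Running the copy and the retirement in one transaction means a failure part-way cannot leave a record both back in staging and still active in the rejection area.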

What I have explained is just one way of designing the rejection area (the tables containing bad data); what I have really discussed is the concept of handling bad data, and the functionality above can be implemented in different ways. Once the ETL and database are designed appropriately, end users must be given an interface that lets them do the following:

  1. Select any rejected table and mark rejected records to be reprocessed.
  2. Select any already processed data for reprocessing. This is simple if the incoming data arrives as feed files, and a little more complex when the data is extracted from existing databases. Since the staging area is typically archived in large ETL systems, this can be achieved from the archive, depending on the needs of the customer.
  3. Look at the invalid data and analyze it so that the source systems can be fixed accordingly.
  4. Execute the ETL job after selecting the rejected records or source data for reprocessing. When to do this depends on various factors, such as the ETL time window, the need to reflect the corrected data, and the scheduled time of the ETL run.

Last but not least, as ETL architects our goal should not be just to implement some logic to handle bad data; our main responsibility is to make the whole process as automated as possible. Automation brings various benefits, such as reduced development time, fewer errors and hence higher quality, and ultimately lower cost for the customer.

If you have any questions or comments, you can reach me at jagdishm@aditi.com.

Jagdish Malani | Architecture & Design, BI - Business Intelligence