<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for ADITI Blogs</title>
	<atom:link href="http://aditiblogs.com/blog/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://aditiblogs.com/blog</link>
	<description>Trailblazers. Fast Pacers.</description>
	<lastBuildDate>Thu, 29 Dec 2011 21:05:46 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
	<item>
		<title>Comment on ETL Design Pattern : E-LT-L by Jagdish Malani</title>
		<link>http://aditiblogs.com/blog/blog/2009/03/06/etl-design-pattern-e-lt-l/#comment-16636</link>
		<dc:creator>Jagdish Malani</dc:creator>
		<pubDate>Thu, 29 Dec 2011 21:05:46 +0000</pubDate>
		<guid isPermaLink="false">http://aditiblogs.com/blog/blog/2009/03/06/etl-design-pattern-e-lt-l/#comment-16636</guid>
		<description>Hi Praveen,

I am not suggesting that we get away from the ETL tool. The point I am driving at is to bring the staging area nearer to the DW area. That would give us the option to move data using stored procedures as well, if the need arises. So this, in my opinion, is driven by the needs and the infrastructure available. Let&#039;s assume we have to load a huge amount of data. Now I have these options to load the data:

a. Use the ETL tool (which may execute a job from another server) to do the transformation and loading.
b. Along with the ETL tool, in some cases I may prefer to move data from the staging database to the DW database using, say, a stored procedure.
c. Use a mix of both.

So what needs to be done is being done. Only the way it is implemented is different. If any updates (loading/updating) are required in the DW database, they would be done by the ETL tool, some SP, or some SQL.

Also, I agree, we have really powerful ETL tools available these days. And in my opinion, ETL tool features such as partitioning and bulk loading are already available in databases. So we have to try various options and test the results. Based on that, we can decide how to implement the solution.


Thanks,
Jagdish</description>
		<content:encoded><![CDATA[<p>Hi Praveen,</p>
<p>I am not suggesting that we get away from the ETL tool. The point I am driving at is to bring the staging area nearer to the DW area. That would give us the option to move data using stored procedures as well, if the need arises. So this, in my opinion, is driven by the needs and the infrastructure available. Let&#8217;s assume we have to load a huge amount of data. Now I have these options to load the data:</p>
<p>a. Use the ETL tool (which may execute a job from another server) to do the transformation and loading.<br />
b. Along with the ETL tool, in some cases I may prefer to move data from the staging database to the DW database using, say, a stored procedure.<br />
c. Use a mix of both.</p>
<p>So what needs to be done is being done. Only the way it is implemented is different. If any updates (loading/updating) are required in the DW database, they would be done by the ETL tool, some SP, or some SQL.</p>
<p>Also, I agree, we have really powerful ETL tools available these days. And in my opinion, ETL tool features such as partitioning and bulk loading are already available in databases. So we have to try various options and test the results. Based on that, we can decide how to implement the solution.</p>
<p>Thanks,<br />
Jagdish</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Windows Messaging Architecture by Bijay</title>
		<link>http://aditiblogs.com/blog/blog/2010/08/25/windows-messaging-architecture/#comment-16238</link>
		<dc:creator>Bijay</dc:creator>
		<pubDate>Wed, 16 Nov 2011 06:15:01 +0000</pubDate>
		<guid isPermaLink="false">http://aditiblogs.com/blog/blog/2010/08/25/windows-messaging-architecture/#comment-16238</guid>
		<description>That is an excellent explanation.</description>
		<content:encoded><![CDATA[<p>That is an excellent explanation.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on ETL Design Pattern : E-LT-L by Praveen</title>
		<link>http://aditiblogs.com/blog/blog/2009/03/06/etl-design-pattern-e-lt-l/#comment-16153</link>
		<dc:creator>Praveen</dc:creator>
		<pubDate>Sat, 05 Nov 2011 18:49:53 +0000</pubDate>
		<guid isPermaLink="false">http://aditiblogs.com/blog/blog/2009/03/06/etl-design-pattern-e-lt-l/#comment-16153</guid>
		<description>Hi,

This was a good article. You have pointed out most of the common problems that we face in data warehouse loads.

However, I am a little confused about the use of stored procedures. Most of the time we have to implement some difficult logic, which is a little easier to do using an ETL tool than by writing procedures (maybe that is the primary purpose of ETL... to make a developer&#039;s life a little easier :) ). Also, if we write a stored procedure we may, most of the time, end up using cursors, which again do row-by-row processing rather than bulk processing. In a data warehouse load, where the number of records is very large, a cursor can become a bottleneck because it needs a lot of memory. Don&#039;t you think this problem is better handled by the ETL tool, which has its own dedicated memory and some other advanced concepts like partitioning and parallelism?

One of the examples we were given was that sorting a large number of records is better done through a powerful ETL tool than on the database. This is significant, as sorting plays an important role in our development.

Another problem we faced was that, as the data warehouse and data marts grow, even querying the database takes a long time. In that case, if we use the ELTL method as you described, won&#039;t it be a little difficult to update (transform) the records in the database after loading them, when selecting those records itself takes ages? This problem won&#039;t be that significant if we use the ETL approach, as we are just loading the database, not updating it (and, as you said, the bulk loading feature of ETL is very good at this).

Today we have some very fast databases which are specifically designed for high volume data (like Teradata, Kognitio). This may prove your point too.

As you pointed out, this is a good topic to debate on.

Regards,
Praveen</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>This was a good article. You have pointed out most of the common problems that we face in data warehouse loads.</p>
<p>However, I am a little confused about the use of stored procedures. Most of the time we have to implement some difficult logic, which is a little easier to do using an ETL tool than by writing procedures (maybe that is the primary purpose of ETL&#8230; to make a developer&#8217;s life a little easier <img src='http://aditiblogs.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> ). Also, if we write a stored procedure we may, most of the time, end up using cursors, which again do row-by-row processing rather than bulk processing. In a data warehouse load, where the number of records is very large, a cursor can become a bottleneck because it needs a lot of memory. Don&#8217;t you think this problem is better handled by the ETL tool, which has its own dedicated memory and some other advanced concepts like partitioning and parallelism?</p>
<p>One of the examples we were given was that sorting a large number of records is better done through a powerful ETL tool than on the database. This is significant, as sorting plays an important role in our development.</p>
<p>Another problem we faced was that, as the data warehouse and data marts grow, even querying the database takes a long time. In that case, if we use the ELTL method as you described, won&#8217;t it be a little difficult to update (transform) the records in the database after loading them, when selecting those records itself takes ages? This problem won&#8217;t be that significant if we use the ETL approach, as we are just loading the database, not updating it (and, as you said, the bulk loading feature of ETL is very good at this).</p>
<p>Today we have some very fast databases which are specifically designed for high volume data (like Teradata, Kognitio). This may prove your point too.</p>
<p>As you pointed out, this is a good topic to debate on.</p>
<p>Regards,<br />
Praveen</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Parallel Programming in C# 4.0 using Visual Studio 2010 by Piyush Gadigone</title>
		<link>http://aditiblogs.com/blog/blog/2009/06/07/parallel-programming-in-c-40-using-visual-studio-2010/#comment-15286</link>
		<dc:creator>Piyush Gadigone</dc:creator>
		<pubDate>Fri, 22 Jul 2011 20:14:23 +0000</pubDate>
		<guid isPermaLink="false">http://aditiblogs.com/blog/blog/2009/06/07/parallel-programming-in-c-40-using-visual-studio-2010/#comment-15286</guid>
		<description>Thanks a lot for this!</description>
		<content:encoded><![CDATA[<p>Thanks a lot for this!</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on ETL Design Pattern : E-LT-L by Jagdish</title>
		<link>http://aditiblogs.com/blog/blog/2009/03/06/etl-design-pattern-e-lt-l/#comment-14817</link>
		<dc:creator>Jagdish</dc:creator>
		<pubDate>Mon, 30 May 2011 15:19:24 +0000</pubDate>
		<guid isPermaLink="false">http://aditiblogs.com/blog/blog/2009/03/06/etl-design-pattern-e-lt-l/#comment-14817</guid>
		<description>Hi,

Thanks for the detailed comments (although a little bit blunt).

Let me explain what I meant to say in this post:

&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt; where you are endorsing an approach of stored procedure over an use of ETL tool/components, in my experience would eventually lead to ridiculous performance problems, extremely high maintenance cost over the years and the solution would essentially have the same limitations as are applicable to a stored procedure, which are enormous.
&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;

I am in no way endorsing the SP approach here. The points I wanted to bring up here are:
1. Bring the raw data as near as possible to the data warehouse database, so that we have the choice of using an SP if required.
2. We should be able to get away from row-by-row transformations if required. Take, for instance, slowly changing dimension type 2. If that creates performance problems, we should have the option to use Merge SQL instead. Even Ralph Kimball suggests using it if there are a lot of performance problems. You can find his article here:

http://www.kimballgroup.com/html/08dt/KU107_UsingSQL_MERGESlowlyChangingDimension.pdf

At the same time, I agree with your view that SPs have limitations. This is the reason I called this approach a pattern. It is not a guideline that we must follow.

&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;

You are talking about ETL solution design vs. SP programming. This is not the essence of this post. ETL tools allow for parallel execution, which cannot be done with SPs. From a maintenance and cost perspective, I would prefer to use a Merge SQL statement for SCD type 2 (if I can use it) over the ETL tool approach. I would not require ETL specialists working closely with DBAs and other third-party vendors. Also consider the scenario where a company starts with, say, SSIS and later decides to move the ETL implementation to Business Objects.

Moreover, the reasons (pretty blunt) that you have mentioned above are very real in the real world. So we need to be practical, and we should also not be monolithic in our thinking.


Finally, these were my thoughts, which I posted. I leave it to people to think about them when they are implementing their ETL solutions. I am in no way proposing a rule of thumb, as you seem to have thought.

Please email me if you have any further comments at jagdishm@adit.com

Thanks,
Jagdish</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>Thanks for the detailed comments (although a little bit blunt).</p>
<p>Let me explain what I meant to say in this post:</p>
<p>&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt; where you are endorsing an approach of stored procedure over an use of ETL tool/components, in my experience would eventually lead to ridiculous performance problems, extremely high maintenance cost over the years and the solution would essentially have the same limitations as are applicable to a stored procedure, which are enormous.<br />
&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;</p>
<p>I am in no way endorsing the SP approach here. The points I wanted to bring up here are:<br />
1. Bring the raw data as near as possible to the data warehouse database, so that we have the choice of using an SP if required.<br />
2. We should be able to get away from row-by-row transformations if required. Take, for instance, slowly changing dimension type 2. If that creates performance problems, we should have the option to use Merge SQL instead. Even Ralph Kimball suggests using it if there are a lot of performance problems. You can find his article here:</p>
<p><a href="http://www.kimballgroup.com/html/08dt/KU107_UsingSQL_MERGESlowlyChangingDimension.pdf" onclick="javascript:pageTracker._trackPageview('/outbound/comment/www.kimballgroup.com');" rel="nofollow">http://www.kimballgroup.com/html/08dt/KU107_UsingSQL_MERGESlowlyChangingDimension.pdf</a></p>
<p>At the same time, I agree with your view that SPs have limitations. This is the reason I called this approach a pattern. It is not a guideline that we must follow.</p>
<p>&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;</p>
<p>You are talking about ETL solution design vs. SP programming. This is not the essence of this post. ETL tools allow for parallel execution, which cannot be done with SPs. From a maintenance and cost perspective, I would prefer to use a Merge SQL statement for SCD type 2 (if I can use it) over the ETL tool approach. I would not require ETL specialists working closely with DBAs and other third-party vendors. Also consider the scenario where a company starts with, say, SSIS and later decides to move the ETL implementation to Business Objects.</p>
<p>Moreover, the reasons (pretty blunt) that you have mentioned above are very real in the real world. So we need to be practical, and we should also not be monolithic in our thinking.</p>
<p>Finally, these were my thoughts, which I posted. I leave it to people to think about them when they are implementing their ETL solutions. I am in no way proposing a rule of thumb, as you seem to have thought.</p>
<p>Please email me if you have any further comments at <a href="mailto:jagdishm@adit.com">jagdishm@adit.com</a></p>
<p>Thanks,<br />
Jagdish</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on ETL Design Pattern : E-LT-L by Jagdish malani</title>
		<link>http://aditiblogs.com/blog/blog/2009/03/06/etl-design-pattern-e-lt-l/#comment-14750</link>
		<dc:creator>Jagdish malani</dc:creator>
		<pubDate>Tue, 24 May 2011 01:07:53 +0000</pubDate>
		<guid isPermaLink="false">http://aditiblogs.com/blog/blog/2009/03/06/etl-design-pattern-e-lt-l/#comment-14750</guid>
		<description>Hi Shamit,

Sorry for the late reply. I was busy with work.

Coming to your point, the purpose of the staging area is to keep the incoming data in some designated place so that while transforming, validating, etc., we do not need a persistent connection to the source. The point to remember is that this data is still considered raw and not fully massaged. As you said, we do not store more than a day&#039;s data (or some specific period) unless required for some reason. Thus, it does not matter where your staging area is; it is not usable from an OLAP perspective. So you need transformation anyway.

For your second point, I mean the staging area should be as close as possible to the DW. In that case, we have the flexibility to implement transformations in SQL. Let&#039;s consider a scenario: we are using, say, SSIS, and we need to do SCD type 2 for, say, 1 million incoming records. Even if your job server and DB server are on the same box, it will be slower. You would resort to lookup transformations and some so-called advanced stuff in SSIS to tune it.
With SQL, you would have more control and the best performance.
Hope this helps. Thanks

Jagdish Malani</description>
		<content:encoded><![CDATA[<p>Hi Shamit,</p>
<p>Sorry for the late reply. I was busy with work.</p>
<p>Coming to your point, the purpose of the staging area is to keep the incoming data in some designated place so that while transforming, validating, etc., we do not need a persistent connection to the source. The point to remember is that this data is still considered raw and not fully massaged. As you said, we do not store more than a day&#8217;s data (or some specific period) unless required for some reason. Thus, it does not matter where your staging area is; it is not usable from an OLAP perspective. So you need transformation anyway.</p>
<p>For your second point, I mean the staging area should be as close as possible to the DW. In that case, we have the flexibility to implement transformations in SQL. Let&#8217;s consider a scenario: we are using, say, SSIS, and we need to do SCD type 2 for, say, 1 million incoming records. Even if your job server and DB server are on the same box, it will be slower. You would resort to lookup transformations and some so-called advanced stuff in SSIS to tune it.<br />
With SQL, you would have more control and the best performance.<br />
Hope this helps. Thanks</p>
<p>Jagdish Malani</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on ETL Design Pattern : E-LT-L by Shamit Bhattacharya</title>
		<link>http://aditiblogs.com/blog/blog/2009/03/06/etl-design-pattern-e-lt-l/#comment-13971</link>
		<dc:creator>Shamit Bhattacharya</dc:creator>
		<pubDate>Tue, 22 Mar 2011 09:46:17 +0000</pubDate>
		<guid isPermaLink="false">http://aditiblogs.com/blog/blog/2009/03/06/etl-design-pattern-e-lt-l/#comment-13971</guid>
		<description>@Jagdish Malani
Hi Jagdish,
Usually the staging area stores a day&#039;s data, or the data that was extracted from the source but was not loaded into the target due to some failure.
In such a case, if we keep the staging area on the target server, then we may not require transformation after load.
Secondly, would applying transformation after load really be a good option?
If your target table has millions of records and you start updating them, even in batches, it will definitely eat up your time.</description>
		<content:encoded><![CDATA[<p>@Jagdish Malani<br />
Hi Jagdish,<br />
Usually the staging area stores a day&#8217;s data, or the data that was extracted from the source but was not loaded into the target due to some failure.<br />
In such a case, if we keep the staging area on the target server, then we may not require transformation after load.<br />
Secondly, would applying transformation after load really be a good option?<br />
If your target table has millions of records and you start updating them, even in batches, it will definitely eat up your time.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on ETL Design Pattern : E-LT-L by Ashish Tiwari</title>
		<link>http://aditiblogs.com/blog/blog/2009/03/06/etl-design-pattern-e-lt-l/#comment-13969</link>
		<dc:creator>Ashish Tiwari</dc:creator>
		<pubDate>Tue, 22 Mar 2011 08:45:06 +0000</pubDate>
		<guid isPermaLink="false">http://aditiblogs.com/blog/blog/2009/03/06/etl-design-pattern-e-lt-l/#comment-13969</guid>
		<description>Hi,

This sounds like a good idea; however, what will be the impact on the design of the data marts, as they need to accommodate all the leaf-level and/or staging-type tables? Data marts, which are usually supplied with only clean data, will contain the raw data as well, since the data is loaded directly from the source. Will we still need a DW, or will DMs alone suffice? Have you come across any ETL design pattern other than this?

Once again, appreciate your idea and would like to hear more from you.

Regards,
Ashish</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>This sounds like a good idea; however, what will be the impact on the design of the data marts, as they need to accommodate all the leaf-level and/or staging-type tables? Data marts, which are usually supplied with only clean data, will contain the raw data as well, since the data is loaded directly from the source. Will we still need a DW, or will DMs alone suffice? Have you come across any ETL design pattern other than this?</p>
<p>Once again, appreciate your idea and would like to hear more from you.</p>
<p>Regards,<br />
Ashish</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Metric – What Why and Case Study by Ashok NV</title>
		<link>http://aditiblogs.com/blog/blog/2010/04/27/metric-%e2%80%93-what-why-and-case-study/#comment-13576</link>
		<dc:creator>Ashok NV</dc:creator>
		<pubDate>Fri, 04 Mar 2011 06:12:38 +0000</pubDate>
		<guid isPermaLink="false">http://aditiblogs.com/blog/blog/2010/04/27/metric-%e2%80%93-what-why-and-case-study/#comment-13576</guid>
		<description>This is a very good article and very helpful, thank you.</description>
		<content:encoded><![CDATA[<p>This is a very good article and very helpful, thank you.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Parallel Programming in C# 4.0 using Visual Studio 2010 by Leo</title>
		<link>http://aditiblogs.com/blog/blog/2009/06/07/parallel-programming-in-c-40-using-visual-studio-2010/#comment-12999</link>
		<dc:creator>Leo</dc:creator>
		<pubDate>Mon, 31 Jan 2011 18:45:24 +0000</pubDate>
		<guid isPermaLink="false">http://aditiblogs.com/blog/blog/2009/06/07/parallel-programming-in-c-40-using-visual-studio-2010/#comment-12999</guid>
		<description>Thanks for the simplified explanation...</description>
		<content:encoded><![CDATA[<p>Thanks for the simplified explanation&#8230;</p>
]]></content:encoded>
	</item>
</channel>
</rss>

