Archive for the ‘Architecture & Design’ Category

Why SaaS won’t be the software panacea

February 23rd, 2010

SaaS (Software as a Service) refers to a model in which a software provider offers an application to customers as a service, on demand. The subscription to the service covers licensing, maintenance, etc.

Since the application is developed, hosted and maintained by the provider, the customer does not bear the direct cost of hosting and maintenance. Also, since the service is available over the internet, it is easy to access from different locations. Deployment time and time to profit are also reduced for smaller companies.

SaaS is a good option for small companies with simple needs. Handing routine processes over to a provider allows these small companies or startups to focus on their more complex business and personal processes rather than on the routine.

However, there are a number of cases where this model would not work either for the provider or the customer.

Data speeds: Data transfer happens over the internet, not at the local network (Ethernet) speeds that most people are used to. This hurts when the service is used frequently or for time-critical processes.

Data security: This is a specific concern for all customers, since it is their data (and potentially the data of thousands of their own customers) that sits on the servers of a third party. In addition, the data transfer happens over the internet. Significant gains are being made in this area; however, even with the advances in technology, the risk remains high.

Stability: The provider might go bust. What should customers do if their service provider closes shop? How do they go about retrieving their data? Conversely, what should a provider do when a customer goes missing? This impacts the provider's revenue stream.

Customization: SaaS is not recommended for applications or services that require a high degree of customization. This is true of applications such as manufacturing, business intelligence or ERP, and of any application that is at the core of a company’s business practices and differentiates it in the market. If customers need to pay for the customization, they might as well have a traditional enterprise application built for themselves. If the provider pays for it, the customization might not turn out to be cost effective for the provider.

Integration: If the software needs to integrate with other software, SaaS proves to be a difficult model to work with. Since parts of the solution are not under the customer's control, the customer cannot change them or ask for changes easily. Since the product (under the SaaS model) is a complex one used by multiple customers, the provider cannot change it easily either.

Active (in-process) applications: SaaS works well when a user needs to make a few disconnected calls to the service to send or receive data. If the user needs the application to stay online (for example, a long-lived connection for continuous data entry), the remote nature of the service and data access over the internet make this model prohibitive.

So where does this leave us now?

I believe the answer lies somewhere in the future – Cloud Computing and thin clients.

Cloud computing in general provides computing power over the internet. It is usually taken to cover three things: infrastructure as a service, platform as a service and software as a service. It is the first two that will help solve the issues with the third.

Thin clients are gaining more traction, especially in the mobile world, as the future of application or software development. The idea is to have a thin client that knows how to connect to the server and display the data or results. This relies more on the use of services (custom enterprise applications or third-party services) over the internet.

Some of the earlier mentioned issues might still be around with the new development and delivery models, but with the increase in the power of the internet and the computer itself, we will soon look at a different list of pros and cons.

Ravi Yadavalli Architecture & Design

ETL: How to handle bad data

April 7th, 2009

During any ETL design, we implement various functions such as validation, auditing, notification, job recovery, job logging, data cleansing and handling bad data. In this post I am going to talk about handling bad data. At a top level, ETL design allows for bad data to be rejected and sent to the appropriate users in the form of files. But, in my opinion, there is more to this than meets the eye. As ETL architects, our responsibility does not end there. So before I get into the details of how we should handle bad data, let me explain what makes incoming data bad enough to need handling. The following are a few of the reasons that generate bad data:

  1. A set of business rules is laid out that defines whether the incoming data is good or bad. Consider a sales record where the cost of the product must be present. If the cost contains a null value or a negative number, the sales record is considered bad.
  2. Any data that would not satisfy referential integrity in the data warehouse database. This usually happens when inter-dependent data is missing. If the incoming data contains references to other data that could not be loaded for some reason, the incoming data becomes bad and must not be loaded into the data warehouse database. A typical example is a retail chain that maintains its product master in a centralized database while sales data is generated across different POS terminals. If the corporate database is down during ETL for whatever reason, ETL cannot load new products, yet sales are still being generated for those new products. With no matching product in the product dimension (master), the sales record is considered bad at this moment and is not loaded into the data warehouse database.
  3. Missing business keys: If mandatory data is missing from the incoming data, that data is considered bad. This usually does not happen if the data is sourced from other relational databases. When sourcing data from files, there is every possibility that data may go missing, even if the format of the incoming feed file has already been agreed.
  4. Missing data: There may be many cases where data is missing from the incoming feed, which logically makes the data bad. For example, an incoming product feed file contains a record with no product code and no product description. In this case the data is considered invalid.
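
The reasons listed above can each be expressed as a small validation rule. The following is only a minimal sketch in Python, not the author's implementation: the field names (`product_code`, `cost`), the rule functions and the in-memory set of known products are all hypothetical stand-ins for whatever the real feed and product dimension look like.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Record = Dict[str, object]
# A rule returns None when the record passes, or a rejection reason.
Rule = Callable[[Record], Optional[str]]


def business_key_present(rec: Record) -> Optional[str]:
    """Reason 3: the mandatory business key must be present."""
    if not rec.get("product_code"):
        return "business key product_code is missing"
    return None


def cost_present_and_non_negative(rec: Record) -> Optional[str]:
    """Reason 1: a business rule on the cost column."""
    cost = rec.get("cost")
    if cost is None:
        return "cost is missing"
    if cost < 0:
        return "cost is negative"
    return None


def product_exists(known_products: Set[str]) -> Rule:
    """Reason 2: referential integrity against the product dimension."""
    def rule(rec: Record) -> Optional[str]:
        if rec.get("product_code") not in known_products:
            return "product not found in product dimension"
        return None
    return rule


def split_good_and_bad(
    records: List[Record], rules: List[Rule]
) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Return (good records, [(bad record, reasons)]) so that reasons
    can be stored alongside the record in the rejection area."""
    good, bad = [], []
    for rec in records:
        reasons = [r for r in (rule(rec) for rule in rules) if r]
        if reasons:
            bad.append((rec, reasons))
        else:
            good.append(rec)
    return good, bad
```

Keeping every failed reason, rather than stopping at the first one, makes it easier later to decide whether a rejected record is worth reprocessing.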

Now that we have seen what generates bad data, we must understand that data cleansing does not make any data bad. Data cleansing is applied only to good data. So before we start data cleansing, a mechanism needs to be put in place to separate the good data from the bad.

Once this bad data is identified, it is usually stored in a separate area called the “Rejection Area”. This rejection area can be a separate schema in the same database that contains the staging schema, or it can be a separate database altogether. The structure of the rejection area (table structure) is similar to that of the staging area, with a few additional columns. These additional columns are required to store metadata about the rejected data.

As ETL architects, we need to design our ETL to provide the following functionality:

  1. Ability to reprocess this bad data whenever required. Data that could not be loaded due to missing references is usually re-processed once the missing data has been loaded into the data warehouse database. Consider the case where sales data was rejected because the product master was missing. Later, when the latest product master is loaded, this bad data (which wasn’t really bad) needs to be reprocessed. Otherwise the sales summary report would not be accurate. Another reason to reprocess bad data is a change in business rules. If a lot of data is rejected because of strict validation rules, the customer may decide to relax these rules so that less data is rejected.

    We can automate this functionality by adding a few columns to the rejection area table:

    1. Reprocess_flag (Y|N): When set, this flag indicates that the record needs to be re-processed. Moving these rejected records from the rejection area to the staging area should be automated. This helps customers in various ways, such as reduced dependency on IT staff and lower maintenance cost.
    2. Reprocess_Job_Id: Usually, metadata about each run of a job is maintained. To be able to audit when a rejected record was reprocessed, the job id is stamped into this column.
    3. Active_Flag: Once a record marked for reprocessing has been copied over to the staging area, it is made inactive, as it is no longer valid. The record may fail validation again and end up in the rejection area, but it would be treated as another record. So in essence, there is only one active instance of a rejected record in the rejection area. This also implies that inactive records cannot be chosen for re-processing.
  2. Ability to reprocess incoming data: Often, for various reasons, data that has already been processed is fed into the ETL again. This requires us to identify the corresponding records in the rejection area and mark them as inactive, as these records are no longer valid. The incoming records need to be validated against the current business rules. The records in the staging area are compared with the records in the rejection area on the business keys, and for matching records active_flag is set to ‘N’. This process of marking existing rejected records as inactive is usually automated.
  3. Ability to mark invalid data: Sometimes the business keys in the incoming records are null. These records eventually end up in the rejection area and are active. No matter how many times they are reprocessed, they would end up in the rejection area again. At the same time, later incoming records cannot be matched against them, because there is no business key to compare. Thus these records should be marked as invalid. For this, add the following column to the rejection area table:
    1. Valid_flag (Y|N): This flag must be set to ‘N’ for records with missing business keys. This implies that the record can never be re-processed.

    The important fact to notice here is that the responsibility of the ETL architect does not end here. Designing the ETL to mark rejected data as invalid does not, by itself, solve any business problem. The incoming data must still get loaded into the data warehouse database. So it becomes very important for the BI architect to talk to end users and explain the impact. The end users may need to make some changes in the source systems, but if they need accurate reports they must send accurate data.
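
To make the role of these columns concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is one possible layout, not a definitive design: the table names `stg_sales` and `rej_sales`, the `reject_reason` column and the function names are hypothetical, and the sketch assumes that `valid_flag = 'N'` marks a record that can never be reprocessed.

```python
import sqlite3

# Only flagged, still-active, valid records are eligible for reprocessing.
ELIGIBLE = "reprocess_flag = 'Y' AND active_flag = 'Y' AND valid_flag = 'Y'"


def create_tables(conn: sqlite3.Connection) -> None:
    """Staging table plus a rejection table with the extra metadata columns."""
    conn.executescript("""
        CREATE TABLE stg_sales (
            sale_id INTEGER, product_code TEXT, amount REAL
        );
        CREATE TABLE rej_sales (
            reject_id INTEGER PRIMARY KEY AUTOINCREMENT,
            sale_id INTEGER, product_code TEXT, amount REAL,
            reject_reason TEXT,
            reprocess_flag TEXT DEFAULT 'N',
            reprocess_job_id INTEGER,
            active_flag TEXT DEFAULT 'Y',
            valid_flag TEXT DEFAULT 'Y'
        );
    """)


def reprocess_rejected(conn: sqlite3.Connection, job_id: int) -> int:
    """Copy eligible rejected rows back to staging, then stamp the job id
    and deactivate them. Returns the number of rows moved."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            "INSERT INTO stg_sales (sale_id, product_code, amount) "
            "SELECT sale_id, product_code, amount FROM rej_sales "
            f"WHERE {ELIGIBLE}"
        )
        cur = conn.execute(
            "UPDATE rej_sales SET active_flag = 'N', reprocess_job_id = ? "
            f"WHERE {ELIGIBLE}",
            (job_id,),
        )
        return cur.rowcount
```

Because the copied rows are deactivated in the same transaction, running the job twice does not push the same rejected record into staging twice.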

What I have explained is just one way of designing the rejection area (the tables containing bad data). What I have discussed is the concept of handling bad data; there can be different ways of implementing the functionality described above. Once the ETL and database are designed appropriately, an interface must be provided that allows end users to do the following:

  1. Select any rejected table and mark rejected records to be reprocessed.
  2. Select any already processed data for reprocessing. This is simple if the incoming data is in the form of feed files. It gets a little more complex when the data is extracted from existing databases. Typically in large ETL systems the staging area is archived, and depending on the needs of the customer, this archive can serve as the source for reprocessing.
  3. Look at the invalid data and analyze it to be able to fix the source systems accordingly.
  4. Execute the ETL job after selecting the rejected records or selected source data for reprocessing. This would depend on various other factors such as ETL time window, the need to reflect the correct data, the time of ETL run, etc.

Last but not least, as ETL architects our goal should not be just to implement some logic to handle bad data. Our main responsibility is to make the whole process as automated as possible. Automation provides various benefits, such as reduced development time, fewer errors and hence increased quality, and finally reduced cost for the customer.

If you have any questions or comments, you can reach me at jagdishm@aditi.com.

Jagdish Malani Architecture & Design, BI - Business Intelligence

Future of ETL - Metadata driven

March 26th, 2009

Some time back, I was called in to design the ETL for a mid-sized enterprise. I had to deal with various issues: multiple data sources, not-so-clean data, various data validations, data cleansing needs, an ETL time window within which the ETL job must finish, changing requirements, and the list goes on. On top of that I had very tight timelines. Drawing on my previous BI projects, in which I had built ETL with different tools (BO, SSIS), I came up with a list of goals that I thought should be met in any standard ETL implementation and must be met here too. Here are some of the main goals:

  • Performance
  • Flexible enough to handle changing requirements
  • Least maintenance
  • Automation
  • Modularizing processing
  • Data quality
  • Recoverability of ETL job
  • Auditing
  • Logging
  • Notification
  • Ability to reprocess rejected records, files, or any other data source
  • Remote administration of ETL job

While going through this list, I realized that all these features were there in my earlier ETL implementations too. This led me to questions like “How am I going to improve on my earlier designs?” and “How can we take ETL to a new level?”. Before I set out to find answers, I asked myself a simpler question: “Why?” Why should I improve my ETL design? What is the need for this in the first place? I found the answer early: reduced development time together with increased quality results in reduced cost for customers. This can be a good selling point for the sales team too, which I will cover in my next post.

Coming back to the original question of how to improve the ETL design, I started by comparing the different ETL tools I have worked with in the past: how they work, how they handle metadata, and how they eventually run the job. I found that these tools are good enough to meet standard ETL requirements, but they do not make good use of metadata. Even where they use metadata in a few places, there is no intuitive interface to it. Thus I got my first lead and decided to make use of metadata as much as possible. Hence, the quest for “Metadata driven ETL” began.

Before I tell you what metadata driven ETL is, let me tell you what it is not. It does not generate the ETL program on the fly, as you might expect. Instead, metadata driven ETL uses an existing program that executes various tasks by dynamically reading parameters from a metadata datastore. This datastore can be an XML file, a separate database repository, or a few tables in the DW database.
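
As an illustration of "an existing program that reads its parameters from a metadata datastore", here is a minimal sketch in Python with the metadata held in a small XML document. The XML layout, the task types (`clear_table`, `notify`) and the handler functions are all hypothetical; the handlers only record what they would do instead of touching a real database or mail server.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical metadata: the job is described as data, not as code.
METADATA_XML = """
<etl_job name="daily_sales">
  <task type="clear_table" table="STG_SALES"/>
  <task type="notify" recipients="bi-team@example.com" subject="Load finished"/>
</etl_job>
"""

Handler = Callable[[Dict[str, str], List[str]], None]


def clear_table(params: Dict[str, str], log: List[str]) -> None:
    # A real handler would execute this statement against the database.
    log.append(f"DELETE FROM {params['table']}")


def notify(params: Dict[str, str], log: List[str]) -> None:
    # A real handler would send mail through the configured server.
    log.append(f"MAIL {params['recipients']}: {params['subject']}")


HANDLERS: Dict[str, Handler] = {"clear_table": clear_table, "notify": notify}


def run_job(xml_text: str, handlers: Dict[str, Handler]) -> List[str]:
    """Run every <task> in document order, dispatching on its type attribute."""
    log: List[str] = []
    root = ET.fromstring(xml_text.strip())
    for task in root.findall("task"):
        params = dict(task.attrib)
        handler = handlers[params.pop("type")]
        handler(params, log)
    return log
```

The program itself never changes when a task is added or a table is renamed; only the metadata does, which is the property the rest of this post builds on.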

WHAT IS METADATA: Any data that is used to execute a transform or operation is called metadata. For example, from a transformation perspective, in a slowly changing dimension Type II transformation, the target columns that are treated as business keys are metadata. Similarly, from an operations perspective, the list of database tables that must be cleared before the extraction starts is metadata. Whether we use an ETL tool's standard transforms or hand-code the ETL, we use metadata in one way or another.

USING METADATA:
How shall we use this metadata to our advantage? The best approach to start with metadata driven ETL is to:

  1. Identify different operations in ETL that can be automated. Certain operations carried out during ETL may involve multiple tables. Usually these operations are put into a script or SQL and executed as one operation. Let’s look at the operation of clearing the staging tables before the extraction starts. Usually staging tables are set up in one of the following ways:
    1. Create all staging tables in a separate database.
    2. If the staging area is in the same database as the data mart, create a specific schema such as “STAGING” or prefix the table names with something like “STG_”.
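
With either layout, the list of staging tables can be discovered from the database catalog rather than hardcoded. The following is a minimal sketch in Python, using SQLite's `sqlite_master` catalog as a stand-in for the system tables of whatever database is actually in use; the function names and the `STG_` prefix convention are illustrative assumptions.

```python
import sqlite3
from typing import List


def staging_tables(conn: sqlite3.Connection, prefix: str = "STG_") -> List[str]:
    """Read table names from the catalog and keep those with the prefix.
    Filtering is done in Python because '_' is a wildcard in SQL LIKE."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows if name.upper().startswith(prefix)]


def clear_staging(conn: sqlite3.Connection, prefix: str = "STG_") -> List[str]:
    """Empty every staging table in one transaction; return the tables cleared."""
    tables = staging_tables(conn, prefix)
    with conn:
        for table in tables:
            # Names come from the catalog, so quoting them is sufficient here.
            conn.execute(f'DELETE FROM "{table}"')
    return tables
```

A new staging table is picked up automatically the next time the job runs, with no change to the ETL program.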

Whichever way we go, this list of tables can be read from the database system tables (metadata). Using this metadata, the operation can be automated. Along similar lines, we can create specific metadata to be used for other ETL operations. This makes ETL development easier, faster and less error-prone. Here are other sample operations that can be made metadata driven:

  • Moving rejected records to rejected area
  • Reprocessing rejected records
  • Data cleansing: For all incoming data, duplicate records need to be deleted based on the unique keys identified. ETL tools typically have no built-in transform for this, but it can be made metadata driven: the ETL reads the unique keys for each incoming business entity, then dynamically creates and executes the de-duplication SQL. The developer just needs to specify the unique keys in the metadata datastore, which again brings down the development effort.
  • Notification: Upon completion of an ETL job, it can read metadata (from address, recipients addresses, mail server, subject, body) and this process can be made metadata driven. We all do this already. What is different here is how we implement it.
  • Auditing and logging
  • Identify transformations that can be made metadata driven: There are lots of transformations that can be made metadata driven. We need to identify those transforms and make them metadata driven. For example, take the slowly changing dimension (SCD) Type II. This transformation is heavily used in ETL programs. In this transform, incoming records are compared on pre-defined key columns with the existing records in the data mart. If a record already exists in the data mart it is updated (in a strict Type II design, the current row is expired and a new version is inserted so that history is preserved); otherwise it is inserted. Normally the transform has to be coded separately for every SCD II. This transformation needs the following metadata:
    • Source and target table names.
    • Business keys column names in source table.
    • Key columns in target table.

With tools like BO or SSIS, this transform is more or less hardcoded. To make it metadata driven, read the above-mentioned metadata and dynamically build a MERGE SQL statement (available in DB2 and SQL Server 2008) in a stored procedure. This stored procedure would be called from the ETL at the appropriate places, for all SCD-II transforms. This metadata driven SCD-II transform should also give you better performance. Suppose the staging area is within the data warehouse database or in a separate database on the same server. If an ETL tool's transform is used, the data is processed in batches of some pre-defined size, and the ETL engine applies the transform and fires the appropriate SQL for every incoming row in the batch, which is definitely slower. The metadata driven transform, being a single SQL operation, processes all the data in bulk, and we get increased performance.
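
The step of turning metadata into a MERGE statement can be sketched as follows. This is a hedged illustration in Python rather than a stored procedure, and it only builds the SQL text; it does not execute it. The `MergeMetadata` class and `build_merge_sql` function are hypothetical names, the sketch assumes source and target use the same column names, and it generates the simple update-or-insert form described above, not a full history-preserving Type II load.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MergeMetadata:
    """The metadata the transform needs, as it might be read from a datastore."""
    source_table: str
    target_table: str
    key_columns: List[str]      # business keys used to match rows
    update_columns: List[str]   # non-key columns refreshed on a match


def build_merge_sql(meta: MergeMetadata) -> str:
    """Build one set-based MERGE statement from the metadata."""
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in meta.key_columns)
    set_clause = ", ".join(f"{c} = s.{c}" for c in meta.update_columns)
    columns = meta.key_columns + meta.update_columns
    column_list = ", ".join(columns)
    value_list = ", ".join(f"s.{c}" for c in columns)
    return (
        f"MERGE INTO {meta.target_table} AS t\n"
        f"USING {meta.source_table} AS s\n"
        f"ON {on_clause}\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({column_list}) VALUES ({value_list});"
    )
```

For example, `build_merge_sql(MergeMetadata("stg_product", "dim_product", ["product_code"], ["product_name", "price"]))` yields a single statement that matches on `product_code`. Supporting another dimension then means adding a metadata entry, not writing another transform.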

IMPLEMENTING METADATA: When we use any ETL tool for transformations or for other purposes, we are in fact using metadata. The difference lies in how the metadata is implemented and used. How and where is this metadata stored? How do we access it at run time? Is there a single interface available to access it? When we use an ETL tool for a transformation, the metadata is stored in a proprietary format. We cannot simply go and change it directly; we need to use the ETL tool designer to access it. Thus, after the ETL program is deployed, if a transform changes, we need to make the change through the ETL tool designer and re-deploy the ETL program. If instead we design the ETL so that it uses our own metadata in our own way, we end up with an ETL architecture that can evolve into a framework. This framework can be reused across multiple ETL implementations, which brings down the estimates and thus the cost significantly, and gives us an added advantage over our competitors.

With this approach, you get the following advantages from metadata driven ETL:

  • Increased performance
  • Reduced development time
  • Fewer errors
  • Easier and lower maintenance, which translates into cost savings for customers in the long term (added value)
  • Once the basic shell of metadata driven ETL is created, the learning curve for team members is reduced.

I started with this mindset, and I have been reasonably successful. I am now in the process of refining the ETL architecture.

FINALLY: Creating metadata driven ETL in no way suggests that we should move away from ETL tools. ETL tools have their own strengths. We would simply be architecting the ETL differently to make it metadata driven. Metadata driven ETL also pays rich dividends during the maintenance phase, as it takes less effort and allows for quick deployment.

LOOKING FORWARD TO: Some of the above-mentioned metadata is already captured in the ETL tools available today, but they still need to evolve. Until ETL tools support 100% metadata driven development, we as architects should design our ETL in a way that fills the gap. I do hope commercial ETL tool vendors are working on this and that we will soon have the next generation of ETL tools.

Jagdish Malani Architecture & Design, BI - Business Intelligence

Mobile Everywhere

March 7th, 2009

Mobile Application Architecture Series – Design Considerations

Mobile applications come with various challenges that need to be considered while designing on any platform (Microsoft, Symbian, OS X, etc.). This post sets out design guidelines for mobile application development. Technology stack considerations will be next in this series.

A mobile application will normally be a multi-layered application comprising user experience, business and data layers. When developing a mobile application, you may choose to develop a Thin-Client or Rich-Client. If you are building a rich client, the business and data layers will be located on the device itself. In Thin-Client, the business and data layers will be located on the server.

Design Considerations:

Follow these design guidelines to ensure that the application meets your requirements and performs efficiently in mobile world.

1)      Thin-Client or Rich-Client or RIA: If your application requires local processing and must work in an occasionally connected scenario, consider designing a Rich (smart) client. However, a Rich-Client application will be more complex to install and maintain. If the application can depend on server processing and will always be fully connected, consider designing a Thin-Client. If your application has a rich user interface, needs only limited access to local resources, and must be portable to other platforms, then RIA would be a good choice (e.g. ETrade).

2)      Device Types to Support: While choosing which device types to support, consider screen size, screen resolution (DPI), CPU performance characteristics, available memory and storage space (inbuilt & external) and Integrated Development Tool availability. On top of the above, we need to consider the User Requirements and Organizational Constraints / Guidelines. Any additional components such as GPS and Integrated Camera may influence your choice of application type and device type.

3)      Connectivity: The real time connectivity requirement between the mobile device and Gateway would highly influence the decision of whether to go for Thin-Client or Rich-Client or RIA. If an application requires an intermittent network connection, our design approach should properly handle caching, state management and data access mechanisms.

4)      UI design considerations: Mobile devices require a simple UI in order to work within the constraints imposed by the device hardware, such as memory, battery life, different screen sizes and orientations, and network bandwidth.

5)      Layered Architecture: Depending on the application type, multiple layers may be located on the device. Use the concept of Layers to maximize the separation of concerns, and to improve reuse and maintainability for your mobile application. However, aim to achieve the smallest footprint on the device by simplifying your design compared to a desktop or web application.

6)      Security: Designing an effective authentication and authorization strategy is important for the security and reliability of your application. Mobile devices are usually designed to be single-user devices and normally lack basic user profile and security tracking beyond just a single password. Mobile applications can also be especially challenging due to connectivity interruptions.

7)      Caching: Use caching to improve the performance and responsiveness of your application, and to support operation when there is no network connection. Use caching to optimize reference data lookups, and to avoid network round trips. When deciding what data to cache, consider the limited resources of the device; you will have less memory available than a PC.

8)      Communication: Device communication includes wireless communication (over the air) and wired communication with a host PC, as well as more specialized communication such as Bluetooth or Infrared Data Association (IrDA). When communicating over the air, consider data security to protect sensitive data from theft or tampering.

9)      Performance Considerations:

a.       Design configurable options to allow the maximum use of device capabilities. Allow users to turn off features they do not require in order to save power.

b.      To optimize for device resource constraints, consider using lazy initialization.

c.       Optimize the application to use the minimum amount of memory. When memory is low, the system may release cached intermediate language (IL) code to reduce its own memory footprint, return to interpreted mode, and thus slow overall execution.

d.      Consider using programming shortcuts rather than following pure programming best practices that can inflate code size and memory consumption. This decision should be made carefully, as it may contradict the design principles of OOP and hurt maintainability.

e.      Consider power consumption when using the device CPU, wireless communication, or screen while on battery power. We should balance power consumption with performance.
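
The caching guidance above (reference data lookups, limited memory, operation without a network) can be sketched as a small cache that fetches lazily, caps its size, and falls back to stale data when the device is offline. This is only an illustrative sketch in Python, not tied to any mobile platform; the `ReferenceCache` class, its parameters and the use of `ConnectionError` to signal a lost connection are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class ReferenceCache:
    """Small in-memory cache for reference data on a constrained device."""

    def __init__(
        self,
        fetch: Callable[[Hashable], object],
        max_entries: int = 50,
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch              # network call, made only on demand
        self._max_entries = max_entries  # cap memory use
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[object, float]] = {}

    def get(self, key: Hashable) -> object:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # fresh hit: no network round trip
        try:
            value = self._fetch(key)
        except ConnectionError:
            if entry is not None:
                return entry[0]  # offline: serve stale data rather than fail
            raise
        self._entries.pop(key, None)  # re-insert so this key becomes newest
        self._entries[key] = (value, now)
        while len(self._entries) > self._max_entries:
            oldest = next(iter(self._entries))  # dicts keep insertion order
            del self._entries[oldest]
        return value
```

Nothing is fetched until a key is first requested, which is the lazy initialization idea from point b, and the size cap keeps the cache within the memory budget discussed under Caching.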

In my view, the considerations listed above are only a subset of the complete list, but I have tried to address the key areas. One basic design approach for most mobile applications is to avoid BDUF (Big Design Up Front): let the design evolve over time and avoid making a large design effort prematurely.

Sreenivasa Rao Pilla Architecture & Design, Mobile Architectures

ETL Design Pattern : E-LT-L

March 6th, 2009

I was ramping up on C# to create a BI Framework when I hit upon the term “Design Patterns”. I decided to go through only a few patterns, as I did not have the patience to complete the book and I assumed I didn't need all of it. By the end of the day, I was able to implement a couple of patterns in my BI Framework application. During this time, I wondered whether there are any pre-defined design patterns for data warehouses, ETL, cubes, and universes. So my quest began. In this post, I will focus on ETL alone.

I started looking back at my projects to analyze whether I had used any design patterns, or whether I had hit recurring problems that could have been solved with alternate designs. What I found was quite interesting. I had been using the best practices defined for each component all along. Then why were we slogging all the time? What were the problems we were facing every day? Now was the right time! I decided to dig deep into each component to find out exactly what I could have done differently during design to make my life easier. And that is how I came up with an ETL design pattern: E-LT-L.

Before I explain what E-LT-L means, let’s look at ETL first. ETL stands for Extraction, Transformation and Loading. In ETL we extract data from multiple data sources, transform the incoming data into a format compatible with the data warehouse structure, and then load it into data marts. Usually companies use commercial tools such as SSIS or BO Data Integrator Designer. The developers then make full use of these tools and end up using most of the functionality they provide. To an extent this looks right too, as the companies have made an investment for this purpose.

Finally, when companies evaluate their ROI, the results are surprising. Despite following industry standards, using commercial ETL tools and training their development teams, the results do not look good. A new set of problems has come up: performance problems, steep learning curves, fixing parts that are not broken, buying additional hardware, etc. From an ETL perspective, even though most companies have a designated job server, they do not get good performance. After a while in production, when their data marts grow or the volume of incoming data increases, the performance of the ETL job takes a hit. To resolve these issues, many things are tried: upgrading the job server, changing the database structure, using bulk loading options (a favorite choice for techies), splitting the jobs, pushing work such as summarization over to weekends, etc. This makes the system more complex than it should be, which directly increases overall IT cost.

So exactly what went wrong?  

This is where a new pattern E-LT-L comes into existence. Most of the recurring problems can be resolved using this design pattern.  

E-LT-L stands for Extraction, Loading and Transformation, and Loading. The idea is that, once the data is extracted, instead of applying transformations (T) in the staging area, we load the data into the data mart and then apply the transformations there (LT). Since the incoming (raw) data is already in the data mart, it makes more sense to use database objects (stored procedures) for transformations instead of the row-based transformations available in ETL tools. Using database-based transformations would resolve most of the problems.

Effectively, this pattern calls for BULK loading and for transformations in the right place, without moving huge amounts of data around.

This may sound strange, and many people will want to debate it. To prove the point, I will present a few scenarios and let you decide what is good for your implementation.

Scenario 1: Let’s assume you have 1 million incoming records that each need a lookup against, say, customers. No matter what tool you use and how you configure it, it has to run some SQL against the customer master, which is typically huge in many DW installations, to get the customer code. This becomes a bottleneck because the SQL is executed many times. To add to the woes, the “customer code” column value has to travel from the database server to the job server, where it is stored in a placeholder (variable) in the incoming row. If you instead code the lookup transformation in a stored procedure, you can update all the rows with one SQL statement by simply joining the staging table with the customer master table. The performance of this single set-based statement is very hard to beat.
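
The contrast in Scenario 1 can be shown on a toy scale. The sketch below uses Python's built-in `sqlite3` module instead of a stored procedure, and the table and column names (`stg_sales`, `customer_master`, `customer_name`, `customer_code`) are hypothetical. Both functions produce the same result; the difference is how many statements are issued and how much data crosses between the program and the database.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Tiny stand-in for a staging table and a customer master."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer_master (
            customer_code TEXT PRIMARY KEY, customer_name TEXT UNIQUE
        );
        CREATE TABLE stg_sales (
            sale_id INTEGER PRIMARY KEY, customer_name TEXT, customer_code TEXT
        );
        INSERT INTO customer_master VALUES ('C001', 'Acme'), ('C002', 'Globex');
        INSERT INTO stg_sales (sale_id, customer_name)
        VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Unknown');
    """)
    return conn


def lookup_row_by_row(conn: sqlite3.Connection) -> int:
    """What a row-based transform does: one lookup and one update per row.
    Returns the number of SQL statements issued."""
    rows = conn.execute("SELECT sale_id, customer_name FROM stg_sales").fetchall()
    for sale_id, name in rows:
        match = conn.execute(
            "SELECT customer_code FROM customer_master WHERE customer_name = ?",
            (name,),
        ).fetchone()
        conn.execute(
            "UPDATE stg_sales SET customer_code = ? WHERE sale_id = ?",
            (match[0] if match else None, sale_id),
        )
    return 1 + 2 * len(rows)


def lookup_set_based(conn: sqlite3.Connection) -> int:
    """The E-LT-L alternative: one statement resolves every row in the database."""
    conn.execute("""
        UPDATE stg_sales
        SET customer_code = (
            SELECT m.customer_code FROM customer_master m
            WHERE m.customer_name = stg_sales.customer_name
        )
    """)
    return 1
```

With three staged rows the row-by-row version issues seven statements against one for the set-based version, and the gap grows linearly with the number of incoming records.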

Scenario 2: We have an incoming product master for a large retail chain. We need to implement a slowly changing dimension type 2 here (insert new records; for records that already exist and have changed, expire the current version and insert the new one). An ETL tool would typically implement this transformation row by row, and it can get painfully slow. The transform can easily be developed in a stored procedure using the MERGE SQL statement. More experienced developers can make it metadata driven.
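A set-based type 2 load can be sketched in two statements. This is an illustration under assumptions, not the MERGE-based procedure itself: SQLite has no MERGE, so the sketch uses an UPDATE followed by an INSERT, and the tables (dim_product, staging_product) and columns (valid_from, valid_to, is_current) are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_id TEXT, name TEXT, price REAL,
        valid_from TEXT, valid_to TEXT, is_current INTEGER);
    CREATE TABLE staging_product (product_id TEXT, name TEXT, price REAL);
    INSERT INTO dim_product VALUES
        ('P1', 'Kettle', 20.0, '2010-01-01', NULL, 1),
        ('P2', 'Toaster', 30.0, '2010-01-01', NULL, 1);
    INSERT INTO staging_product VALUES
        ('P1', 'Kettle', 25.0),
        ('P2', 'Toaster', 30.0),
        ('P3', 'Blender', 40.0);
""")
# Staged data: P1 has a changed price, P2 is unchanged, P3 is new.
load_date = "2010-02-01"

# Step 1: expire current rows whose attributes differ from the staged version.
conn.execute("""
    UPDATE dim_product
    SET valid_to = ?, is_current = 0
    WHERE is_current = 1
      AND EXISTS (SELECT 1 FROM staging_product s
                  WHERE s.product_id = dim_product.product_id
                    AND (s.name <> dim_product.name
                         OR s.price <> dim_product.price))
""", (load_date,))

# Step 2: insert a current row for every staged product that has none.
# This covers brand-new products and the ones expired in step 1.
conn.execute("""
    INSERT INTO dim_product
        (product_id, name, price, valid_from, valid_to, is_current)
    SELECT s.product_id, s.name, s.price, ?, NULL, 1
    FROM staging_product s
    WHERE NOT EXISTS (SELECT 1 FROM dim_product d
                      WHERE d.product_id = s.product_id
                        AND d.is_current = 1)
""", (load_date,))
conn.commit()

rows = conn.execute("""
    SELECT product_id, price, is_current
    FROM dim_product
    ORDER BY product_id, valid_from
""").fetchall()
print(rows)
```

Both statements work on the whole staged set at once, so the cost does not grow with one round trip per product.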

To summarize, the transformations done using stored procedures help as follows:

  • Data does not move between database servers and job servers, which gives a good boost in performance.
  • The data is processed in bulk.
  • The database engine (Oracle, SQL Server) is designed to do this job, and usually does it more efficiently than a third-party ETL tool can.
  • It is usually easier to fix or debug a stored procedure than a transform in an ETL tool.
  • Developers face a smaller learning curve, since less of the logic lives in the ETL tool.
  • Deployment of ETL packages, which takes a lot of effort in ETL tools, becomes simpler.

I am sure many of you are wondering why we should use a commercial ETL tool at all. These tools have their own significance. We still need a commercial ETL tool to achieve the other goals of ETL, such as:

  • Defining a workflow: which tasks execute, in what order, and what to do if a task fails.
  • Executing tasks in parallel.
  • Calling stored procedures: even when a stored procedure transforms the data, it needs to be called from an appropriate place in the ETL job.
  • Using the logging and notification capabilities of the ETL tool, which are usually efficient and simple.
  • Using the scheduling and other features readily available in commercial ETL tools.
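The division of labor in the list above can be pictured with a toy orchestrator: the tool side only sequences steps, logs them and reacts to failure, while the heavy transformation would be a stored procedure call inside one step. This is a rough sketch, not how any particular ETL product works; run_workflow and the step functions are hypothetical names, and the failing step is simulated.

```python
import logging
from typing import Callable, List, Tuple

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl")


def run_workflow(steps: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run steps in order; stop at the first failure and return what completed."""
    completed = []
    for name, step in steps:
        try:
            log.info("starting %s", name)
            step()
        except Exception:
            # A real tool would also send a notification here.
            log.exception("step %s failed; skipping remaining steps", name)
            break
        completed.append(name)
    return completed


def extract() -> None:
    pass  # pull data from the source systems


def bulk_load() -> None:
    pass  # bulk load the raw data into the data mart


def transform_in_db() -> None:
    # In a real job this would call the stored procedure; here it fails on purpose.
    raise RuntimeError("simulated stored procedure failure")


def load_marts() -> None:
    pass  # final load into the reporting tables


done = run_workflow([
    ("extract", extract),
    ("bulk_load", bulk_load),
    ("transform", transform_in_db),
    ("load", load_marts),
])
print(done)
```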

What the E-LT-L pattern suggests is a design for the ETL architecture. With a good ETL tool and a good ETL design, we can write efficient and manageable jobs.

If you have any questions or comments, you can reach me at jagdishm@aditi.com.


Jagdish Malani Architecture & Design, BI - Business Intelligence, Performance

Architecture - From Mind to Paper !

December 15th, 2008

Every time I sit down to document an architecture, I am confused and take some time to start off. This time I tried to figure out why this keeps happening even though I have done it N times before. The answer was that I was pondering the target audience, the representation to be used, etc. Finally I think I have the answer - “Don’t worry about it”.

When you start documenting the architecture, take both technical and non-technical audiences into consideration. The first 1-2 sections of the document should address the non-technical group. The rest can use hard-core technical terms and concepts. Now you may ask why the non-technical group should be considered. This group consists of senior managers, program managers, etc., who would like to validate that all their requirements have been met. Once they give the go-ahead, the tech team takes over to inspect the thought process under a microscope.

So here is the structure I found and liked:
1. Target audience for the document
2. Terminologies used
3. References - (Mainly your requirements/functional specification). If you are integrating with any external systems, refer to their architecture/design documents.
4. Non-functional requirements - Read through the requirements document, extract the non-functional requirements such as security, performance, etc., and list them here.
5. SAD structure (SAD - System Architecture Document)
6. Conceptual Architecture
7. Logical Architecture
8. Execution Architecture
9. Architecture Validation


Ashwin Mallur Parthasarathy Architecture & Design

MVC Architecture in ASP.NET - A Design Approach

February 27th, 2008

MVC architecture consists of the Model, the View and the Controller. The Model is the common application logic that inserts, updates, deletes and modifies data in a suitable data source. The View is the GUI. The Controller is the heart of the pattern and is what we will discuss here.

When developing complex dynamic ASP.NET applications, it’s important to minimize code duplication to increase the reusability, scalability and flexibility of the application. In some applications, users may perform many different actions that involve different controller logic but result in the same view (UI).

To develop reusable logic, we have to minimize the amount of server-side code in the code-behind pages, because each one pertains to a single object, i.e. it is the controller for a single page. Logic in code-behind and scripted pages is difficult or impossible to reuse and results in poor separation between the view and the controller. Active Server scriptlets are also more difficult to test and debug; additionally, they bypass the ASP.NET feature of pre-compiling code, as they are compiled and translated at run time. Instead of adding script code to an .aspx page, it’s more efficient to implement the controller using classes, which allows you to implement common appearance and navigation across your Web application and reuse presentation logic throughout the application.

There are two different patterns that address the implementation of controller classes for ASP.NET applications. The Page Controller assists you in building an application in which the navigation patterns are static but the pages are generated dynamically. For more complex applications where the navigation is dynamic or configurable based on a set of rules (e.g., user privileges or application state), the Front Controller allows for a more efficient implementation.

They are as follows:

Page controller:
Similar to the default ASP.NET controller. This would have a base page controller that acts as a master controller. It has all the events of an aspx.cs code-behind page, but also contains common logic such as security verification and session management.

The Front Controller pattern:
The Page Controller pattern becomes inefficient when you need to coordinate processing across multiple Web pages because of its implementation of a single object per logical page. The Front Controller is more efficient in such cases because it funnels all requests through a single controller and then directs requests through a single handler and a hierarchy of command classes. The handler retrieves parameters from the HTTP request, chooses the correct command, and transfers processing to it. After each command object performs the specified action, it can choose which view is required to render the page properly. Implementing the Front Controller results in more centralized application control because all page requests come through a single controller instead of being handled by different Page Controllers. But this can also be a liability if the handler does expensive processing, such as database lookups that could cause the entire application to operate slowly. The handler should be as efficient as possible and use external resources only when absolutely necessary. You should also consider caching any external resources to increase the handler’s performance.

You implement the Front Controller by creating a Handler and an ActionFactory, which uses the request parameters (action class name, query strings, etc.) to determine the action to execute in response to a request. ASP.NET provides the IHttpHandler interface for creating custom handlers to service incoming HTTP requests. The Handler class implements System.Web.IHttpHandler and adds the logic to obtain the appropriate action from the ActionFactory and call it. The ActionFactory defines a collection of actions and the logic that determines which of them should be executed. Calling the ActionFactory returns the appropriate action object, on which the Handler can call an Execute method (the default method, i.e. the one called if the request does not specify a method). Using this pattern, you can create more robust navigation scenarios and implement them centrally by extending the ActionFactory logic and creating additional actions to handle the required scenarios.
The type of each action would be mapped in an XML file. Every aspx page would have an action, but multiple pages can share the same action (reusability).
This mapping involves reflection, however, which has a performance cost (reflection is comparatively expensive).

Handler/ HandlerFactory:
This module is responsible for handling all requests that come through the pipeline and processing them accordingly. It consists of a Handler class that implements the IHttpHandler interface. ProcessRequest is the method that interprets a request and generates a suitable response. You rewrite or modify the request through the HttpContext in this class.
This module also takes care of session validation and other user validation.

Action Factory:
This module creates or reuses the action instances that requests ask for. Action instances are cached; if an action is not in the cache, it is created.
This module maps the request to an action instance. The mapping is defined in an XML config file that houses all action mappings.
E.g., for a request

“\\SomeReleventUrl.aditi?action=releventaction&method=releventmethodname”

".aditi" - would invoke the respective handler class.

"releventaction" - would be looked up in the XML config file to find the action instance.

"releventmethodname" - would specify the method to be invoked in the action class. If not specified, the default method is used.

Mapping in XML file:

<actionfactory>
  <action name="releventaction" type="FullyQualifiedAcitonClassName"
          actionifSucess="SomeReleventSuccessUrl" actionifError="SomeReleventErrorUrl">
  </action>
  <action name="releventaction_one" type="FullyQualifiedAciton_OneClassName"
          actionifSucess="SomeReleventSuccessUrl" actionifError="SomeReleventErrorUrl">
  </action>
  <action name="releventaction_twotwo" type="FullyQualifiedAciton_TwoClassName"
          actionifSucess="SomeReleventSuccessUrl" actionifError="SomeReleventErrorUrl">
  </action>
  <action name="releventaction_three" type="FullyQualifiedAciton_ThreeClassName"
          actionifSucess="SomeReleventSuccessUrl" actionifError="SomeReleventErrorUrl">
  </action>
</actionfactory>

The type attribute gives the name of the action class to be instantiated, which brings reflection into play (and hence a performance cost). Given the overall flexibility and scalability, this can be tolerated.
The actionifSucess attribute specifies the URL to navigate to when the action returns success.
The actionifError attribute specifies the URL to navigate to when the action returns an error or fails validation.

Using reflection, we invoke the releventmethodname method on the action instance.
This is the basic process flow for every incoming request.
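The request flow described above (look up the action in the XML mapping, reuse a cached instance, invoke the requested or default method, pick the success or error URL) can be sketched compactly. The sketch is in Python rather than ASP.NET purely to keep it short and runnable. SaveOrderAction, the ACTION_TYPES registry, the URLs and the dispatch method are all hypothetical names; getattr plays the part that reflection plays in the .NET design.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs

# Hypothetical mapping file, using the attribute names from the article.
CONFIG = """
<actionfactory>
  <action name="saveorder" type="SaveOrderAction"
          actionifSucess="/order/saved" actionifError="/order/error" />
</actionfactory>
""".strip()


class SaveOrderAction:
    def execute(self, params):
        # Default method, used when the request names no method.
        return True

    def validate(self, params):
        return "id" in params


# Stand-in for reflection-based type loading: type name -> class.
ACTION_TYPES = {"SaveOrderAction": SaveOrderAction}


class ActionFactory:
    def __init__(self, config_xml):
        root = ET.fromstring(config_xml)
        self._mappings = {a.get("name"): dict(a.attrib)
                          for a in root.findall("action")}
        self._cache = {}

    def get_action(self, name):
        # Create the action on first use, then reuse the cached instance.
        if name not in self._cache:
            type_name = self._mappings[name]["type"]
            self._cache[name] = ACTION_TYPES[type_name]()
        return self._cache[name]

    def dispatch(self, url):
        query = parse_qs(urlparse(url).query)
        params = {key: values[0] for key, values in query.items()}
        name = params["action"]
        method_name = params.get("method", "execute")
        if method_name.startswith("_"):
            raise ValueError("private methods cannot be invoked from a request")
        succeeded = getattr(self.get_action(name), method_name)(params)
        mapping = self._mappings[name]
        return mapping["actionifSucess"] if succeeded else mapping["actionifError"]


factory = ActionFactory(CONFIG)
print(factory.dispatch("/orders.aditi?action=saveorder"))
print(factory.dispatch("/orders.aditi?action=saveorder&method=validate"))
```

Because the method name comes straight from the URL, the sketch refuses names starting with an underscore; a production handler would more likely check against an explicit list of allowed methods.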

Action: The action interacts with the ‘Model’ layer and does all the data processing and logic implementation. This module also interprets user inputs and translates them into more generic data sources. All user-entered values are retrieved through the request object. The action takes the context object as its parameter, as it houses both the request and response objects.

Entity Objects: These can be either xml based objects or generic classes. These entity objects would store data and would be transferred across different layers.

State Management: Since performance is an issue, use of session state is discouraged unless strictly necessary.
Caching should be implemented wherever possible.
ViewState should be disabled by default for all controls.
HttpContext would be used extensively to store values that span a single post cycle.

Code snippets:
Handler class
using System.Web;

public class Handler : System.Web.IHttpHandler
{
    /// <summary>
    /// Processes every request and sends the appropriate response.
    /// </summary>
    /// <param name="context">The current HTTP context.</param>
    public void ProcessRequest(HttpContext context)
    {
        // Create an instance of the ActionFactory class
        // Invoke CreateInstance of the ActionFactory class
        // The ActionFactory would return a URL as the response
        // Navigate to the new URL
    }

    /// <summary>
    /// Whether the handler instance can be reused across requests.
    /// </summary>
    public bool IsReusable
    {
        get { return true; }
    }

    public Handler()
    {
        // TODO: Add constructor logic here
    }
}

Action Factory
using System;
using System.Web;

[Serializable]
public class ActionFactory
{
    /// <summary>
    /// Translates the request parameters to instantiate the action
    /// and invoke the suitable method.
    /// </summary>
    /// <param name="context">The current HTTP context.</param>
    public void CreateActionInstance(HttpContext context)
    {
        // Translate the request params
        // Use reflection to create an instance of the action class
        // Use reflection to invoke the suitable method
        // The action class returns a success or failure value
        // Translate the returned value into the new URL
        // Hand the new URL back to the Handler class
    }
}

Action

using System;
using System.Web;

[Serializable]
public class Action : IBaseActionClass
{
    /// <summary>
    /// Translates the user input in the request and performs the action.
    /// </summary>
    /// <param name="context">The current HTTP context.</param>
    public void ExecuteAction(HttpContext context)
    {
        // Translate user inputs from the request form contained in the HttpContext object
        // Make function calls
        // Use reflection to invoke the suitable method
        // The action returns success or failure as the response
    }
}

IBaseActionClass would be an interface that all action classes implement. It is necessary for casting the actions at run time.

Further logic and DB calls would be done in the “Model” layer.

Microsoft’s own ASP.NET MVC framework is delivered as an add-on to .NET Framework 3.5, not as part of framework 3.0.


Tanmoy Biswas ASP.NET, Architecture & Design