Tuesday, 31 March 2015

Different types of Variables

What is the difference between nominal, ordinal and scale?

In SPSS, we can specify the level of measurement as scale (numeric data on an interval or ratio scale), ordinal, or nominal. Nominal and ordinal data can be either string (alphanumeric) or numeric. But what is the difference?

Nominal or categorical, ordinal and interval

Variables are described as categorical (sometimes called nominal), ordinal, or interval. Let’s look at the definitions and see why they matter.

Categorical or Nominal

A categorical variable (sometimes called a nominal variable) is one that has two or more categories, but there is no intrinsic ordering to the categories. Examples are given below:

a. Gender is a categorical variable having two categories (male and female) and there is no intrinsic ordering to the categories.

b. Hair color is also a categorical variable having a number of categories (blonde, brown, brunette, red, etc.) and again, there is no agreed way to order these from highest to lowest.

c. Other examples of nominal variables include region, zip code, and religious affiliation.

A purely categorical variable is one that simply allows you to assign categories, but you cannot clearly order the categories.

Ordinal

An ordinal variable is similar to a categorical variable. The difference between the two is that the categories of an ordinal variable have a clear ordering. Examples are given below:

a. Economic status, with three categories (low, medium and high). We can classify people into these three categories and can also order the categories as low, medium and high.

b. Educational experience (with values such as elementary school graduate, high school graduate, some college and college graduate). These also can be ordered as elementary school, high school, some college, and college graduate.

c. Examples of ordinal variables also include attitude scores representing degree of satisfaction or confidence and preference rating scores.

Interval or scale

An interval variable is similar to an ordinal variable, except that the intervals between the values of the interval variable are equally spaced. Examples are given below:

a. Annual income is a variable that is measured in dollars, and we have three people who make $10,000, $15,000 and $20,000. The second person makes $5,000 more than the first person and $5,000 less than the third person, and the size of these intervals is the same. If there were two other people who make $90,000 and $95,000, the size of that interval between these two people is also the same ($5,000).

b. Examples of scale variables also include age in years.

Why does it matter whether a variable is categorical, ordinal or interval?

Statistical computations and analyses assume that the variables have a specific level of measurement. For example, it would not make sense to compute an average hair color. An average of a categorical variable does not make much sense because there is no intrinsic ordering of the levels of the categories. Moreover, if you tried to compute the average of educational experience as defined in the ordinal section above, you would also obtain a nonsensical result. Because the spacing between the four levels of educational experience is very uneven, the meaning of this average would be very questionable. In short, an average requires a variable to be interval.

Sometimes you have variables that are "in between" ordinal and interval, for example, a five-point Likert scale with values "strongly agree", "agree", "neutral", "disagree" and "strongly disagree". If we cannot be sure that the intervals between each of these five values are the same, then we cannot say that this is an interval variable, and strictly speaking it is an ordinal variable. However, in order to use statistics that assume the variable is interval, analysts often assume that the intervals are equally spaced.
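As a rough illustration of the point above, the following Python sketch picks a summary statistic that suits each level of measurement. The sample values and the function names are made up for this example; only the incomes and the education levels echo the ones mentioned earlier in the post.

```python
from statistics import mean, median_low, mode

# Hypothetical sample data, invented purely for illustration.
hair_color = ["brown", "blonde", "brown", "red", "brown"]        # nominal
education = ["high school", "some college", "elementary school",
             "college graduate", "high school"]                   # ordinal
income = [10_000, 15_000, 20_000, 90_000, 95_000]                 # interval/scale

# The order of the education levels, from lowest to highest.
EDUCATION_LEVELS = ["elementary school", "high school",
                    "some college", "college graduate"]


def nominal_summary(values):
    """Nominal data: only counting is meaningful, so report the mode."""
    return mode(values)


def ordinal_summary(values, ordered_levels):
    """Ordinal data: use a median, which relies on order but not on spacing."""
    ranks = [ordered_levels.index(v) for v in values]
    return ordered_levels[median_low(ranks)]


def interval_summary(values):
    """Interval data: equal spacing makes the arithmetic mean meaningful."""
    return mean(values)


print(nominal_summary(hair_color))
print(ordinal_summary(education, EDUCATION_LEVELS))
print(interval_summary(income))
```

The median is used for the ordinal variable because it depends only on the ordering of the categories, which is the one property an ordinal variable guarantees.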

Trends in Business Analytics

While business analytics was once nice to have, it is now a necessity, and the biggest change is its use in day-to-day decision-making processes.

Analytics Usage Examples

Fraud detection
Web display advertising (can now be changed immediately based on purchases)
Call center optimization
Social media and social networking analysis
Sentiment analysis (sentiment is examined rather than just raw data)

Analytics Applications

Intelligent traffic management
Smart power grids
Sustainability
Bioinformatics.

Analytics Challenges

Data deluge – big data
Smarter analytics – advanced analytics (reactive and predictive)
Faster decisions – real time analytics
Faster Time to Value – Pressure on IT to reduce costs

Colin White, whose talk is linked at the end of this post, gives a historical perspective on the data deluge. More and more data is available, and systems have become universal. We've gone from processing megabytes and gigabytes of information to processing petabytes.

Smarter Analytics

Advanced statistics
Predictive modeling and analytics
Web event analytics
Text and social media analytics
Social network analysis

In summary, analytics usage is changing, and companies need to stay abreast of the changes in order to make smart, immediate decisions at a lower cost.

Please go through the link below, where Colin White, founder of BI Research, talks about the difference between business analytics of the past and business analytics of the future. He gives examples of how the 'new' analytics can be used, its applications, and the challenges now emerging in analytics.

https://www.youtube.com/watch?v=nfMnILQVZXo

10 Popular Analytic Tools in Business

Business analytics is a fast-growing field and there are many tools available in the market to serve the needs of organizations. Analytical software ranges from relatively simple statistical tools in spreadsheets (e.g., MS Excel) to statistical software packages (e.g., KXEN, Statistica) to sophisticated business intelligence suites (e.g., SAS, Oracle, SAP and IBM among the big players). Open source tools like R and Weka are also gaining popularity. Besides these, companies develop in-house tools designed for specific purposes.

Commercial software

MS Excel: Almost every business user has access to the MS Office suite and Excel. Excel is an excellent reporting and dashboarding tool. For most business projects, even if you run the heavy statistical analysis in different software, you will still end up using Excel for the reporting and presentation of results. While most people are aware of its excellent reporting and graphing abilities, Excel can also be a powerful analytic tool in the hands of an experienced user. Recent versions of Excel can handle tables with up to about 1 million rows, making it a powerful yet versatile tool.

SAS: SAS is the 5000 pound gorilla of the analytics world and claims to be the largest independent vendor in the business intelligence market. It is the most commonly used software in the Indian analytics market despite its monopolistic pricing. SAS software has wide ranging capabilities from data management to advanced analytics.

SPSS Modeler (Clementine): SPSS Modeler is a data mining software tool by SPSS Inc., an IBM company. It was originally named SPSS Clementine. This tool has an intuitive GUI and its point-and-click modelling capabilities are very comprehensive.

Statistica: Statistica is a statistics and analytics software package developed by StatSoft. It provides data analysis, data management, data mining, and data visualization procedures. Statistica supports a wide variety of analytic techniques and is capable of meeting most needs of business users. The GUI is not the most user-friendly and it may take a little more time to learn than some tools, but it is a competitively priced product that offers value for money.

Salford Systems: Salford Systems provides a host of predictive analytics and data mining tools for businesses. The company specialises in classification and regression tree algorithms. Its MARS algorithm was originally developed by the world-renowned Stanford statistician and physicist Jerome Friedman. The software is easy to use and learn.

KXEN: KXEN is one of the few companies driving automated analytics. Its products, largely based on algorithms developed by the Russian mathematician Vladimir Vapnik, are easy to use, fast and can work with large amounts of data. Some users may not like the fact that KXEN works like a ‘black box’, so in most cases it is difficult to understand and explain the results.

Angoss: Like Salford Systems, Angoss has developed its products around classification and regression decision tree algorithms. The advantage of this is that the tools are easy to learn and use, and the results are easy to understand and explain. The GUI is very user-friendly, and a lot of features have been added over the years to make this a powerful tool.

MATLAB: MATLAB is a numerical computing environment developed by MathWorks. It allows matrix manipulations, plotting of functions and data, implementation of algorithms and creation of user interfaces. There are many add-on toolboxes that extend MATLAB to specific areas of functionality, such as statistics, finance, image processing, bioinformatics, etc. MATLAB is not free software.

Open Source Software

R: R is a programming language and software environment for statistical computing and graphics. R is open source and is widely used in academia. For business users, the programming language does represent a hurdle. However, there are many GUIs available that sit on top of R and make it more user-friendly.

Weka: Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software, developed at the University of Waikato, New Zealand. Weka, along with R, is amongst the most popular open source software used by the business community. The software is written in the Java language and contains a GUI for interacting with data files and producing visual results and graphs.

Monday, 30 March 2015

Why Business Analytics

Asking why is a good way to remind yourself what matters and where your focus should be. Let’s see why we need analytics to keep our businesses on track.

Why do you need analytics?

1. To measure and track your results across time

2. To understand your visitors, leads and prospects

3. To understand, track and improve the mechanisms used to convert first-time visitors into valuable customers

Business analytics can be defined as the broad use of data and quantitative analysis for decision-making within organizations. It encompasses query and reporting, but also aspires to greater levels of mathematical sophistication. It includes analytics, of course, but let’s assume that the term “business” involves harnessing the analytics to meet defined business objectives. Business analytics enables people in an organization to make better decisions, improve processes and achieve desired outcomes. It brings together the best of data management, analytic methods, and the presentation of results.

Retail Analytics

Retail is one of the fastest growing industries. New trends are emerging and competition is increasing, especially in the online market. Consumers are much more tech savvy and are highly involved in the online world. This new online trend is changing the retailer-customer relationship, placing customers in control.

Consumers are constantly sharing ideas and opinions with friends and the general public through numerous online platforms such as Twitter, blogs, and review sites, where they are exposing the positives and negatives of retail based on personal experiences. From handling simple predictable demands, to varied and unique tastes, retailers now need to sift through and analyze terabytes of data to be able to understand customers’ requirements more precisely. There are many issues and questions retailers are grappling with:

· How do we increase margins at a product-level?

· What is the best way to target individual customers and provide them with tailor-made offerings?

· What marketing strategies should we employ and what ROI can we expect on the spend?

· What should be the best price for a product through its lifecycle?

· What promotions and offers do we employ in each store?

· How do we decide on the product-mix for each store?

· Is there a way to track the online shopping patterns of customers and provide them with personalized offers?

· How do we ensure effective stock deliveries during busy trading seasons?

With a slew of issues to tackle, retailers need to take a fresh look at the way business is typically run to ensure survival. This involves not only re-examining interactions with customers but also focusing equally on back-end operations such as store planning, merchandizing, supply-chain infrastructure, and logistics.

Business analytics can help retailers break down and assimilate the information they collect and provide better insights into customer tastes and preferences as well as back-end processes.

Angoss and Happiest Minds combine creative ideas with data mining and predictive analytics to help retailers build a view of their customers by analysing the wealth of data that resides within the organization. Retailers can use this data to segment the target market strategically, predict future behaviour and meet their retail goals.

Retail Analytics can help improve numerous retail business functions such as:

Sunday, 29 March 2015

Cluster analysis for business

Clustering is the process of grouping observations of similar kinds into smaller groups within the larger population. It has widespread application in business analytics. One of the questions facing businesses is how to organize the huge amounts of available data into meaningful structures, or how to break a large heterogeneous population into smaller homogeneous groups. Cluster analysis is an exploratory data analysis tool which aims at sorting different objects into groups in such a way that the degree of association between two objects is maximal if they belong to the same group and minimal otherwise.

Business application of clustering

A grocery retailer used clustering to segment its 1.3 million loyalty card customers into 5 different groups based on their buying behavior. It then adopted customized marketing strategies for each of these segments in order to target them more effectively.

One of the groups was called ‘Fresh food lovers’. This consisted of customers who purchase a high proportion of organic food, fresh vegetables, salads etc. A marketing campaign that emphasized the freshness of the fruits and vegetables and the year-round availability of organic produce in the stores appealed to this customer group.

Another cluster was called ‘Convenience junkies’. This consisted of people who shopped for cooked or semi-cooked, easy-to-prepare meals. A marketing campaign focusing on the retailer’s in-house line of frozen meals as well as the speed of the check-out counters at the store worked well with this audience.

In this way the retailer was able to deliver the right message to the right customer and maximize the effectiveness of its marketing.

Features of clustering

Clustering is an undirected data mining technique. This means it can be used to identify hidden patterns and structures in the data without formulating a specific hypothesis. There is no target variable in clustering. In the above case, the grocery retailer was not actively trying to identify fresh food lovers at the start of the analysis. It was just attempting to understand the different buying behaviors of its customer base.

Clustering is performed to identify similarities with respect to specific behaviors or dimensions. In our example, the objective was to identify customer segments with similar buying behavior. Hence, clustering was performed using variables that represent the customer buying patterns.

Cluster analysis can be used to discover structures in data without providing an explanation or interpretation. In other words, cluster analysis simply discovers patterns in data without explaining why they exist. The resulting clusters are meaningless by themselves. They need to be profiled extensively to build their identity i.e. to understand what they represent and how they are different from the parent population.

In the retailer’s case, each cluster was profiled on its buying behavior. Customers in cluster 1 spent a quarter of their total spend on fresh, organic produce. This was significantly higher than other customers who spent less than 5% on this category. This segment of customers was called ‘Fresh food lovers’ as this is what distinguished them from the rest of the customers.
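The profiling step described above can be sketched in a few lines of Python. The records, cluster labels and the function name below are hypothetical and are not the retailer's data. The sketch compares each cluster's average share of spend on fresh produce with the average across all customers.

```python
from collections import defaultdict

# Hypothetical records: (cluster label, spend on fresh produce, total spend).
customers = [
    (1, 25.0, 100.0),
    (1, 30.0, 120.0),
    (2, 4.0, 100.0),
    (2, 2.0, 80.0),
    (3, 3.0, 150.0),
]


def profile_fresh_share(records):
    """Return the average fresh-produce share per cluster and overall."""
    shares = defaultdict(list)
    for cluster, fresh, total in records:
        shares[cluster].append(fresh / total)
    per_cluster = {c: sum(v) / len(v) for c, v in shares.items()}
    all_shares = [s for v in shares.values() for s in v]
    overall = sum(all_shares) / len(all_shares)
    return per_cluster, overall


per_cluster, overall = profile_fresh_share(customers)
for cluster, share in sorted(per_cluster.items()):
    # A ratio well above 1 marks a trait that sets the cluster apart.
    print(cluster, round(share, 3), round(share / overall, 2))
```

A cluster whose share is far above the overall average, like cluster 1 in this made-up data, is the kind of group that would earn a label such as ‘Fresh food lovers’.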

Types of clustering

There are different algorithms available for clustering, and each of them may give a different set of clusters. The choice of a particular method will depend on the objective of clustering, the type of output desired, the hardware and software facilities available and the size of the dataset. In general, clustering techniques may be divided into two categories based on the cluster structure which they produce.

The non-hierarchical methods divide a dataset of N objects into M clusters. K-means, a non-hierarchical technique, is the most commonly used one in business analytics.

The hierarchical methods produce a set of nested clusters in which each pair of objects or clusters is progressively nested in a larger cluster until only one cluster remains.
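To make the non-hierarchical approach concrete, here is a minimal K-means sketch in plain Python. It is an illustration under simplifying assumptions (random starting centers, squared Euclidean distance, a fixed number of clusters), not a production implementation. The sample points and function names are invented for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=100, seed=0):
    """Assign each point to one of k clusters; return (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen data points
    labels = None
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = centroid(members)
    return labels, centers


# Hypothetical two-dimensional data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = k_means(points, k=2)
print(labels)
print(centers)
```

Because the result depends on the starting centers, analysts commonly run K-means several times with different seeds and compare the resulting clusters.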

When to use clustering?

Clustering is primarily used to perform segmentation, be it of customers, products or stores. We have already talked about customer segmentation using cluster analysis in the example above. Similarly, products can be clustered together into hierarchical groups based on attributes like use, size, brand, flavor etc., and stores with similar characteristics (similar sales, size, customer base etc.) can be clustered together.

Clustering can also be used for anomaly detection, for example, identifying fraudulent transactions. Cluster detection methods can be used on a sample containing only good transactions to determine the shape and size of the “normal” cluster. When a transaction comes along that falls outside the cluster for any reason, it is suspect. This approach has been used in medicine to detect the presence of abnormal cells in tissue samples and in telecommunications to detect calling patterns indicative of fraud.
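One simple way to picture this idea is the sketch below. It assumes the “normal” cluster can be summarized by a center and a radius, and it treats anything much farther from the center than the known good points as suspect. The feature values, the `tolerance` factor and the function names are assumptions made for illustration.

```python
import math


def fit_normal_cluster(good_points):
    """Summarize known-good points by their center and largest distance from it."""
    n = len(good_points)
    center = tuple(sum(coords) / n for coords in zip(*good_points))
    radius = max(math.dist(p, center) for p in good_points)
    return center, radius


def is_suspect(point, center, radius, tolerance=1.5):
    """Flag a point lying more than tolerance * radius from the center."""
    return math.dist(point, center) > tolerance * radius


# Hypothetical standardized features of transactions known to be good.
good_transactions = [(0.1, 0.2), (-0.3, 0.1), (0.2, -0.4),
                     (0.0, 0.3), (-0.2, -0.1)]
center, radius = fit_normal_cluster(good_transactions)

print(is_suspect((0.0, 0.0), center, radius))  # close to the normal cluster
print(is_suspect((3.0, 2.5), center, radius))  # far outside it
```

Real data rarely forms one round cluster, so in practice the shape of the normal region and the threshold would both need to be chosen with care.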

Clustering is often used to break a large set of data into smaller groups that are more amenable to other techniques. For example, logistic regression results can be improved by performing the regression separately on smaller clusters that behave differently and may follow slightly different distributions.

In summary, clustering is a powerful technique for exploring patterns and structures within data and has wide applications in business analytics. There are various methods for clustering. An analyst should be familiar with multiple clustering algorithms and should be able to apply the most relevant technique for the business need.

Friday, 27 March 2015

Comparing Base SAS and SPSS

Comparing Base SAS and SPSS is an age-old debate among analytics professionals, as both are among the longest-running statistical software packages in the world.

While Base SAS is on version 9+ and has greatly improved its visual appeal to counter SPSS’s click-and-get-results interface, SPSS has moved beyond version 15.0 and started adding modules as SAS has done.

Here I will be comparing specific SAS and SPSS components, such as SAS ETS with SPSS Trends, and SAS Base/Stat with SPSS Base.

Base SAS is almost 1.75 times as expensive as SPSS in upfront cost for a single installation.

SAS ETS is better than SPSS Trends for time series analysis of bad data, but SPSS Trends can run large numbers of time series analyses more easily than SAS ETS.

SAS is tougher to learn than the point-and-click interface of SPSS.

SPSS documentation is much better and gives better clarity on the algorithms used for statistical procedures.

Base SAS is much more powerful for crunching huge volumes of data (like sorting or splicing data); for data smaller than, say, 100 MB, there is not much difference between SAS and SPSS.

SPSS is sold under a perpetual license, while SAS has a year-on-year license. This eventually makes SAS 2-3 times more expensive.

Modeling is easier in SPSS, but SAS can provide more control thanks to its command-line interface and advanced editor coding. SAS Enterprise is not as good a visual interface as SPSS.

For a startup analytics body, the best installation for both SAS and SPSS is network licenses preferably over a Linux network. You should ideally have a mix of both SAS and SPSS to optimize both costs and analytical flexibility.

Monday, 23 March 2015

Difference between Business Analytics and Business Intelligence

Experts maintain that business analytics is basically one term for a bigger concept and is associated with the following complex functions:

Enterprise information
Enterprise performance management
Data warehousing
Analytic applications
Business intelligence
Business risks
Compliance
Governance

BA vs BI

When you’re talking about data and what it can do for your company, a lot of terms get thrown around. Business analytics (BA) and business intelligence (BI) are two terms heavily used, but rarely given the same definition by any two sources. Some take the stance that they’re interchangeable, and others staunchly defend their position as to the meaning of each, and what would fall under those respective umbrellas.

On a basic level, BI is the ability to take information resources and convert them into knowledge that is helpful in decision making. The traditional method of doing this involves cataloging and examining data from past decisions and actions, and using this as a way of setting metrics benchmarks for the future. In method, BA is an offshoot of BI. BA focuses on using data to net new insights, whereas traditional BI used a consistent, repeating set of metrics to steer future business strategies based on this historical data. If BI is the way to catalog the past, then BA could be called the way to deal with the present and predict the future.

Hosted business intelligence solutions like RJMetrics offer a combination of BI and BA by providing a data warehousing and reporting solution alongside a flexible interface for ad-hoc analysis and data discovery that can point you toward smarter decisions.

The Evolution of Business Intelligence vs Business Analytics

In the past, BI has been used to talk about the people, processes and applications used to access and extrapolate meaning from data, for the sake of improving decisions and understanding the effectiveness of targeted decisions. But this is where BI as a baseline failed; something that runs entirely off of static, historic data severely limits a user’s ability to make predictive decisions and forecast for the future market. When an emergent situation arises on a Friday afternoon, the user doesn’t greatly benefit from looking at metrics collected prior to the introduction of that situation.

The rapid growth and demand for BA comes from this failing, and BA is in a way the evolved form of BI solutions. In a business world whose speed is ever-increasing, the user needs to be able to interact with information at the speed of business, not looking back over his or her shoulder at what happened in the past. BI setups alone do not support users asking and answering questions in the face of marketplace events as they happen. A company that is data-driven sees its data as a resource, and uses it to edge out the competition. The more current the data the user has, the better jump he or she has on the competitor, who may have become a threat so recently that traditional BI data reporting wouldn’t even take them into consideration.

Many companies now implement advanced analytics on top of their data warehouses to bridge the gap between BI and current-day needs. Perhaps this is the origin of the confusion between terms, as organizations pick and choose from different combinations of services and have no real understanding of what to call these mashups.

Equally relevant is the fact that more and more people are being asked to interpret data in roles that are not strictly analytical. Product managers, marketers and researchers are moving towards data as a way to formulate strategies, and traditional BI platforms make it difficult to push data into real-time situations and what-if scenarios. As less tech-savvy parts of the company come to realize the importance of data-driven decisions, the need for more user-friendly and faster platforms also grows. Moreover, delivering the data that supports these decisions to a broader company team demands a more visual form of modeling tool, to improve understanding across all departments. Charts and graphs showing BA findings are quicker to read and have more impact than written-out statistics and Excel sheets full of data.

Data interpretation and the manipulation method of choice change as the market demands. While having a set of established methods is important to the effectiveness of a company’s strategy, it’s understanding the need for flexibility in the face of these changes that can be a company’s most valuable asset.

Sunday, 22 March 2015

Why are soft drink cans 330 ml?

After the last class, I was struck by the question of why soft drink cans are always 330 ml rather than 300 ml or 350 ml. The question led me to do a lot of research into the logic behind the size. The internet threw up plenty of information, but nothing was convincing or conclusive enough to settle the matter. Still, below are some interesting theories behind the 330 ml size.

The one I find most logical is that cola companies already sell 200 ml and 300 ml glass bottles and 500 ml to 2.5 liter PET bottles, so the best size for a can is 330 ml, which is roughly one third of a liter and is easy to differentiate.

In North America, the standard can size is 12 fluid ounces (355 ml). In India and most of Europe, standard cans are 330 ml, which is approximately 1/3 of a liter. In Australia, the standard can size is 375 ml, and in South Africa standard cans are 340 ml.
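The arithmetic behind these figures can be checked with a few lines of Python. The conversion factor assumes US customary fluid ounces, and the function name is made up for the example.

```python
# Assumption: US customary fluid ounce, about 29.5735 ml.
ML_PER_US_FL_OZ = 29.5735


def fl_oz_to_ml(fl_oz):
    """Convert US fluid ounces to millilitres."""
    return fl_oz * ML_PER_US_FL_OZ


north_america_ml = fl_oz_to_ml(12)   # the 12 fl oz can
third_of_litre_ml = 1000 / 3         # one third of a litre

print(round(north_america_ml))   # about 355 ml
print(round(third_of_litre_ml))  # about 333 ml, close to the 330 ml can
```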

Another theory is that the size reduces packaging and transportation costs, because a box of 24 cans fits onto a standard-size cardboard pallet and thence onto a standard wooden pallet.

330 ml = 33 cl, and 33 is one of the degrees in Freemasonry.

Most cans around the globe are sourced from Indonesia, and most Indonesian companies use 330 ml as the standard can size. Coke and Pepsi therefore follow the 330 ml size for their cans, since a different size would cost more.

Another theory concerns the can's dimensions. A lot of engineering goes into the can's design. Designers found the optimum for hand size and an acceptable length while maximizing the amount of drink the can holds, and arrived at the current design. Manufacturability also led them to choose a cylinder in the first place.

The amount of aluminum used is at the point where the company that manufactures the cans profits the most.

Standard 350 ml and 340 ml bottles were used in some developed countries. As part of a marketing strategy, however, companies reduced the size to 330 ml and kept the price intact.

A shortage of CO2 in 1997 is said to have led to the can size being reduced from 355 ml to 330 ml.

Now it is up to you to select the most logical answer to our question. Sometimes, though, going with the wind helps us more than questioning every piece of marketing logic :)

Business Analytics in Financial Services

Need:

· Data is a source of significant competitive advantage for any organization. Financial institutions need to support business activities and decision making in a fashion that is timely, relevant, verifiable, and personalized to meet a variety of stakeholder requirements.

· The financial services industry is constantly and rapidly changing, making it more and more challenging for financial institutions to keep up with the changes and with the competition.

· With a vast range of customers and customer needs, changing regulations, and growing fraud threats, financial services companies need information management solutions that will allow them to make smart decisions.

· Financial services front offices are increasingly pressured for real-time business analytics, customer insights, and centralized customer preference profiles for their delivery channels.

Application areas of Business Analytics in a typical financial organization:

· Operation Reporting:

Financial services companies sometimes overlook the many cost-saving opportunities within their own operations. By maximizing operational efficiency throughout the entire organization, companies can reduce costs and so increase and maintain their profit margins. Business analytics allows companies to compare operational metrics across the enterprise and create accurate, real-time reports that identify areas in need of streamlining.

· Customer Credit Management:

With increasing customer diversity, financial institutions want to track their customers’ credit habits to discern the most profitable avenues and to protect themselves from loss through customer default on credit. Business analytics helps uncover new revenue opportunities through targeted up-sell campaigns and helps prevent default through predictive analysis.

· Asset Management:

Financial services companies must keep track of extensive and varied assets in many different forms. Business analytics and monitoring capabilities allow companies to ascertain which assets are most profitable, determine how to maximize profitability of their various assets, and ensure that their many assets are properly managed. In addition, with the consolidation of assets resulting from mergers and acquisitions, business intelligence software allows financial institutions to identify areas where they can reduce redundancies and drive performance improvements.

· Risk Management:

Financial institutions need intelligent risk management strategies to meet regulatory requirements, as well as to ensure their own security. The regulations within the New Basel Capital Accord (Basel II) are driving financial institutions to determine portfolio risk segments, operational risk levels, and the associated required capital allocation. The reporting and disclosure requirements of the Sarbanes-Oxley Act have prompted financial institutions to conduct extensive risk assessments of both internal and external factors. Within a financial institution’s vast data warehouse and data stores there exist immense amounts of information that can help companies identify potential risk areas and detect fraud.

· Compensation Analysis:

Financial institutions have multi-tiered management configurations, involving complex employee compensation structures, as well as diverse performance goals across the enterprise. Business analytics allows companies to pull data from multiple sources for more accurate analysis and to support complex incentive compensation plans.

· Regulatory Compliance:

The changing regulatory environment within the financial services industry is requiring financial institutions to measure and report risk and manage their capital in new ways. International Financial Reporting Standards (IFRS) require improved information on financial instruments in financial statements. Basel II is forcing financial services companies to adopt sophisticated methods for determining risk-adjusted capital, and the Sarbanes-Oxley Act of 2002 calls for accelerated financial reporting and more rigorous disclosures and certifications. Business analytics consolidates, analyzes, and reports on financial and other enterprise information, and it allows companies to categorize risk and improve internal controls.