Acting on Analytics: The Official Mineful Blog

Tools for Text Analysis

Text analysis is the process of collating semantic and other information from text so that it can be analyzed quantitatively to support decisions. Traditional market research has used it for years: responses to open-ended questions are collated, a code list is drawn up, and the response forms are coded before the data is entered for analysis.
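The coding step described above can be sketched in a few lines of Python. This is only an illustration: the code list, the keywords and the sample responses are all made up, and simple keyword matching stands in for the judgment a human coder would apply.

```python
import re
from collections import Counter

# Hypothetical code list: each code is triggered by a few keywords.
CODE_LIST = {
    "price": {"price", "expensive", "cheap", "cost"},
    "service": {"service", "staff", "support"},
    "quality": {"quality", "durable", "broke"},
}


def code_response(text, code_list=CODE_LIST):
    """Return the sorted list of codes whose keywords appear in the response."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(code for code, keywords in code_list.items() if words & keywords)


# Made-up open-ended responses.
responses = [
    "Too expensive for what you get",
    "Friendly staff and great quality",
    "The price is fine but support was slow",
]

# Once coded, the responses can be counted like any other survey variable.
code_counts = Counter(code for text in responses for code in code_response(text))
print(code_counts)
```

A response can receive more than one code, which is why the counts can add up to more than the number of responses.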

Depending on how deeply you analyze the text, the data can yield both basic and very insightful information derived from patterns and trends. The techniques used in text analysis include linguistic, statistical and machine learning methods. Text analysis also involves retrieval of data from various sources on the Internet, lexical analysis, pattern recognition, annotation and other data mining techniques applied to language. It also covers text categorization, text clustering, concept extraction, sentiment analysis and document summarization.

There are various kinds of online, offline, free and paid text analysis tools that are fairly easy to use. They are listed below.

Free Text Analysis Tools
Some free and open source text mining tools allow you to understand the kind of comments that are being made on your blog, microblog, social networking page or elsewhere on the Internet.
- GATE – This is an open source toolkit that delivers the results in a graphical environment
- INTEXT – A DOS version of TextQuest, this text mining tool has been in the public domain for more than 7 years now.
- Open Calais – Another open-source text analyzer that includes semantic functionality and can search and analyze text within a blog, content management system websites, applications or more.
- LingPipe – Part of a suite of Java libraries, this tool is free and can be used for a variety of linguistic analyses.
- RapidMiner Text Mining – A great source that allows you to check out the comments on your networking page without having to view them manually.
- S-EM (Spy-EM) – A tool that helps in text classification, dividing text into positive, negative and neutral responses based on learning.
- The Semantic Indexing Project – Another open source tool that includes semantic analysis and search applications.
- Text Analyzer – This online tool is extremely easy to use. All you need to do is enter the text you want analyzed, and a detailed analysis appears on the same web page in seconds. Data retrieval is not part of this text analysis tool, though.
- Tagul – Creates gorgeous tag clouds; the tag cloud for this post was made with it.

Commercial Text Analysis Tools
Some of the text analysis tools that you may want to consider if you are more serious about the depth of analysis that you perform on your site, social media pages and more have been detailed below.
- ActivePoint – A tool that offers natural language processing (NLP) and categorizes text based on contextual search.
- Alceste – Easy-to-use software that allows automatic analysis of all kinds of text.
- ClaraBridge – Text mining software for businesses.
- Crossminder – A good text analysis tool that uses natural language processing and various other text analytics techniques.
- Eaagle text mining software – A tool that is used by many due to the speed with which it structures large volumes of data to give direction.
- ClearForest – This solution gives meaning to unstructured information by using data mining technologies.
- SPSS LexiQuest – The advanced text analysis tool from SPSS.
- Expert System – A tool that uses the proprietary COGITO platform and creates clusters of text that can be interpreted.
- Analyze Words – An intelligent software that can analyze the personality of a website, a brand or even a person based on the words that are used. It categorizes words and content into upbeat, worried, angry, depressed, arrogant, personable, sensory and more. Basically the three dimensions that are analyzed are emotional style, social style and thinking style.
- Lexalytics – Transforms unstructured text to structured information, almost magically.
- Lextek Profiling Engine – A tool that classifies, routes and filters electronic text based on user defined profile.
- Recommind MindServer – A tool that uses PLSA (Probabilistic Latent Semantic Analysis) for accurate text retrieval and further classification.
- Attensity – Software that goes a step further and classifies text based on “who”, “what”, “where”, “when” and “why” facts.
- SAS Text Miner – An advanced and reliable tool that is used by many market researchers and web analyzers for text analysis.
- DiscoverText – A tool used by many market research and web analytic companies to create text analysis solutions.
- Xanalys Indexer – An information extraction and data mining library aimed at extracting entities, and particularly the relationships between them, from plain text.
- Wordstat – An easy yet powerful tool that helps analyze textual information in responses, open-ended questions and interviews.
- OpinionEQ – A solution based on advanced semantic and linguistic research focusing on the problem of collecting, interpreting, and structuring both Web and real time communications.



What is TURF Analysis?
TURF analysis, also known as Total Unduplicated Reach and Frequency, is a technique used to calculate the unduplicated reach of a product line or a range of products. It is very common in market research, where it is used to assess the combination of products that will ensure maximum reach, frequency of purchase and therefore revenue when launched.

TURF analysis was first used by media planners to estimate the unduplicated number of people reached by a campaign that runs across various media vehicles such as television, print and radio. The same technique was adapted for market research and is now used to evaluate flavor combinations, color options and variant baskets for products.

In most cases, TURF analysis is used to optimize product portfolios. Simply launching more SKUs (stock keeping units) is not the right strategy for expanding reach. A large portfolio puts significant pressure on resources and makes the task of managing logistics, retailer stocking and production planning far more difficult.

In addition, the return on investment from additional launches is never additive. In most cases, new launches cannibalize some of the existing variants as loyal customers move to try out new variants.

Benefits of TURF Analysis
A TURF analysis helps in identifying the proportion of customers who are likely to try out a new variant. It helps in identifying the maximum reach for the entire product line in a realistic manner, since the technique exposes the audience to all the variants that are likely to be in the market. It can therefore also minimize the number of SKUs while ensuring that revenue is not compromised.

By performing ‘what-if’ analysis based on various assumptions and scenarios, the incremental value that each new variant can add can also be calculated. The technique itself is a sequential one that assumes a certain product line and then adds on to the existing variants to ascertain the best option or combination to be launched.
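The incremental value of a new variant can be illustrated with a small Python sketch. The variant names and the respondent data below are invented for the illustration, and the sketch looks only at reach (not frequency): a candidate counts only for respondents who are not already reached by the existing line.

```python
# Made-up multi-select answers: the variants each respondent says they would buy.
respondents = [
    {"Original", "Lemon"},
    {"Original", "Lemon"},
    {"Original"},
    {"Original", "Mint"},
    {"Lemon"},
    {"Mint"},
    {"Mint"},
    {"Berry", "Original"},
    set(),               # would buy none of the variants
    {"Mint", "Lemon"},
]


def incremental_reach(current_line, candidate, respondents):
    """Share of respondents reached by the candidate but by nothing in the current line."""
    current = set(current_line)
    newly_reached = sum(
        1 for choices in respondents
        if candidate in choices and not (choices & current)
    )
    return newly_reached / len(respondents)


current_line = {"Original"}
for candidate in ("Lemon", "Mint", "Berry"):
    gain = incremental_reach(current_line, candidate, respondents)
    print(f"{candidate}: +{gain:.0%} reach")
```

In this made-up data Berry adds nothing, because its only fan already buys Original. That is the cannibalization effect described earlier.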

Question Types
TURF analysis can be performed with various kinds of preference questions for specific variants. Some prefer to use a purchase intention scale (ranging from ‘would definitely buy’ to ‘would definitely not buy’). Others can also choose to ask product preference on a desirability scale and yet others opt for a multi-select choice question. All respondents are exposed to all the products.

An Example
For example, if product A has a preference of 80 percent, product B 60 percent and product C 40 percent, it cannot be assumed that products A and B will form the best combination. There may be a large overlap between those who prefer product A and those who prefer product B, in which case a combination of products A and C would reach more customers and get the company better returns.
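To make the example concrete, here is a minimal Python sketch. The ten respondents are invented so that the individual preferences match the figures above (A 80 percent, B 60 percent, C 40 percent), and the sketch computes reach only, for every two-product combination.

```python
from itertools import combinations

# Invented answers: the set of products each of ten respondents prefers.
# Individually, A is chosen by 8, B by 6 and C by 4 respondents.
respondents = (
    [{"A", "B"}] * 5
    + [{"A", "B", "C"}]
    + [{"A"}]
    + [{"A", "C"}]
    + [{"C"}] * 2
)


def reach(portfolio, respondents):
    """Share of respondents who prefer at least one product in the portfolio."""
    portfolio = set(portfolio)
    reached = sum(1 for choices in respondents if choices & portfolio)
    return reached / len(respondents)


def rank_portfolios(products, respondents, size):
    """Score every combination of the given size, highest reach first."""
    scored = [
        (reach(combo, respondents), combo)
        for combo in combinations(sorted(products), size)
    ]
    return sorted(scored, key=lambda item: -item[0])


for score, combo in rank_portfolios({"A", "B", "C"}, respondents, size=2):
    print(f"{' + '.join(combo)}: {score:.0%}")
# A + C: 100%
# B + C: 90%
# A + B: 80%
```

Because almost everyone who prefers B also prefers A, adding B to A reaches nobody new, while C brings in respondents that A misses.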

TURF Analysis with Mineful
The above example has been simplified and uses only three products. When you need to do this analysis for a large number of variants, it becomes a complicated and messy task if done manually. Mineful now offers TURF analysis that you can perform with a few clicks. The entire exercise can be completed in a matter of minutes. Learn how to run TURF analysis with Mineful!



With a large amount of competition in every industry, there is hardly anyone who does not keep a close watch on the basic financial parameters of a business. Some of the key parameters that every business measures include volume, revenue, profitability, investments, return on investment, volume growth, revenue growth, conversion rates or trial ratios, retention and more. This takes care of measuring and monitoring the business parameters. Tracking them over time gives managers an idea of the progress of the business.

However, despite this level of analysis, there is an element that is missing. While these reports help in understanding the state of the business as it is, they do not provide any insight into what steps should be taken if sales are low. Neither can they tell you which specific parameters to change to ensure better acceptance of the product or service in the marketplace. Managing these parameters is only possible if you know the factors that affect them. Basically, you need to know the key factors that drive purchase to be able to ensure higher sales for your product.

What is Driver Analysis?
Key driver analysis is a statistical tool that uses multiple regression to identify the specific parameters of a product or a service that drive a particular action. Some of the questions that driver analysis answers are:

  • What aspects of my restaurant business result in higher customer retention?
  • What are the specific product features that lead to purchase?
  • What are the specific areas that I can choose to ignore without losing out on my current clientele?
  • Is recommendation for my brand occurring due to the product performance or service satisfaction?

The interesting aspect of key driver analysis is that you can apply it to any outcome variable you want to understand. This means that you could understand what drives loyalty, purchase, repeat purchase, satisfaction or recommendation.

It is important to understand that direct questioning does not always provide accurate results. Some respondents are likely to give politically correct responses, and there are other issues too. Direct responses result in hygiene factors being confused with drivers, mainly because these are factors that are never really stated as ‘not important’.

How does Driver Analysis work?
Driver analysis should be used with care, since an inadequate understanding of how the statistical tool should be used can result in wrong deductions. It is extremely important to ensure that all the relevant parameters for the product or service are covered in the overall data collection. In addition, the wording of the parameters is also important.

Driver analysis includes a dependent variable and various independent variables. The dependent variable is the aspect that you need to understand better. So when you want to check out the factors that drive purchase, the response to the purchase question becomes the dependent variable. Other variables that you have captured data on, like performance on specific attributes, can be considered independent variables. It is assumed here that these independent variables in some way drive the dependent variable. The whole idea is to understand the manner in which these factors drive purchase and the extent to which they do.

The key output of a driver analysis that uses regression models is a score out of a total of 100 indicating relative importance of each of the independent variables that have been tested. A table of importance that immediately allows you to understand the features and parameters that are truly important helps you in deciding where to focus your monies.
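One simple way to arrive at such a score can be sketched in Python. Everything here is an assumption made for illustration: the ratings are invented, the attribute names are hypothetical, and importance is taken as each driver's share of the absolute standardized regression coefficients. The article does not say how Mineful or any other tool actually computes the score, and other methods exist.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of numbers to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def relative_importance(drivers, outcome):
    """Multiple regression on standardized data; importance scores sum to 100."""
    names = list(drivers)
    z_x = [zscores(drivers[name]) for name in names]
    z_y = zscores(outcome)
    # Normal equations (X'X) beta = X'y; no intercept is needed after standardizing.
    xtx = [[sum(p * q for p, q in zip(zi, zj)) for zj in z_x] for zi in z_x]
    xty = [sum(p * q for p, q in zip(zi, z_y)) for zi in z_x]
    betas = solve(xtx, xty)
    total = sum(abs(beta) for beta in betas)
    return {name: 100 * abs(beta) / total for name, beta in zip(names, betas)}


# Invented ratings from eight respondents (independent variables).
drivers = {
    "taste":   [2, 2, 2, 2, 4, 4, 4, 4],
    "price":   [2, 2, 4, 4, 2, 2, 4, 4],
    "service": [2, 4, 2, 4, 2, 4, 2, 4],
}
# Invented purchase intention scores (dependent variable).
purchase_intent = [4, 5, 5, 5, 6, 6, 8, 6]

importance = relative_importance(drivers, purchase_intent)
for name, score in importance.items():
    print(f"{name}: {score:.1f}")
# taste: 63.6
# price: 27.3
# service: 9.1
```

Taking absolute values hides the direction of each effect, so a real analysis should also look at the sign of each coefficient, and highly correlated drivers can make this simple split unstable.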

Benefits of Driver Analysis
The main benefit that key driver analysis offers your business is an understanding of the factors that you need to tweak to turn things around. For example, an understanding of the product features that drive purchase can help you devise a communication strategy around those features, ensuring better sales. If you are concerned about customer retention, a driver analysis of the factors that drive retention can help you improve service levels.

It is practically impossible to allocate funds towards improving all areas of product features or service in a business. Careful prioritization is essential. Driver analysis helps you prioritize the aspects that you need to concentrate on for short term and long term success.



Survey Data Integration

Mineful released today the first solution to easily integrate customer or product data with survey data. This allows researchers to see how customer information and survey answers are related. Mineful’s survey data integration gives researchers a powerful yet easy way to integrate survey data with respondent characteristics, such as shopping patterns, customer segment, and geographic location. The result is a clearer understanding of different segments of the market.

For example, suppose that a grocery store chain conducts an online survey of customers who have signed up for shoppers’ discount cards. The survey might ask customers how important they consider things such as ease of checkout, expanded store hours, and availability of special services, such as pharmacies or florist shops, in the store.

A survey like this will be valuable in itself, but it will be even more valuable if the consultant can integrate survey data with data gathered from discount cards. These cards typically provide information about where, when, and how often customers shop, what they buy, and how much they typically spend on each shopping trip.

If a consultant attempted to ask for this information as part of a survey, it would create two problems. First, respondents might not be able to provide accurate information about such things as how often they shop or how much they typically spend. Second, asking for such information would make the survey considerably longer, and the longer the survey, the less likely it is that people will complete it.

By integrating survey results with data from shopping cards, a researcher can determine answers to questions such as:

  • What services are most important to the chain’s best customers?
  • Which stores have the highest customer satisfaction?
  • How does customer satisfaction affect purchase frequency?

In technical terms, a user can upload two kinds of files: 1) a respondent list with customer emails and other columns that describe the respondent or 2) a general data file that could describe a product, store, or something else about the organization. The respondents’ list is the example we explained above.

In the second case, a user can upload a table from which information is retrieved when the respondent enters a code or an answer. This could be a products table, store level table, demographic table, etc. For example, a construction company that replaces windows, siding, and gutters might have information on the products it has serviced, stored by warranty number. The first question of the survey asks the respondent for the warranty number. The information in the table (product category, date, service representative, price, location, etc.) is then linked to this person’s responses.
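The linking step in the warranty example can be pictured with a short Python sketch. It is not Mineful's implementation: the table contents, field names and warranty numbers are invented, and the lookup is a plain dictionary keyed by the warranty number the respondent enters.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical uploaded data file, keyed by warranty number.
warranty_table = {
    "W-1001": {"product_category": "windows", "location": "North"},
    "W-1002": {"product_category": "siding", "location": "South"},
    "W-1003": {"product_category": "windows", "location": "South"},
}

# Hypothetical survey answers; the first question asked for the warranty number.
survey_responses = [
    {"warranty_number": "W-1001", "satisfaction": 9},
    {"warranty_number": "W-1002", "satisfaction": 6},
    {"warranty_number": "W-1003", "satisfaction": 7},
    {"warranty_number": "W-9999", "satisfaction": 8},  # not found in the table
]


def integrate(responses, lookup_table, key="warranty_number"):
    """Attach the matching table row (if any) to each survey response."""
    merged = []
    for response in responses:
        details = lookup_table.get(response[key], {})
        merged.append({**details, **response})
    return merged


def mean_by(rows, group_field, value_field):
    """Average a survey answer within each group of a linked field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.get(group_field, "unknown")].append(row[value_field])
    return {group: mean(values) for group, values in groups.items()}


merged = integrate(survey_responses, warranty_table)
satisfaction_by_category = mean_by(merged, "product_category", "satisfaction")
print(satisfaction_by_category)
```

Responses whose code is not found in the table are kept and grouped under "unknown" rather than dropped, so mistyped warranty numbers remain visible in the results.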

Survey data integration tools, such as these, allow marketing departments to give decision makers the information they need to get the most out of their marketing efforts.


