Data Categorization

Add Business Context To Social Data

One of the biggest challenges for any company analyzing social data is extracting its meaning automatically so that you can analyze and act on it. At its core this is a 'big data' problem: transforming unstructured content into structured data.

DataSift VEDO is the next generation of the DataSift core processing engine and features Programmable Intelligence: the ability to automatically categorize social data based on its meaning, helping you to transform it into useful data in your application. Programmable Intelligence provides three powerful approaches to help extract meaning and context within each Post, Tweet or Comment:

Programmable User Rules

  • Business analysts and application developers can easily customize rules to classify social data based on any combination of criteria found within the content, text patterns or data.
  • Use advanced text-pattern matching and near-word matching to look for mentions of products within social data, then classify conversations based on the products being mentioned.
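
As a rough illustration of the rule-based approach described above, the following Python sketch tags a post with the products it mentions and checks whether two words appear near each other. It is not VEDO rule syntax; the product taxonomy, the function names and the `max_gap` parameter are all hypothetical and chosen only for this example.

```python
import re

# Hypothetical product taxonomy (illustrative only): tag -> keywords.
PRODUCT_RULES = {
    "phone": ["phone", "handset"],
    "tablet": ["tablet", "slate"],
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def near(text, word_a, word_b, max_gap=3):
    """Return True if word_a and word_b occur at most max_gap positions apart."""
    tokens = tokenize(text)
    pos_a = [i for i, tok in enumerate(tokens) if tok == word_a]
    pos_b = [i for i, tok in enumerate(tokens) if tok == word_b]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)


def classify_products(text):
    """Return the sorted list of product tags whose keywords appear in the text."""
    tokens = set(tokenize(text))
    return sorted(tag for tag, words in PRODUCT_RULES.items() if tokens & set(words))
```

For instance, `classify_products("Loving my new tablet and phone")` tags the post with both products, and `near(text, "phone", "battery", max_gap=4)` could be used to restrict a rule to posts that discuss a phone's battery.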

Machine Learning

  • Apply machine-learning models to train a classifier, then run it in VEDO. For example, create an intent classifier to identify purchase signals from social data.
  • Train and build models based on Bayesian classification, logistic regression and support vector machines.
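
To make the idea of an intent classifier concrete, here is a minimal from-scratch multinomial naive Bayes sketch in Python. It is a stand-in for illustration only, not the VEDO training tooling; the `NaiveBayes` class, the `purchase` and `other` labels and the training sentences are all invented for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training data, far too small for real use.
TRAINING = [
    ("I want to buy a new laptop", "purchase"),
    ("thinking of ordering this camera today", "purchase"),
    ("where can I buy these headphones", "purchase"),
    ("the weather is lovely this morning", "other"),
    ("watching the game with friends tonight", "other"),
    ("my cat is asleep on the sofa", "other"),
]

model = NaiveBayes()
model.train(TRAINING)
```

After training, `model.predict("I want to buy a camera")` returns the label whose word statistics best match the post. A production classifier would need far more labeled data and proper evaluation.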

Library of example classifiers

  • Get a jump start using our built-in data science, or find inspiration in our collection of best-practice models.
  • Use the classifiers out of the box or as a starting point to identify potential spam, leverage machine learning to categorize social posts into complaints, compliments or requests for help, or score affinity towards a product or topic.

Classifier Library

We provide a set of pre-built classifiers that you can customize to suit your needs.

Twitter Application Used

Identifying and distinguishing content source is key to understanding social activity. Use this classifier to identify sources of Tweets including automated services, client applications and social networks. With this categorization in place you can easily distinguish between sources and improve your insights.
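
A source classifier of this kind might be sketched in Python as follows. The sketch assumes each Tweet carries a source string that is either a plain application name or an HTML anchor wrapping that name. The category mapping and the function names are hypothetical and do not reflect the classifier's actual rules.

```python
import re

# Hypothetical mapping from application name to source category.
SOURCE_CATEGORIES = {
    "Twitter for iPhone": "client application",
    "Twitter for Android": "client application",
    "IFTTT": "automated service",
    "Facebook": "social network",
}


def source_name(source):
    """Extract the application name from an HTML anchor, or return the plain text."""
    match = re.search(r">([^<]+)</a>", source)
    return match.group(1).strip() if match else source.strip()


def classify_source(source):
    """Map a Tweet's source string to a category, defaulting to 'unknown'."""
    return SOURCE_CATEGORIES.get(source_name(source), "unknown")
```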

Automotives

By classifying data to match your business model you can more readily understand and integrate data with your existing tools and systems. This classifier demonstrates how you can identify brands and products mentioned in conversations, and fit social data to match your business taxonomy.

Tumblr Activity

Tumblr is a key data source for analyzing social trends. The activities a user can perform on a piece of content differ from those on Twitter, most notably the 'reblog' action. Use this classifier to categorize user activities on Tumblr and understand engagement.

Device Used

Identifying user devices is a key component to understanding an audience and how they interact with a brand. This classifier identifies properties of devices used to create content. Based on this insight you can more effectively plan marketing campaigns and more precisely target potential leads.

Dow Jones Companies

Often our customers wish to track conversations relating to an industry, or to themselves and their competitors. This classifier demonstrates how you can identify companies being mentioned in conversation, in this case companies within the Dow Jones Industrial Average.

Professions & Roles

For many use cases it is important to identify the profession, role and seniority of users. For instance, such a distinction is critical when identifying sales leads. This classifier categorizes Twitter users by profession, role and seniority.

NFL Fans

By identifying fans you can identify related interests to target or plan direct marketing campaigns. This classifier demonstrates how you can categorize users by their passions, in this case the NFL team they support.

Job Advertisements

Job advertisements are common on Twitter; you might consider these as noise distorting your insights. This classifier scores Twitter posts for how likely they are to be a job advertisement, allowing you to improve your analysis.
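
One simple way to produce such a likelihood score is to sum weights for job-related signals found in the text. The Python sketch below shows that idea only; the patterns, the weights and the 0.5 threshold in the usage note are invented for illustration and are not the rules this classifier uses.

```python
import re

# Hypothetical signals and weights (illustrative only).
JOB_SIGNALS = {
    r"\bhiring\b": 0.5,
    r"\bapply now\b": 0.4,
    r"#jobs?\b": 0.4,
    r"\bvacancy\b": 0.3,
}


def job_ad_score(text):
    """Return a score between 0 and 1; higher means more likely a job advertisement."""
    lowered = text.lower()
    score = sum(weight for pattern, weight in JOB_SIGNALS.items()
                if re.search(pattern, lowered))
    return min(score, 1.0)  # clamp so the score never exceeds 1


def remove_job_ads(posts, threshold=0.5):
    """Keep only the posts that score below the threshold."""
    return [post for post in posts if job_ad_score(post) < threshold]
```

Calling `remove_job_ads(posts)` then drops the posts that look like job advertisements before further analysis. The threshold trades off missed advertisements against wrongly discarded posts.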

Competitions & Marketing

Competition marketing is common on Twitter, and this noise might distort your insights. This classifier scores Twitter statuses for how likely they are to be related to a competition, allowing you to improve your analysis.

Shared Content Type

Frequently our users want to focus on the type of content being shared. For instance, who is sharing videos relating to a musician, or who is sharing slides relating to a conference? This classifier identifies the type of content being shared by users.

Technology Analysts

Frequently our customers look to categorize posts based on individuals or organizations who publish the content. This example classifier categorizes content posted on Twitter from top technology market research companies including Gartner, IDC and Forrester.

News Providers & Topics

The powerful link enrichment features of our platform allow you to understand how influential content is shared. This classifier identifies content being shared from influential news sites such as WSJ, TechCrunch and BBC, and groups news by high-level topic.
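
The provider-identification step can be approximated by matching the domain of a shared link. The Python sketch below assumes the link has already been expanded to its final URL. The domain table and the `news_provider` function are hypothetical, and topic grouping is not covered.

```python
from urllib.parse import urlparse

# Hypothetical mapping from news site domain to provider name.
NEWS_DOMAINS = {
    "wsj.com": "WSJ",
    "techcrunch.com": "TechCrunch",
    "bbc.co.uk": "BBC",
    "bbc.com": "BBC",
}


def news_provider(url):
    """Return the provider for a shared link, or None if the domain is not listed."""
    host = (urlparse(url).hostname or "").lower()
    for domain, provider in NEWS_DOMAINS.items():
        # Match the domain itself or any subdomain such as www.bbc.co.uk.
        if host == domain or host.endswith("." + domain):
            return provider
    return None
```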
