Human Data glossary



Aggregated: Data that is summarized.
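
As a minimal sketch of aggregation in Python, the snippet below summarizes a handful of individual records into per-topic counts. The records and field names are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical individual records; the field names are illustrative only.
interactions = [
    {"topic": "football", "action": "like"},
    {"topic": "football", "action": "share"},
    {"topic": "cooking", "action": "like"},
]

# Aggregate: keep a per-topic count rather than the individual records.
counts_by_topic = Counter(record["topic"] for record in interactions)
print(dict(counts_by_topic))
```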

Anonymized: Data with no electronic trail that would lead someone to its origins.

API: This is an acronym for Application Programming Interface. It’s a defined path for a program to accomplish a task, usually by retrieving or modifying data. Learn more: API FAQs

Augmentation: Enhancing data by adding information derived from internal and external sources. Learn more: DataSift augmentations

Big Data: Any collection of data so large and complex that it becomes difficult to process using standard data management tools or applications.

CSDL: Curated Stream Definition Language, a filtering language created by DataSift. It selects, in real time, the data you need from the torrent flooding out of social media sites. Learn more: DataSift CSDL

Data enrichments: DataSift’s way of giving social data meaning by providing context and insight into what is being conversed across a social platform. Learn more: Data enrichments

Data filtering: Setting a group of criteria that segments a subscriber list or data extension. Learn more: Data filtering

Data mining: Collecting and analyzing raw data to discover patterns or relationships, and transforming it into useful information.

Engagement data: A collection of data measuring how visitors interact with social media. For example, the number of posts, likes, or shares a visitor makes on Facebook.

Firehose: A data API that provides a continuous stream of all available data in real time. It is constant, delivering updated data as it happens.
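
The idea of a continuous stream can be sketched in Python with a generator. This is a simulation only: the `firehose` function and the item fields are hypothetical, and a real firehose would read from a network connection instead of a counter.

```python
import itertools

def firehose():
    """Simulated endless stream of items (hypothetical; not a real API)."""
    for n in itertools.count(1):
        yield {"id": n, "text": f"message {n}"}

# The stream never ends, so a consumer handles items as they arrive.
# We stop after three items purely so that the example terminates.
first_three = list(itertools.islice(firehose(), 3))
print([item["id"] for item in first_three])
```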

Geotag: Data embedded in a digital media file to indicate geographical information about the subject, usually latitude and longitude.

Human Data: The fastest-growing type of data, covering the entire spectrum of human-generated information, shared across social networks, blogs, news sites and inside the business.

JSON: JavaScript Object Notation is a lightweight data-interchange format. It uses conventions that are familiar to programmers of the C family of languages, such as C, C++, C#, Java, JavaScript, Perl, and Python.
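
A short Python sketch of reading and writing JSON with the standard `json` module. The record shown is made up for illustration and does not reflect any particular platform's output.

```python
import json

# A hypothetical interaction encoded as JSON text.
raw = '{"author": "example_user", "likes": 12, "tags": ["news", "sport"]}'

interaction = json.loads(raw)                      # JSON text -> Python dict
interaction["likes"] += 1                          # work with it as normal data
encoded = json.dumps(interaction, sort_keys=True)  # Python dict -> JSON text
print(encoded)
```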

Machine learning: The science of getting computers to act without being explicitly programmed. Learn more: Machine learning

Metadata: Data that describes other data. It summarizes basic information about data, which can make finding and working with particular instances of data easier.

Natural language processing: The ability of a computer program to understand human language as it is spoken or written.

Normalized data: Data that has been reduced to its canonical form. For example, the fields and tables of a relational database organized to minimize redundancy and dependency.
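
The relational example can be sketched in Python with plain dictionaries standing in for tables. All rows and field names below are hypothetical.

```python
# Hypothetical flat rows: the author's name repeats on every post.
posts_flat = [
    {"post_id": 1, "author_id": 10, "author_name": "Ana", "text": "Hello"},
    {"post_id": 2, "author_id": 10, "author_name": "Ana", "text": "Again"},
    {"post_id": 3, "author_id": 11, "author_name": "Ben", "text": "Hi"},
]

# Normalized form: each author is stored once and posts refer to it by key.
authors = {row["author_id"]: {"name": row["author_name"]} for row in posts_flat}
posts = [
    {"post_id": row["post_id"], "author_id": row["author_id"], "text": row["text"]}
    for row in posts_flat
]
print(authors)
print(len(posts))
```

Storing each author once means a name change is made in a single place instead of on every post.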

Predictive analytics: Extracting information from data to determine patterns and predict future outcomes and trends.

Public API: The point of access that others - such as programs and programmers - have to your software.

Query builder: Tool that allows you to integrate data by designing queries visually and immediately realizing business value.

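Sentiment analysis: Determining whether the opinion expressed in a piece of content is positive, negative, or neutral, typically to track public opinion at scale.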
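
A deliberately simple Python sketch of the idea, using a tiny hand-written word list. The word lists and the `toy_sentiment` function are hypothetical; production systems use far more sophisticated models, and this is not a description of any vendor's method.

```python
# Tiny illustrative word lists (assumptions, not a real lexicon).
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "bad", "awful"}

def toy_sentiment(text: str) -> int:
    """Return positive-word count minus negative-word count."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(toy_sentiment("I love this great phone"))
print(toy_sentiment("awful service"))
```

A score above zero suggests positive wording and a score below zero suggests negative wording; a score of zero is treated as neutral.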

Social analytics: Monitoring, analyzing, measuring and interpreting digital interactions and relationships of people, topics, ideas and content to make business decisions.

Social data: Refers to data that individuals voluntarily create and share.

Social media: Internet software and applications that allow people to exchange information such as biographical data, professional details, personal photos and thoughts.

Streaming API: An Application Programming Interface that can capture information in real time. Learn more: Streaming API

Topic data: Anonymized and aggregated content data that is being shared on Facebook. This includes activities, events, and brand names. Learn more: PYLON for Facebook Topic Data

VEDO: An extension of DataSift’s core platform that adds structure to social data; it lets you define rules to classify data so that it fits your business model. Learn more: VEDO


bitly: The world’s most popular link-sharing platform. It’s used to share more than 80 million new links every day, with more than 200 million clicks per day on a bitly link.

Blogs: The Blog data source combines material from a wide variety of sites, ranging from well-known hosts such as Blogger with very large numbers of active users to small, single-user sites that run as blogs or incorporate a blog.

DailyMotion: A major video-sharing site, attracting more than 110 million unique monthly visitors and 1.2 billion views worldwide.

Disqus: A free commenting service that enables great online communities.

Edgar Online: Delivers fast, firehose-like access to more than half a million full-text financial documents per year, including 10-Ks, 10-Qs, 8-Ks and more, to track financial activity and disclosures.

Facebook: Social platform that allows its members to like, share, and comment on pictures, videos, websites, articles, and more. Facebook currently has over 1.44 billion monthly active users.

Google+: A social networking site from Google that introduces Circles, Sparks and Hangouts along with other unique features.

IMDb: Contains a rich database of details about movies, TV and news, including a huge collection of user reviews. A primary source for movie-related opinions.

Instagram: An online photo-sharing, video-sharing and social networking service that enables its users to take pictures and videos, apply filters to them, and share them on a variety of different social networks.

Intense Debate: A robust commenting platform which powers discussions on WordPress, Blogger, Tumblr and other content management platforms.

LexisNexis: The single most powerful global news and business information service, providing the breadth and depth of information needed to put your social insight in context.

MessageBoards: A data source that provides content from a variety of message boards around the world.

NewsCred: Licenses, curates and syndicates full text news articles, images and videos from more than 2,500 of the world's highest-quality publishers, including leading financial and entertainment publications in a fully license-compliant way.

Reddit: A social news sharing site, boasting over 120,000 subscribers, where users submit posts in the form of either a link or a text "self" post. Other users then vote the submission "up" or "down," which determines the rank of each post and prioritizes its position accordingly.

Sina Weibo: The dominant player in China’s social landscape. This platform is a rich data source for those with a presence in China or those that need to understand how to market to the second-largest economy in the world.

Tencent Weibo: One of China’s biggest microblogging services allowing users to post messages consisting of up to 140 Chinese characters.

Topix: The leading news community on the Web, connecting 12.4 million people to the information and discussions that matter to them in every U.S. town and city.

Tumblr: One of the world’s fastest-growing social networks, Tumblr is a platform for people to share content they love. By providing a simple-to-use blog to share content, Tumblr has grown a global audience of over 300 million unique visitors, generating more than 100 million posts every day.

Videos: The Videos data source collects content from many of the lesser-known video hosting sites. You can use this data source in conjunction with the YouTube and DailyMotion sources for maximum coverage.

YouTube: The world's most popular video hosting site. This data source offers new content, including the title, duration, the username of the author, and a link to the video itself, plus comments on existing videos.

Wikipedia: This data feed monitors editorial changes at Wikipedia such as the creation of new pages and updates to existing ones.

WordPress: Widely considered the world’s most popular content management system. It’s a key data source when measuring and understanding the web presence of your brand or industry.

