PYLON for Facebook Topic Data FAQ

All your questions about how DataSift and Facebook are working together – answered.


How are Facebook and DataSift working together?

DataSift and Facebook are working in partnership to enable brands and advertisers to gain insights from real-time Facebook topic data. Results are aggregated and anonymized to protect the identity of individuals.

How is this different from the Facebook Public Search API?

Facebook's Graph API v1.0 surfaced only posts that people had marked as 'public', and its removal was announced for April 2015 onwards. In contrast, the DataSift service enables aggregate-level analysis of non-public posts (status updates) as well as public posts, likes and comments. Results are delivered as aggregate, anonymized summaries.


What kind of data is available?

We are working with Facebook topic data. Topic data shows marketers what audiences are saying on Facebook about events, brands, subjects and activities, all in a way that keeps personal information private. Facebook topic data aggregates information from posts and provides counts on engagement data (likes, comments, shares). Each item of data is enriched for easier analysis with more than 60 attributes.


How are DataSift and Facebook ensuring the privacy of users' posts?

The service provides multiple controls:

  • User identity is removed before any analysis is performed.
  • Users' data never leaves Facebook's data centers: the underlying data used to create insights stays on Facebook's servers.
  • Only the anonymized and aggregated results are delivered outside of the Facebook data center.*
  • A data set is delivered only if it represents at least 100 individuals. This ensures no individual can be identified from the data returned.
  • Detailed data used to create the analysis is deleted after 30 days.

* A small sample of posts that authors have intentionally made public is made available for validation purposes.
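
To make the 100-individual rule concrete, here is a minimal sketch of how such a threshold check could work. It is illustrative only: the function name, its arguments, and the choice to return `None` for withheld results are assumptions, not a description of how Facebook or DataSift actually implement the control.

```python
from typing import Optional

# Minimum number of distinct individuals stated in the FAQ.
MIN_UNIQUE_AUTHORS = 100


def release_count(interactions: int, unique_authors: int) -> Optional[int]:
    """Return the interaction count only when enough individuals are represented.

    Hypothetical helper: a result backed by fewer than MIN_UNIQUE_AUTHORS
    distinct people is withheld (None) instead of being delivered.
    """
    if unique_authors < MIN_UNIQUE_AUTHORS:
        return None  # too few individuals; withhold the result
    return interactions
```

Under these assumptions, a segment with 250 interactions from 99 people would be withheld, while the same count from 100 people would be returned.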

How is DataSift providing this data?

Brands can use DataSift's API to:

  1. Define the relevant data to analyze: This is done by creating a CSDL filter.
  2. Analyze the data for trends and insights: The DataSift PYLON API enables developers to run analysis queries on the data that has been collected.
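
As a rough illustration of the two steps above, the sketch below builds a CSDL filter string and an analysis query as plain Python data. The target names (`fb.content`, `fb.author.gender`), the `freqDist` analysis type, the payload layout, and the helper `build_analysis_request` are assumptions made for illustration; consult the PYLON API documentation for the actual request format. No request is sent.

```python
import json

# Step 1: a CSDL filter that defines which data to record.
# The target name and operator below are assumed for illustration.
csdl_filter = 'fb.content contains_any "coffee, espresso"'

# Step 2: an analysis query to run over the recorded data.
# Field names and values are assumed, not taken from this FAQ.
analysis_query = {
    "analysis_type": "freqDist",
    "parameters": {"target": "fb.author.gender", "threshold": 2},
}


def build_analysis_request(recording_id: str, query: dict) -> str:
    """Serialize a hypothetical analysis request body as JSON."""
    payload = {"id": recording_id}
    payload.update(query)
    return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    print(csdl_filter)
    print(build_analysis_request("example-recording-id", analysis_query))
```

The filter and the query are kept separate on purpose: the filter decides what is recorded, and any number of analysis queries can then be run against that recording.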

What kind of infrastructure do I need to support Facebook data?

While there are billions of posts, likes and comments on Facebook every day, the detailed data never leaves Facebook. Developers define their analysis criteria and the processing is done inside Facebook's data center. Only aggregated and anonymized results are delivered so developers can focus their efforts on building insights, not infrastructure.

How is this different from the existing DataSift platform?

Up to now, DataSift has worked with public networks such as Twitter and Tumblr. Because these networks primarily publish public data, DataSift provides a filtered data feed that lets developers select relevant content and consume it in their applications.

The new API delivers aggregate, anonymized data in summary format. This protects user privacy while still enabling insights from the data. The results are significantly easier to work with, as they are delivered as “counts” of data instead of raw interactions.


Facebook has a lot of data. Won't it take a lot of work to handle the volume?

No. Since the API returns summary counts, output results are returned in simple JSON files that are relatively small. This makes them much easier to ingest for your apps and analysis.

Can I see the actual text, images, videos etc. from posts?

No. All Facebook topic data is in aggregate form to protect user privacy. For example, instead of receiving raw interaction data such as “I like Coca-Cola!” your result could show that 500 women in Iowa have mentioned Coca-Cola in a positive way.
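
The sketch below shows what working with such a summary might look like. The JSON layout, the field names, and every figure other than the 500 from the example above are made up for illustration; the real result format may differ.

```python
import json

# Hypothetical summary result: counts only, no post text or author identities.
SAMPLE_RESULT = """
{
  "topic": "Coca-Cola",
  "region": "Iowa",
  "sentiment": "positive",
  "results": [
    {"gender": "female", "unique_authors": 500},
    {"gender": "male", "unique_authors": 430}
  ]
}
"""


def authors_by_gender(payload: str) -> dict:
    """Map each gender in the (assumed) result layout to its author count."""
    data = json.loads(payload)
    return {row["gender"]: row["unique_authors"] for row in data["results"]}


if __name__ == "__main__":
    print(authors_by_gender(SAMPLE_RESULT))
```

Because the payload contains only counts, an application can load it with a standard JSON parser and needs no storage or processing pipeline for raw posts.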


What are the big differences between Facebook topic data and data from other social networks?

Volume: There are just more people and more engagement on Facebook than anywhere else. This engagement forms the basis for the summary results returned.

Demographics: Demographic data gives you a better understanding of the audience that's engaging on a topic. Understanding demographics is more difficult when analyzing data from other social networks.

Topic-level analysis: Posts are classified in real time into Facebook's Open Graph, making it easy to analyze the topics that are emerging from the millions of posts, comments and likes.

Depth of metadata for analysis: Facebook data is highly structured, with more than 60 attributes available for analysis.

What kind of analysis are companies doing with Facebook data?

Facebook is the biggest source of public opinion data on Earth, and topic data is the only way to gain privacy-safe insights. Since we deal in aggregate and anonymized data, the use cases that fit best are brand health, marketing optimization, revenue generation, customer experience, operational efficiency and supporting innovation. Use cases that require personally identifiable information are not applicable.


Are both real-time and historical data available?

Topic data is available for 30-day rolling windows only, which begin when you start recording. Data counts are updated up to the minute.

What countries does topic data cover?

We offer data from 139 countries and territories.

North America
United States

Europe
Denmark, Finland, Iceland, Ireland, Norway, Sweden, United Kingdom, Italy, Portugal, Spain, Austria, Belgium, France, Germany, Luxembourg, Netherlands, Switzerland, Cyprus, Bulgaria, Czech Republic, Hungary, Poland, Romania, Slovakia, Ukraine, Estonia, Latvia, Lithuania, Croatia, Greece, Macedonia, Malta, Serbia, Slovenia

Middle East & Africa
Kenya, Mauritius, Egypt, Morocco, Tunisia, South Africa, Ghana, Nigeria, Bahrain, Iraq, Israel, Jordan, Kuwait, Lebanon, Oman, Palestine, Qatar, Saudi Arabia, Turkey, United Arab Emirates

Latin America
Argentina, Bolivia, Brazil, Chile, Colombia, Costa Rica, Cuba, Dominican Republic, Ecuador, El Salvador, French Guiana, Guadeloupe, Guatemala, Haiti, Honduras, Martinique, Nicaragua, Panama, Paraguay, Peru, Puerto Rico, Saint Barthelemy, Saint Martin (French), Uruguay, Venezuela

Asia Pacific
Afghanistan, American Samoa, Australia, Bangladesh, Brunei, Bhutan, Cook Islands, Christmas Island, Fiji, Federated States of Micronesia, Guam, Hong Kong, Indonesia, India, Japan, Kyrgyzstan, Cambodia, Kiribati, South Korea, Kazakhstan, Laos, Sri Lanka, Marshall Islands, Myanmar, Mongolia, Macau, Mariana Islands, Maldives, Malaysia, New Caledonia, Norfolk Island, Nepal, Nauru, Niue, New Zealand, French Polynesia, Papua New Guinea, Philippines, Pakistan, Pitcairn Islands, Palau, Solomon Islands, Singapore, Thailand, Tajikistan, Tokelau, East Timor, Turkmenistan, Tonga, Tuvalu, Taiwan, United States Minor Outlying Islands, Uzbekistan, Vietnam, Vanuatu, Wallis and Futuna, Samoa

What languages does topic data cover?

Topic data detection is available in 11 languages: English, Dutch, French, German, Indonesian, Italian, Polish, Portuguese, Spanish, Turkish and Vietnamese.

Sentiment detection is available in 7 languages: English, French, German, Italian, Portuguese, Spanish and Turkish.


Can I use this to target ads on Facebook based on what people are talking about?

Not directly. There is no integration between DataSift and Facebook’s advertising platform. However, you can use topic data results and analysis to create better marketing content and advertisements, and to better understand the demographics and topics that inform your ad targeting strategy on Facebook or any other network.


Where do I go to find out more about accessing Facebook topic data?

Please contact our sales team.
