Community Data Dump
A list of Stack Exchange Community Creative Commons Data Dump releases.
Torrents and Internet Archive items that were created by the company are
indicated with the 🏢 emoji, while those created by community members are
indicated with the 👥 emoji.
All known releases of the Stack Exchange Network's Creative Commons Community
Data Dump, containing all non-deleted posts from non-beta communities,
with links to download using BitTorrent or from the Internet Archive. Some
releases are no longer available.
The primary version of each release is a collection of XML files in compressed
7-Zip archives, representing the database schema described in this post. These
are available in an unofficial torrent RSS feed.
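As a rough illustration of that layout, here is a minimal Python sketch that
streams rows from one XML file after its 7-Zip archive has been extracted (for
example with the 7-Zip tool). It assumes each file, such as Posts.xml, has a
single root element whose children are row elements that store every column as
an XML attribute. The function names iter_rows and count_post_types are
hypothetical and not part of any official tooling.

```python
from collections import Counter
from typing import Dict, Iterator
import xml.etree.ElementTree as ET


def iter_rows(source) -> Iterator[Dict[str, str]]:
    """Yield each <row> of an extracted dump file as a dict of its attributes.

    `source` may be a file path or a binary file object. Assumes (hypothetically)
    the dump layout: one root element (e.g. <posts>) containing <row .../>
    children, with every column stored as an attribute.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the root element's start
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            yield dict(elem.attrib)  # copy, so clearing below does not affect it
            root.clear()  # detach processed rows so memory use stays bounded


def count_post_types(posts_source) -> Counter:
    """Count rows of an extracted Posts.xml by their PostTypeId attribute
    (in the dump schema, "1" marks a question and "2" an answer)."""
    return Counter(row.get("PostTypeId") for row in iter_rows(posts_source))
```

Streaming with iterparse, rather than loading a whole file at once, matters
because the largest files (such as Stack Overflow's Posts.xml) are many
gigabytes once extracted.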
Some releases are also available as unofficial conversions to other formats.
This site is not affiliated with Stack Exchange Inc.
From its inception in September 2008, user content on Stack Overflow (and later
the broader Stack Exchange Network) has been released under a Creative Commons
License.
Starting in June 2009, the company began releasing periodic "data dumps"
containing most types of user-contributed content on the site. These were
available for download through BitTorrent, so that the community could always
continue seeding the content and keep it available, even if the company itself
stopped doing so.
From January 2014 to April 2024, the latest release was also available for
download from the Internet Archive.
This was intended to help set Stack Overflow apart from some of the Q&A sites
that had existed in the past, which kept user contributions under restrictive
licenses and tight control, and tended to become more user-hostile over time.
This spirit was described by Co-Founder and CEO Joel Spolsky in a podcast on
February 20, 2010:
From day one, we used the CC-wiki license. It's basically a license that says
that we don't own the content that's on there, which is why we make those
database dumps that are available.
We wanted to make sure that if no matter what happens, literally no matter who
we sell to, or raise money from, or turn the site over to, and even if they
take Stack Overflow, and make it an evil site where you have to pay to look at
things and there's pop-up ads and pop-under ads, and you know, dancing
chariots of fire that cross the screen and punch the monkey, and [...] it just
becomes a big gigantic spam site.
Doesn't matter because you can just take the latest CC-wiki download that we
provided and go start your own site saying "you know what, this is gonna be
the clean version". And I think a lot of people will follow you. We very, very
deliberately built Stack Overflow in a way that there wouldn't be any chance
of locking it down.
Unfortunately, in July 2024 the company announced that going forward, the data
dumps would not be released as torrents or uploaded to the Internet Archive.
Instead, users would need to log in to download them directly from the Stack
Exchange website, so that the company could monitor who was downloading them,
with the explicit threat that it would block users from downloading if they were
using the data for a purpose that the company didn't approve of. Additionally,
there would no longer be a single release containing data from the entire
network. Instead, users would need to manually go to each of the 368 sites in
the network to download its data.
In order to preserve open access to Stack Exchange community data, some
community members will take on the effort of downloading the dumps from all
sites across the network and redistributing the complete collection as torrents
resembling those that were previously provided by the company. Some of these
will be in the original XML format, while others will be converted to other
formats that are more convenient to use directly.
This site exists to provide an index of all releases, including
community-aggregated ones going forward as well as the company-provided ones
from the past.
Each item in these releases is copyright its respective authors and editors, who
are identified in the release's data. Each item is distributed under a Creative
Commons Attribution-ShareAlike license (CC BY-SA, previously also known as
CC-Wiki), but the exact version of the license depends on the date on which
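the item was contributed. According to
stackoverflow.com/help/licensing, content contributed before 2011-04-08 (UTC)
is distributed under CC BY-SA 2.5, content contributed from 2011-04-08 up to
but not including 2018-05-02 (UTC) under CC BY-SA 3.0, and content contributed
on or after 2018-05-02 (UTC) under CC BY-SA 4.0.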
Please see stackoverflow.blog/2009/06/25/attribution-required for guidance on
how you are expected to attribute content to its authors.
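Because the applicable version depends only on when an item was contributed,
it can be expressed as a small date lookup. The Python sketch below is
hypothetical (license_for is not an official API). It assumes the cutoffs stated
on stackoverflow.com/help/licensing: CC BY-SA 2.5 before 2011-04-08, CC BY-SA
3.0 from then until 2018-05-02, and CC BY-SA 4.0 on or after 2018-05-02, all in
UTC. It also assumes that timestamps without an offset, like the CreationDate
values in the dump, are UTC. Which timestamp to use for a given item (for
example, a post's creation date versus the date of a later edit) is left to the
caller.

```python
from datetime import datetime, timezone

# Assumed cutoffs (UTC), taken from stackoverflow.com/help/licensing; re-check
# them against that page before relying on this sketch.
_CC_BY_SA_3_0_START = datetime(2011, 4, 8, tzinfo=timezone.utc)
_CC_BY_SA_4_0_START = datetime(2018, 5, 2, tzinfo=timezone.utc)


def license_for(contributed_at: datetime) -> str:
    """Return the CC BY-SA version for an item contributed at `contributed_at`."""
    if contributed_at.tzinfo is None:
        # Dump timestamps carry no offset; this sketch assumes they are UTC.
        contributed_at = contributed_at.replace(tzinfo=timezone.utc)
    if contributed_at < _CC_BY_SA_3_0_START:
        return "CC BY-SA 2.5"
    if contributed_at < _CC_BY_SA_4_0_START:
        return "CC BY-SA 3.0"
    return "CC BY-SA 4.0"
```

For a row read from the dump, this might be called as
license_for(datetime.fromisoformat(row["CreationDate"])).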
This page uses some CSS from the Stacks design system, which is copyright
Stack Exchange, Inc. and released under the MIT License.