In a world where tech giants regularly vacuum up the collective knowledge of humanity for their AI ambitions, we’ve learned that Meta has allegedly downloaded 81.7 terabytes of pirated books, roughly 7.5 million titles, to train its artificial intelligence systems. Meanwhile, more than a decade ago, internet activist Aaron Swartz faced federal charges carrying up to 35 years in prison for downloading academic articles to which he already had legal access. Tech justice has never been so consistently inconsistent.
The story of Meta’s literary heist emerged through court documents in a lawsuit filed by authors including Ta-Nehisi Coates and Sarah Silverman.[1] Internal communications reveal Meta employees expressing such profound moral concerns as “torrenting from a [Meta-owned] corporate laptop doesn’t feel right.”[2] One can almost hear the deafening roar of ethics officers not being consulted.
Corporate Downloading 101: A Step-by-Step Guide to Avoiding Prison
According to court filings, Mark Zuckerberg himself allegedly approved using LibGen, a notorious piracy site containing millions of books and academic papers, to train Meta’s AI models.[3] When you’re worth approximately $171 billion, apparently federal prosecutors suddenly discover the nuanced legal concept of “fair use,” that magical shield that transforms what would be “felony theft” into “innovative data acquisition strategy” faster than you can say “political campaign contribution.”
The lawsuit claims Meta not only downloaded these works but also potentially re-uploaded about 30% of them through BitTorrent, actively contributing to the piracy ecosystem in the process.[4] This is the equivalent of borrowing a library book, making photocopies, and then setting up a free photocopying stand outside the library entrance while wearing a t-shirt that says “DEFINITELY NOT STEALING.”
Meta’s defense falls back on the classic Silicon Valley incantation: “fair use,” arguing that training AI on copyrighted works “transforms” rather than reproduces the material.[5] This is like saying it’s legal to steal a car if you’re just going to melt it down and use the metal to build a robot that can describe what cars are like.
The Aaron Swartz Memorial “One Standard for Thee, Another for Me” Award
Contrast Meta’s situation with Aaron Swartz, who in 2011 downloaded approximately 4.8 million academic journal articles from JSTOR through MIT’s network.[6] Despite being a Harvard research fellow who had legitimate access to these articles, Swartz faced federal charges of wire fraud and computer fraud carrying potential penalties of up to 35 years in prison and $1 million in fines.[7]
U.S. federal prosecutors, led by Assistant U.S. Attorney Stephen Heymann, pursued Swartz with the tenacity usually reserved for international terrorists or people who put pineapple on pizza. When Swartz’s lawyer informed Heymann that his client was a suicide risk, the prosecutor reportedly responded, “Fine, we’ll lock him up.” Nothing says “proportional justice” like threatening decades in prison for downloading articles that were primarily created with public funding.
The charges against Swartz weren’t even about copyright infringement. They primarily related to the methods he used to access the MIT network. JSTOR itself declined to pursue civil litigation and said it would not press charges. But U.S. federal prosecutors, apparently desperate for a way to demonstrate their tough-on-nerds stance, charged ahead anyway.
The Definitive Guide to Legal Digital Downloading (Based on Current Precedent)
Based on these two cases, we’ve compiled this helpful flowchart for determining if your downloading activities will result in:
A) A strongly worded letter from lawyers
B) Federal prosecution and potential decades in prison
- First question: Are you a trillion-dollar corporation? If yes, proceed to A. If no, continue.
- Second question: Did you download the content to advance human knowledge and promote free access to information? If yes, proceed to B. If you downloaded it to make money, you may qualify for a penalty reduction.
- Third question: Will your downloading potentially make billions of dollars for shareholders? If yes, download away! If no, prepare for the full force of federal law enforcement.
A Meta spokesperson declined to comment for this article but telepathically projected intense feelings of “we’ll probably get away with this” directly into our consciousness.
The Downloads and Downloads-Not
What makes these cases even more absurd is that Swartz never distributed the articles he downloaded. According to JSTOR itself, “the downloaded content was not used, transferred nor distributed.” His alleged crime was essentially checking out too many library books at once.[8]
Meta, on the other hand, allegedly re-uploaded approximately 30% of the pirated books it downloaded through BitTorrent, actively participating in the distribution of pirated content. This is like getting caught shoplifting and then setting up a booth in the parking lot to sell the stolen merchandise – except instead of jail time, you get to be one of the most powerful companies on Earth.
How to Calculate Your Digital Crime Sentence
We’ve developed a proprietary algorithm to calculate potential sentences for digital crimes:
- Net Worth < $1 million: Sentence = (Bytes Downloaded ÷ 1000) × 0.5 years in prison
- Net Worth $1 million to $1 billion: Sentence = Stern letter and possible fine of up to 0.001% of annual revenue (0.0001% if the company is based somewhere offshore)
- Net Worth > $1 billion: Sentence = Free publicity and increased stock price
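For readers who prefer their injustice executable, the three tiers above can be sketched in Python. This is a tongue-in-cheek sketch, not legal advice: the function name `digital_crime_sentence`, its parameter names, and the choice to treat a net worth of exactly $1 billion as the middle tier are our own assumptions, since the formula above leaves that boundary unspecified.

```python
def digital_crime_sentence(net_worth_usd, bytes_downloaded,
                           annual_revenue_usd=0.0, offshore=False):
    """Return the satirical 'sentence' for the three net-worth tiers.

    Hypothetical helper: names and boundary handling are assumptions.
    """
    # Tier 1: net worth under $1 million -> prison time scales with bytes.
    if net_worth_usd < 1_000_000:
        years = (bytes_downloaded / 1000) * 0.5
        return f"{years:,.1f} years in prison"

    # Tier 2: $1 million to $1 billion -> stern letter plus a capped fine.
    if net_worth_usd <= 1_000_000_000:
        fine_percent = 0.0001 if offshore else 0.001
        max_fine = annual_revenue_usd * fine_percent / 100
        return f"Stern letter and a fine of up to ${max_fine:,.2f}"

    # Tier 3: over $1 billion -> no penalty at all.
    return "Free publicity and increased stock price"


if __name__ == "__main__":
    # An individual who downloaded a single 2,000-byte text file.
    print(digital_crime_sentence(50_000, 2_000))
    # A mid-sized company with $10 million in annual revenue.
    print(digital_crime_sentence(5_000_000, 10**12, annual_revenue_usd=10_000_000))
    # A tech giant.
    print(digital_crime_sentence(2_000_000_000, 10**14))
```

Under this formula, the bytes downloaded matter only in the first tier; above $1 million in net worth, the volume of downloading has no effect on the outcome, which is rather the point.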
The tech industry has long operated on the principle that it’s easier to ask for forgiveness than permission – unless you’re an individual, in which case you should ask for permission, get it in writing, have it notarized, and still expect federal charges.
The Zuckerberg Doctrine of Digital Appropriation
Legal experts who’ve never actually practiced law but have strong opinions on Twitter (now X) predict Meta will likely settle the author lawsuit for an amount that sounds impressive in the news headlines but represents approximately 18 minutes of company revenue. The settlement, as usual, will include no admission of wrongdoing and a press release about how Meta values creators and is committed to working with them in the exciting field of AI development.
“The fundamental difference between Swartz and Meta,” explains copyright attorney Morgan Blackwell, “is that Swartz wanted to democratize knowledge, while Meta wants to monetize it. Our legal system is specifically designed to distinguish between these cases by asking: ‘Which one makes rich people richer?’”
When reached for comment, a Department of Justice spokesperson said, “We take intellectual property theft very seriously unless it’s done at sufficient scale to be considered innovation.”
Redefining Fair Use for the AI Era
Meta’s defense hinges on “fair use,” the legal doctrine that allows limited use of copyrighted material without permission.[9] This is the same defense that would likely have been available to Swartz, had prosecutors been interested in such nuances.
“Fair use is like quantum mechanics,” explains digital rights activist Eliza Thornberry. “It exists in a state of superposition where it both applies and doesn’t apply until you observe the net worth of the entity claiming it.”
The tech industry has successfully expanded the definition of fair use to include:
- Copying the entire text of millions of books if you’re training an AI
- Downloading scientific papers if you’re a multi-billion dollar corporation
- Pretty much anything else if your legal team is large enough
However, fair use explicitly does not include:
- Downloading academic papers if you’re an individual activist
- Making content more accessible to the public without a profit motive
- Anything that challenges existing power structures in technology
The Capitalism Loophole in Copyright Law
What we’re witnessing is the emergence of what legal scholars call the “capitalism loophole” in copyright law. This unwritten but universally recognized principle holds that copyright infringement is determined not by the act itself but by whether the act serves the interests of capital accumulation.
As tech ethicist Dr. Julian Mercer puts it: “If you’re downloading content to share knowledge freely, that’s theft. If you’re downloading it to create proprietary AI systems that will generate billions in shareholder value, that’s innovation.”
This principle explains why Meta can download 81.7 terabytes of pirated books and face only civil litigation, while Aaron Swartz faced federal charges carrying 35 years for downloading articles to which he already had legitimate access. The difference is not the act but the purpose – and under the current US justice system, profit is the most legitimate purpose of all.
Conclusion: The Moral of Our Immoral Story
The moral of this story, if there can be one in our increasingly post-moral tech landscape, is simple: Scale changes everything. What’s a crime at human scale becomes a business strategy at corporate scale. What’s theft when done by an individual becomes innovation when done by a trillion-dollar company.
Aaron Swartz tragically died by suicide in January 2013, facing an impossible choice between a plea deal that would label him a felon or risking decades in prison. His death led to proposed legislation called “Aaron’s Law” to amend the Computer Fraud and Abuse Act, though it never passed. Meanwhile, Meta continues to build its AI systems, partially trained on the very type of content that led to Swartz’s prosecution.
As we navigate this brave new world of artificial intelligence built on questionably acquired knowledge, perhaps we should ask: If an AI is trained on millions of pirated books, does it develop a moral compass? Based on the example set by its creators, we already know the answer.
References
1. https://futurism.com/zuckerberg-books-train-meta-ai-libgen
2. https://www.socialmediatoday.com/news/meta-used-pirated-books-to-train-ai-systems/737605/
3. https://www.socialmediatoday.com/news/meta-used-pirated-books-to-train-ai-systems/737605/
4. https://www.netizen.net/news/post/6193/metas-controversial-ai-training-piracy-allegations-explained
5. https://www.reuters.com/legal/litigation/tech-companies-face-tough-ai-copyright-questions-2025-2024-12-27/
6. https://crln.acrl.org/index.php/crlnews/article/view/8637/9062
7. https://en.wikipedia.org/wiki/United_States_v._Swartz
8. https://sur.conectas.org/en/aaron-swartz-battles-freedom-knowledge/
9. https://techhq.com/2025/01/meta-used-pirated-content-and-seeded-illegal-copies-by-bittorrent/