Rare Book Monthly

Articles - February - 2025 Issue

Your Chatbot May Be Using Illegally Pirated Books to Answer Your Questions

A battle is brewing between an ancient source of information, the book and its authors, and a new invention, the chatbot and its developers. The chatbot is a program that can answer whatever questions you throw at it. The granddaddy (little more than two years old) and most famous chatbot is ChatGPT. It uses artificial intelligence (AI) to quickly sort through reams of information to answer your every question. But where does it get that information? One of the major sources is books, copyrighted books. When the chatbot uses that information to answer your questions, the authors and publishers of those books get nothing. That makes them sad (perhaps a better word is “angry” or “POed”).

Some authors are angry enough to go to court. There are various cases floating around out there, but a notable one pits comedian and writer Sarah Silverman against Meta, operator of Facebook, headed by Mark Zuckerberg. Meta's chatbot, Llama, is the culprit here.

It is alleged that Meta used the LibGen (Library Genesis) dataset to train its Llama chatbot. LibGen is a notorious, shadowy entity, possibly operating out of Russia. Its dataset contains over 196,000 pirated books. LibGen has been in the news before for “lending” its pirated books free of charge without compensating the authors. LibGen infringes on authors' copyrights and operates illegally, but it doesn't matter. Its operators can't be shut down or forced to pay because they can't be found. They regularly change their URLs to avoid being shut down. LibGen is no small operation, receiving an estimated 9 million visits per month from the U.S. to “borrow” books. It is supported by donations (accepted in untraceable bitcoin only).

What Meta has been accused of doing is using this large pirated database of books to supply Llama with much of the information it needs to answer users' questions. The plaintiffs have alleged that approval to do so came from the top, Mr. Zuckerberg himself. This claim has focused on the use of pirated (illegally obtained) books, but that perhaps is not the biggest issue here. What if the books were legally obtained: purchased, borrowed from a physical library, or received as gifts? Would that be any better from a copyright standpoint? Probably not.

In Meta's opinion, this use of the authors' work fits under the “Fair Use” exception to copyright. “Fair Use” is what lets you quote from a book, write a review or book report, or use information you found therein to write something of your own, without violating its copyright. Generally speaking, if you change what you read, add your own twist, copy only a small portion, and such, you are not guilty of copyright infringement. What Meta is doing, leaving aside the issue of using LibGen's pirated texts, is copying the entire book but then sharing only a small, rewritten portion, such as might be expected to pass the Fair Use test.

This will have to play out in court, but the judge seems less than impressed with the arguments made by the authors. The reality is that chatbots provide very useful information. You probably use one to answer your questions. It's sort of like speaking to a very learned individual. Practically speaking, paying 196,000 authors some small pittance each would be an absolute nightmare, and they might not agree to such an arrangement anyway. It's not that they don't deserve anything, but it probably isn't a lot, and making such demands might force the shutting down of this very new and useful technology altogether. Progress is hard to stop, even if some people feel hurt by it, and my guess is the courts will not do so here.


Posted On: 2025-02-04 01:22
User Name: keeline

I don't know that an AI LLM (Large Language Model) like ChatGPT and others fits the usual definition of a "chatbot." That word is usually used for software that takes part in a chat session and may represent itself as a human. Such programs have been around for a long time.

There is an allegation that the LLMs have been trained on collections of books, copyrighted and public domain, along with crawled websites, posts in groups, and more. But a court case will have to determine what was really used. We may only know if it is decided IN COURT. Often, an out-of-court settlement will leave many important details confidential.

Although LLMs create an illusion of intelligence, they are not intelligent. The best you can say is that they can write a word salad that is passable. Sometimes even this comes only after several tries and refinements of the prompt they respond to.

Attempts to get ChatGPT to write fiction in a certain style usually fail. At the very least there are major plot holes that a human editor or reader would spot. I've seen several examples of this.

LLMs are better with words than they are with basic arithmetic. This is part of the reason why AI-generated art often has logical errors, such as too many fingers or not enough keys on a typewriter. It's a bit like looking at a toy train where the designer has a vague notion of a train but not an understanding of the real parts of a steam locomotive.

When people ask for book IDs from ChatGPT, I have seen it make up books that do not and never did exist. One example invented a volume in the Judy Bolton series which never existed and was not even a proposed story. This leads people down false trails and has them spending energy looking for books that do not exist. Then there is the energy required to complete the LLM query in the first place.

