The Perils of ISBN (rygoldstein.com)
amiga386 1 days ago [-]
This reminds me of MusicBrainz, whose database stores "release groups", e.g. the album Nevermind by Nirvana is one, which can have hundreds of "releases", as different media (tape, CD, LP, promo, ...), different countries, later re-issues, etc. [0]

Sometimes these have different catalogue numbers or barcodes to distinguish them, sometimes they don't but they're still different. I've seen releases where the only difference is the label in the centre of the LP, or the back of the CD case has a two-column tracklisting vs a one-column tracklisting. Music publisher uses the same code and says it's identical and yet it's clearly not.

Then there's the "recordings" on an album, which even if they're never re-recorded can still end up chopped up, bleeped or remastered. They're not the same sound. MusicBrainz likes to track when they are exactly the same recording (e.g. the LP recording of a song appearing on a compilation album verbatim) and when they're not (e.g. radio edits of the LP recording). And if we're going beyond recordings by one artist of "their" song, i.e. cover versions, or just plain standards, those are "works", with composers, lyricists, and can be recorded thousands of times by different artists...

I greatly appreciate the pedantry and flexibility for noting down when creative works are the same versus where they differ, in relational database form.

[0] https://musicbrainz.org/release-group/1b022e01-4da6-387b-865...

ggm 18 hours ago [-]
I had a dual CD pressing of Bach (double violin concertos plus some other stuff, Zukerman/Perlman, on Columbia, which passed through a number of subsequent buyouts and re-releases) which simply would not index correctly from the cd-id track stuff.

I wound up making an account, uploading the info, managing the 29 different reasons a neophyte makes a mistake causing their data not to be accepted, and finally got my CD into the system. This included using a random Chinese person's website from the 90s, someone who presumably had come to Australia and bought the identical pressing, which appears to be a hyper-local market-specific variant of the ones which other (European, American) markets got.

I have massive sympathy for the brainz, because as this article on ISBN and my experience shows, people are cavalier about renewing their 'unique identity' info, when they think they don't have to.

SamWhited 1 days ago [-]
They actually have a (very new, still alpha, probably not a ton of data yet) database for books:

https://bookbrainz.org/about

I haven't looked into what their schema is like, but if it's anything like Musicbrainz it will be pretty comprehensive and easy to pull the data you want out of!

NoMoreNicksLeft 23 hours ago [-]
That's the post I made on r/plex a decade ago that pissed off a dumbass moderator and got me banned from there! I guess he hated books.

I've recently been doing data entry on Open Library... sometimes even worldcat doesn't have an OCLC for an edition, and Open Lib is my fallback. Maybe I should be doing it on Bookbrainz instead.

cedws 2 hours ago [-]
I worked on a project recently to organise my music and came across MusicBrainz. I wanted a reliable API to enrich my music with the proper metadata, but unfortunately the majority of my tracks weren’t in their database at all. Maybe the Anna’s Archive Spotify data will help there.

To me it makes the most sense to index music by its fingerprint. Releases, EPs, etc should just be pointers to that.

makr17 23 hours ago [-]
My favorite example of this sort of thing has been In My Tribe by 10000 Maniacs. The UPC/Catalog Number remained the same between the 1987 release and the removal of Peace Train (track 7) in 1989. I have this memory of sifting through the stock at a large used CD store in the mid-90s hoping to find the pre-removal version.
amiga386 21 hours ago [-]
https://musicbrainz.org/release-group/94d44c63-7dee-3921-aa6... all with the barcode 075596073820 and catalogue number 60738 / 60738-2 / 9 60738-2

Interesting to read that the reason for the removal was Cat Stevens' apparent endorsement of the fatwa against Salman Rushdie. It seems it was the band themselves that requested it? https://www.rollingstone.com/music/music-news/cat-stevens-br...

wink 12 hours ago [-]
This is kinda topical for me as I just scanned some barcodes off some CDs and my results were: 90-95% detection rate on MusicBrainz, and for the rest it ranged from "yeah, this is clearly the same thing with 10 tracks" to "oh my, there are 7 different regional versions with 10, 11, 12, 13, 13 tracks and I need to pay attention to grab the correct one so the last 3 songs are not wrong" and "this is some 5 EUR sample from an unknown label and really hard to find". Or their docs are not great: I had wished for something like "artist of track 1 = X and artist of track 2 = Y", which probably would have narrowed it down the most.
amiga386 11 hours ago [-]
When digitising my collection and using MusicBrainz as the main source of metadata, I had to add about 50 albums nobody had entered before. It has a huge amount of stuff - it already had 95% of my collection - but it's not perfect.

The best way to distinguish an album (after barcode or name/artist, and medium) is number of tracks, and if that's not enough, release year/country. I got my metadata by using their Picard tagger and the CD TOC (as it contains the number and lengths of tracks, it's much less ambiguous), but of course opening every case and putting every CD in the drive is a lot more effort than barcode scanning.

You can use the advanced search syntax if you need to look up multiple fields at once. https://musicbrainz.org/search -> Type "Release", method "Indexed search with advanced query syntax"

For example: barcode:075596073820 AND tracks:11 gives https://musicbrainz.org/search?query=barcode%3A075596073820+...

Docs: https://musicbrainz.org/doc/Indexed_Search_Syntax
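The multi-field lookup above can also be scripted. This is a minimal sketch that only builds the query string and a search URL; it assumes the MusicBrainz web service endpoint `https://musicbrainz.org/ws/2/release` accepts the same indexed-search syntax, and the helper names `build_release_query` and `build_search_url` are made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: the MusicBrainz web service release search endpoint.
WS_RELEASE_SEARCH = "https://musicbrainz.org/ws/2/release"


def build_release_query(**fields: str) -> str:
    """Join field:value pairs with AND, as in the advanced query syntax."""
    return " AND ".join(f"{name}:{value}" for name, value in fields.items())


def build_search_url(query: str, limit: int = 25) -> str:
    """Return a search URL; actually fetching it is left to the caller."""
    params = {"query": query, "fmt": "json", "limit": limit}
    return f"{WS_RELEASE_SEARCH}?{urlencode(params)}"


query = build_release_query(barcode="075596073820", tracks="11")
url = build_search_url(query)
print(query)
print(url)
```

If you do fetch the URL, the service asks clients to send an identifying User-Agent and to throttle their requests.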

bombcar 23 hours ago [-]
I know that for a book I've published via Kindle Press (the real ones, not digital) that there are at least 3 official revisions, and many many minor ones that as far as I know are only differentiated by the minor typos fixed, and MAYBE one of the numbers buried in the front matter. The ISBN has remained the same.
jll29 21 hours ago [-]
Converse problem: ISBN re-use:

"Officially, ISBNs should never be reused. However, problems can happen if:

- A publisher improperly reuses an ISBN

- A small or self-publisher mis-registers a book

- An ISBN agency error occurred

- A book was published before 2007 and conversion from ISBN-10 to ISBN-13 created confusion" [Source: ChatGPT]

In 2009, I had plans to use ISBNs to distinguish the books in my personal library. But after scanning some ISBN bar codes with a MacBook app, I discovered some codes were associated with different books (the app also pulled the cover art, so it was easy to spot). Never had the time to find out if the bar code scanning was defective (=did not use the check sum) or these were cases of assignment errors, which "shouldn't happen" but have already happened.
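For anyone wanting to rule out the "scanner ignored the check sum" explanation, here is a minimal sketch of the two check-digit schemes and the ISBN-10 to ISBN-13 conversion. The function names are invented for this example, and it covers only the 978 prefix, which is the one used when converting an ISBN-10.

```python
import re


def clean(isbn: str) -> str:
    """Drop hyphens and spaces, upper-case a trailing 'x'."""
    return isbn.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(isbn: str) -> bool:
    s = clean(isbn)
    if not re.fullmatch(r"[0-9]{9}[0-9X]", s):
        return False
    digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    # Weights run 10, 9, ..., 1; the weighted sum must be divisible by 11.
    total = sum(w * d for w, d in zip(range(10, 0, -1), digits))
    return total % 11 == 0


def isbn13_check_digit(first12: str) -> str:
    # Weights alternate 1, 3, 1, 3, ... over the first twelve digits.
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(first12))
    return str((10 - total % 10) % 10)


def is_valid_isbn13(isbn: str) -> bool:
    s = clean(isbn)
    return bool(re.fullmatch(r"[0-9]{13}", s)) and isbn13_check_digit(s[:12]) == s[12]


def isbn10_to_isbn13(isbn10: str) -> str:
    s = clean(isbn10)
    if not is_valid_isbn10(s):
        raise ValueError(f"not a valid ISBN-10: {isbn10!r}")
    body = "978" + s[:9]  # the old check digit is dropped and recomputed
    return body + isbn13_check_digit(body)


print(isbn10_to_isbn13("0-306-40615-2"))
```

A check digit only catches misreads and typos. It says nothing about whether a publisher assigned the same valid number to two books.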

There is a certain type of ignorant developer who reused "unique IDs", I've even seen a database in production use where GUIDs were recycled (no joke).

joemi 20 hours ago [-]
Regarding your issues with ISBNs in your personal library, I suspect you must either have had an issue with your lookups/app, or you had several books from a (almost certainly tiny/amateur) publisher who improperly reused ISBNs. I've spent some time working at a bookstore with 80,000+ different ISBNs and I can count the number of issues with ISBN re-use we encountered on one hand.

We'd put pricing barcodes on every book in the store, and those were always based on the ISBNs and had the titles and authors printed on them, which was info that came from ISBN lookups either from Bowker's Books-In-Print data or Ingram's data. We'd print the barcodes in large batches and then have to match them to the books based on the title and author shown and verify with the ISBN, so all 80,000+ were checked, and the actual ISBN issues were _extremely_ rare and always from a _very_ small/amateur publisher.

WorldMaker 6 hours ago [-]
ISBN reuse is a quite large problem for some especially small publishers. The intended use of an ISBN is a product code for a retail point of service. If the book prices were expected to be basically the same, reusing the same ISBN for an entire shelf of books was sometimes fine if the retailers didn't mind the inventory management problem of knowing which specific book titles were left. For "pulp paperbacks" intended for a spindle at a grocery store they probably didn't care to manage the inventory by exact title, they managed it by the spindle-full.
ogurechny 16 hours ago [-]
Anyone who decided to make a catalogue for any decent enough library found that out on the first day.

(By “decent enough” I mean breadth. If you are strictly collecting some genre products from a small number of commercial publishers, you might be in the walled garden where everything just works.)

SBNs were introduced when, in addition to existing mass production, mass accounting and storage management for each item became possible (with computers). Outside of the centrally controlled environments they don't work well, or mean much. Sure, national authorities make enough rules about having proper ISBNs, but they do get ignored.

There are small university/gallery/collective publications that have bigger print runs than “official” books on some specific topic. There are books that are uniquely made or uniquely altered, and therefore can't share the identifier with another item. Most common example is getting an autograph — you probably want to know precisely where you've put the copy of Bible signed by the author, not just any other Bible that looks the same. Some people oppose ISBNs for political reasons, and either ignore them, or invent bogus numbers.

Then there's International aspect. Soviet Union, for example, did not use ISBNs until the very last of its years. There are still many books printed there — including complete works every scholar needs to reference — that never had any ISBNs.

Some works were published for the last time a century ago. Some of them might have been immensely popular back in the day, but now they are forgotten. Others have been re-printed, but you've managed to get the first printed edition, a small book by a then-unknown author. Those also won't have ISBNs.

So the idea itself that any book must be an interchangeable product from the batch in which each item has the same effect, and therefore can have the same identifier, is a bit narrow.

Obviously, professional librarians could instantly tell you that ISBN is merely one of the search markers, and is not the way the inventory is kept.

mmooss 20 hours ago [-]
Minor corrections can be new impressions rather than new editions, I think. On the copyright page, the impression line is the one that looks like this:

   30 29 28 27 26    2 3 4 5
As I understand it: That would be the 2nd impression, printed 2026. It's designed so the publisher can remove the innermost character(s) for each new impression, which I imagine was practical for printing presses - the type is already set, just remove a couple characters. Therefore the next impression this year would have,

   30 29 28 27 26      3 4 5
The 4th impression, next year:

   30 29 28 27           4 5
etc.
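Reading such a line mechanically is straightforward if it follows the layout described above (years on the left, impression numbers on the right, separated by a wider gap). A minimal sketch, assuming exactly that layout and two-digit years in the 2000s; real copyright pages vary, and `read_impression_line` is a made-up name:

```python
import re


def read_impression_line(line: str, century: int = 2000) -> tuple[int, int]:
    """Return (impression, year) for a 'years   impressions' line."""
    # Assumption: the two groups are separated by two or more spaces.
    groups = re.split(r"\s{2,}", line.strip())
    if len(groups) != 2:
        raise ValueError("expected a year group and an impression group")
    years = [int(tok) for tok in groups[0].split()]
    impressions = [int(tok) for tok in groups[1].split()]
    # The lowest surviving number in each group is the current one.
    return min(impressions), century + min(years)


print(read_impression_line("30 29 28 27 26    2 3 4 5"))
print(read_impression_line("30 29 28 27           4 5"))
```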
bombcar 9 hours ago [-]
Yep - the thing with KDP is that I had to insert that (and would sometimes remember to update it).

I also included a BZR revision number but that’s more difficult to do with git as it doesn’t really have the concept.

zvr 12 hours ago [-]
Just a comment pointing people to https://www.librarything.com/ which I find so much better than goodreads.

Regarding the taxonomy of WEMI (work, expression, manifestation, and item), all of them are useful since we are talking about books at different levels. From "I have read Don Quixote", which is about the work (translations are the same), to "My Don Quixote has coffee stains", which is about the item.
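The four WEMI levels map naturally onto nested records. The following is only a sketch of one possible shape, not any catalogue's actual schema; the class and field names are assumptions, and the publisher, year and translator in the sample data are invented placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    """One physical copy, e.g. the one with coffee stains."""
    note: str


@dataclass
class Manifestation:
    """One published edition; this is the level an ISBN usually names."""
    publisher: str
    year: int
    isbn: Optional[str] = None
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """One realisation of the text, e.g. a particular translation."""
    language: str
    translator: Optional[str] = None
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract creation: 'I have read Don Quixote'."""
    title: str
    author: str
    expressions: List[Expression] = field(default_factory=list)

    def all_items(self) -> List[Item]:
        """Walk down all three levels and collect the physical copies."""
        return [
            item
            for expr in self.expressions
            for man in expr.manifestations
            for item in man.items
        ]


# Placeholder data, invented for illustration only.
my_copy = Item(note="coffee stains")
edition = Manifestation(publisher="Example Press", year=2001, items=[my_copy])
english = Expression(language="en", translator="A. Translator", manifestations=[edition])
quixote = Work(title="Don Quixote", author="Miguel de Cervantes", expressions=[english])

print([item.note for item in quixote.all_items()])
```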

idoubtit 22 hours ago [-]
Wikidata is a FRBR-compatible public database of books. I don't know if it's good enough for the kind of books the author wants, but in recent years the quality of Wikidata has greatly increased for the books that I deal with (about 1000 items).

BTW, they misunderstood their own example of "Hotel Iris" by Yoko Ogawa when they wrote "the same work is duplicated four times." In fact, those four entries in the list point to distinct works.

One of these is a French publication by the publisher Actes Sud. Translations are not the same work as the original. They are derived works.

But it's true this list is a mess. Another entry has 3 editions, one in English and two in Spanish, so it's obviously an error that mixes two distinct works.

ZeroGravitas 15 hours ago [-]
In FRBR, translations are generally considered the same work.

In Openlibrary specifically they should be combined as one work. The editions can store the language and the translator info.

The current grouping is probably because semi-automatic (and some manual) merging is easier for titles in the same language.

jhbadger 10 hours ago [-]
I'm not sure I like merging translations together. They really make a difference, not like merging irrelevant things like paperback vs hardcover. A lot of classic literature from non-English originals (and I assume vice versa) suffers from old, dry translations -- I remember reading Dostoevsky in high school and not liking it much but that's because it was using translations from the early 20th century. More modern translations feel much more alive.
ZeroGravitas 9 hours ago [-]
Their guidelines on what gets its own work, and what doesn't, are here:

https://openlibrary.org/help/faq/editing#works-special-cases

So abridged or bowdlerised or annotated or illustrated versions are collected under the same work, even though people might have good reasons to want one over another (the language used and the specific translator being just two important attributes).

But summaries or adaptations or plays and screenplays are not.

There's always gray areas, but note the edition info isn't lost, it just lives in a subordinate position that is linked directly from the work.

crazygringo 21 hours ago [-]
> Translations are not the same work as the original. They are derived works.

Which adds yet another layer. Because you still want them to be considered as part of a larger single entity. If you're performing a search, you want to find the single main entity, and then have different translations listed the same way you have different editions listed.

ray_v 21 hours ago [-]
Is this not what semantic search enables?
crazygringo 3 hours ago [-]
No? I don't know how that relates or what you're trying to suggest.
qingcharles 17 hours ago [-]
Anna's Archive has done a lot of work with ISBNs, e.g.

https://annas-archive.li/blog/all-isbns-winners.html

AA makes use of a WorldCat scrape they performed:

https://annas-archive.li/blog/worldcat-scrape.html

I'm currently working on an ISSN database, which is periodicals, and is grossly underrepresented compared to books.

nomdep 21 hours ago [-]
A good time to remember that the Open Library came to be thanks to the initial work of Brewster Kahle (founder of the Internet Archive) and Aaron Swartz (RIP) http://www.aaronsw.com/weblog/openlibrary
saithir 23 hours ago [-]
Sometimes we definitely want 'items' though. For example, I am in a physical bookstore and see a book I might be interested in, so I buy it, only to find out back home that I already have the very same book, in the very same edition. It's a scenario that anyone with a fair number of books has definitely encountered multiple times; I know I've done it myself a few times. :)

The ability to search my collection by ISBN would have helped me in this case; scanning a barcode is an easy enough task.

And even if I had a different edition, the resulting title from searching for a different edition would be enough to help me figure out that I should not buy a book I already own.

eudamoniac 22 hours ago [-]
Genuinely how is this possible? I have nearly a thousand ebooks and I'm certain whether I have or don't have one, because I obtained it deliberately. Are you buying books by the foot or something?
joemi 20 hours ago [-]
I'm not the person you were asking but it's happened to me too, with physical books, not e-books. I don't know if I can explain how it happens. I'd say that me not knowing the exact content of my bookshelf is similar to me not knowing the exact content of my fridge and pantry. There have been a few times when grocery shopping where I can't remember if I have a less-frequently-used ingredient (like cream of mushroom soup) but I'd like to make something using that ingredient soon, so I buy the ingredient, and then when I get home I find that I did have a can in the back of the pantry that I bought a year ago and the expiration date hasn't arrived yet. Oh well.

If you can really remember every single one of your nearly a thousand ebooks that you've bought, that's both impressive and baffling to me.

yial 18 hours ago [-]
Not the person you’re replying to but backing up what you’re saying - I read a lot of fiction for pleasure, I can say with some assurance whether I’ve read a certain fiction book, or, any book I’ve read for pure pleasure. (Though as time goes on and I get older… I once started reading a book and about 15 pages in realized I had read it before )

As a hobby, I do a lot of wood working. Recently I was acquiring some books on wood finishes. I accidentally bought one book three times - not all at once but over a several year period. I realized this while trying to organize.

Mainly because they can have very similar / generic titles, and be by different authors. Since I’m using them as a reference, even if I know the author, I can’t always remember if I own this book by them or maybe I own this book with very similar title by a different author.

joemi 2 hours ago [-]
Oh yeah I can definitely remember all the books I've read. The only ones I've ever bought twice were ones I hadn't yet read but had sitting on my shelf for a while with the intention of reading sometime.
f0cus10 20 hours ago [-]
Faced this problem multiple times before. Tbf, I did buy and subsequently transport enough books in luggages to restore a country's economy
saithir 14 hours ago [-]
I just don't find remembering an exact list of all of them a worthy information to keep in my brain. Maybe if I had a dozen, but between me and my wife we do read much more than that. I also don't remember all of the contents of my steam games list, because what's the point? I can always look it up quickly.

I did eventually solve my duplicate book problem by making ourselves a searchable list I can access remotely, so now I can just look it up when I'm at the store.

Being deliberate about obtaining them isn't even remotely related to any of that.

I'd imagine this is also more specific to them being physical items, since it's much easier and obvious to look up an ebook if you just look at a list of files wherever you keep them.

ajohnson1200 18 hours ago [-]
I built a personal / hobby site for books a couple years ago that was inspired by pinboard.io, and leaned heavily on ISBNDB (their API), during which I learned a lot about ISBNs and books, at least through the lens of what the ISBNDB API offers:

- searching by title, ie: "The last unicorn" will return books across many years, and many editions, and with lots of different titles, examples:

The Last Unicorn (thorndike Press Large Print Science Fiction Series) THE LAST UNICORN The Last Unicorn (40th Anniversary Edition) The Last Unicorn the Lost Journey The Last Unicorn: The Lost Version The Last Unicorn das Einhorn im Spiegel der Popkultur

and then books that have a similar title but are by completely different authors:

The Last Unicorn: A Search for One of Earth's Rarest Creatures

- there's no way to programmatically link an ISBN or ISBN13 to all of the other variants of that book across years or editions ("First Edition", "1st U. S. printing", "6th Printing", etc.) or bindings ("Hardcover", "Mass Market Paperback", "Library Binding", "Kindle Edition", "Audio Cassette", etc.) or languages ("en", "English", "zh", etc.)

- I wrote some code that would consume the 1000 items in the ISBNDB API search results and attempt to reduce the list based on the language, the title, and the author(s) using Jaccard similarity, then sort by year and group by binding. That mostly worked for seeing all editions of a book, but it's super messy.
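To make the Jaccard step concrete, here is a minimal sketch over word sets of titles only (the code described above also used language and authors). It is not the original code; the function names and the 0.5 threshold are assumptions chosen for illustration.

```python
import re


def tokens(title: str) -> set[str]:
    """Lower-case a title and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: str, b: str) -> float:
    """Size of the shared word set divided by size of the combined word set."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def filter_similar(query: str, titles: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only titles whose similarity to the query meets the threshold."""
    return [t for t in titles if jaccard(query, t) >= threshold]


results = [
    "THE LAST UNICORN",
    "The Last Unicorn (40th Anniversary Edition)",
    "The Last Unicorn: The Lost Version",
    "The Last Unicorn: A Search for One of Earth's Rarest Creatures",
]
kept = filter_similar("The Last Unicorn", results)
print(kept)
```

With these titles the unrelated natural-history book scores 0.25 and is dropped, but the anniversary edition only just survives at 0.5, which hints at why the results stay messy whatever threshold is picked.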

Going to have to see if I can use OpenLibrary instead, looks like a great option.

jdranczewski 24 hours ago [-]
If anyone in the comments is in a similar predicament to the author and would like a book logging app, I will say that I disagree on their judgement of StoryGraph - I've found it a pretty decent interface, the search function is very good, and the (anti)features mentioned in the footnote are incredibly easy to not use, as the creators seem to understand that many of their users have a very strong preference to avoid AI bloat.
KPGv2 23 hours ago [-]
https://hardcover.app is another choice. It's the one I've been using since right after the second Trump inauguration when I decided to "de-oligarch" as much as possible.
millicentricism 1 days ago [-]
This also fails to take into account that ISBNs also contain the publisher ID in them. So identical copies of a book could have different ISBNs depending on which markets they are sold in.
boznz 1 days ago [-]
I'm not sure this is the case, I got my ISBN range through my government national library service, I could be wrong but when you let them know what the book is you are publishing they ask for the Publisher name, though I am guessing as the service is free and it only applies to New Zealand books and publications.
ilamont 1 days ago [-]
They don't contain the publisher name, but ISBNs are usually purchased in blocks of 10 or 100 or 1000 or whatever by a single entity, which is often a single publisher or corporation.

However, within the block publishers can assign ISBNs to different imprints.

NoMoreNicksLeft 23 hours ago [-]
For ISBNs from the big 5, the number really does indicate the publisher. I think the 5th digit (second after 978) can indicate at least some of the big publishers. Smaller ranges are available for purchase from the brokers. In Canada, the national library will even issue you one for free, if you self-publish.

The ISBN always indicates the country it's from, the United States getting the biggest block, other European nations and Japan getting their own, with Africa, the Middle East, and so forth all getting a block in common.

blue1 22 hours ago [-]
ISBN prefixes do not always indicate a country. Some are indeed countries, but others are language areas (e.g. 0/1=English) or "regions" (groups of countries) or even other subjects.

See https://en.wikipedia.org/wiki/List_of_ISBN_registration_grou...
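A rough illustration of the point: the sketch below reads the registration group from an ISBN-13, but only for the single-digit groups under the 978 prefix. Longer groups have variable length and need the full range table from the list linked above, so they are deliberately left out; the table here is partial and the function name is made up.

```python
from typing import Optional

# Partial table: single-digit registration groups under the 978 prefix only.
SINGLE_DIGIT_GROUPS = {
    "0": "English-language area",
    "1": "English-language area",
    "2": "French-language area",
    "3": "German-language area",
    "4": "Japan",
    "5": "former USSR",
    "7": "China",
}


def registration_group(isbn13: str) -> Optional[str]:
    """Return the group name, or None if this partial table cannot tell."""
    s = isbn13.replace("-", "").replace(" ", "")
    if len(s) != 13 or not s.isdigit() or not s.startswith("978"):
        return None
    # Digits 6, 8 and 9 start multi-digit groups, which are not handled here.
    return SINGLE_DIGIT_GROUPS.get(s[3])


print(registration_group("978-0-306-40615-7"))
```

Note that only two of the seven entries name a single country; the rest are language areas or regions.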

cestith 8 hours ago [-]
I buy a lot of books for an individual. I have a dedicated library room in my home, and that’s not the only place there are bookcases.

I shop by ISBN often because I want specifically a particular edition in a particular cover. So it’s not just title and author. It’s not even title, author, publisher, edition, and cover honestly. Sometimes there’s an Indian subcontinent English printing of a book that’s laid out differently and on different paper from the US/Canada market version.

One small drawback is sometimes I’ll order a book by ISBN, and the bookseller will locate it by ISBN, and it will be a completely different item on a different topic by a different author. Sometimes if a book is a small printing or is a very old title the publisher will recycle the ISBN.

rahimnathwani 1 days ago [-]
I'm not sure we always want 'works'. Sometimes different 'expressions' of the same work are different enough that they don't have the same value.

For example, compare the most recent edition of 'Straight and crooked thinking' with the one published in 1930.

vidarh 1 days ago [-]
I don't know that work, but I agree with you in general because of forewords etc. Or even appendices. And translations by different translators.

I "grew up with" a specific translation of Lord of the Rings into Norwegian, for example. There are two. They are very different. But the editions also differ in whether they include the appendices, whose illustrations are used, and more.

stackghost 23 hours ago [-]
>They are very different.

Are we talking material plot or characterisation changes?

vidarh 12 hours ago [-]
No, but many of the names are different, and stylistically they are very different. Depending on whether a translation tries to be fairly literal, or sound as if it is written for the language it is translated to the way the result feels will be very different.

An example is the name Bilbo Baggins. In the "canonical" Norwegian translation, he's become Bilbo Lommelun. "Lomme" means pocket, and "lun" means snug, warm, or comfortable. It's not literal, but it fits the nature of hobbits well while referencing the "bag" in "Baggins", and the connotations come immediately in Norwegian without having to try to deconstruct the name.

In this case, I think the newer "canonical" translation is generally considered unambiguously the best, but people often have favourite translations. E.g. my favourite Scandinavian translation of Walt Whitman's Leaves of Grass isn't even Norwegian, but an old Danish translation which sounds much "softer" (it's hard to explain)

Cordiali 19 hours ago [-]
I know Norwegian also has two different written standards, found an example that demonstrates it:

>English: I will not tell anyone the secret.

>Bokmål: Jeg skal ikke fortelle hemmeligheten til noen.

>Nynorsk: Eg skal ikkje fortelja løyndomen til nokon.

Source: https://www.visitnorway.com/typically-norwegian/norwegian-la...

vidarh 12 hours ago [-]
Yeah, so really there are at least 3 translations of Lord of the Rings, to continue that example, and I was being a typical Bokmål user and ignored Nynorsk.

The title differences are also a good illustration of how different it can be:

Bokmål: Kampen om Ringen, Ringenes Herre (the first one is literally "the battle for the ring")

Nynorsk: Ringdrotten

RobotToaster 1 days ago [-]
The most obvious example of this is the innumerable[0] versions of the Christian bible.

[0] Before anyone says it, I'm sure some bible nerd has numbered them, it's hyperbole.

crazygringo 21 hours ago [-]
I think the point is, you want a single work when searching.

Then click on the item and drill down into editions sorted by year, or whatever.

But when you're doing search, it's terrible UX to be flooding it with tens of editions mixed in with other things with similar titles.

kxcrossing 18 hours ago [-]
I like the bait-and-switch here. “Let’s make my own app” which almost made me tab out, followed by an interesting dive into the perils of uniqueness in ISBN. I would still say overspecifying is better than under!
ggm 18 hours ago [-]
A salutary lesson in field overloading and structured keys. There must be an aphorism for a "things you cannot do with a key, if you don't know in advance that's how the key works" list.
galkk 16 hours ago [-]
Unfortunately, for isbns even if you know how the key works in theory and should be used by standard, reality will break you very soon. It’s quite loose. At least it was 10 years ago when I worked in the area of book catalogs matching, per different online stores.
galkk 16 hours ago [-]
I worked a little bit in the area. (it was 10 years ago in the area of book catalogs matching, per different stores/countries/bestseller lists)

ISBN is an attribute/key, but not a primary key, in database terms :)

ISBNs are messy, and in the real world you'll see a crazy number of broken/edge cases that shouldn't happen by the letter of the standard, but happen all the time in reality.

* For example, an ISBN can be reused by a publisher for a completely different book.

* A 2nd edition, while very different, may have the same ISBN.

* A reissue of the same book could have a different ISBN.

* Textbooks by the same author for 6th and 7th grade could have the same ISBN.

* As soon as you get into translations, all bets are off.

* I already mentioned textbooks. How about college books where each year there was a slightly revised edition of the same book?

If you ask yourself - wtf? You’re not alone.
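In schema terms, "attribute, not primary key" might look like the sketch below: a surrogate `id` identifies each row, and `isbn` is an indexed, nullable, non-unique column. The table layout is an assumption for illustration, and the rows (including the ISBN, a placeholder that merely has a valid check digit) are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE edition (
        id       INTEGER PRIMARY KEY,  -- surrogate key, owned by the catalogue
        isbn     TEXT,                 -- may be NULL, may repeat
        title    TEXT NOT NULL,
        language TEXT
    );
    CREATE INDEX edition_isbn ON edition (isbn);  -- fast lookup, no uniqueness
    """
)

# Invented rows: one ISBN shared by two editions, plus a book with no ISBN.
rows = [
    ("9780000000002", "Example Cookbook", "en"),
    ("9780000000002", "Beispiel-Kochbuch", "de"),
    (None, "Pamphlet Without an ISBN", "en"),
]
conn.executemany(
    "INSERT INTO edition (isbn, title, language) VALUES (?, ?, ?)", rows
)


def find_by_isbn(isbn: str) -> list:
    """An ISBN lookup returns a list of candidates, never 'the' book."""
    cur = conn.execute(
        "SELECT id, title, language FROM edition WHERE isbn = ? ORDER BY id",
        (isbn,),
    )
    return cur.fetchall()


matches = find_by_isbn("9780000000002")
print(matches)
```

The design choice is that every caller has to cope with zero, one, or several matches, which is where a manual review step would come in.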

—-

In my youth I heard horror stories about people who suddenly found multiple duplicate GUIDs (UUIDv1) in their databases because cheap Chinese knockoff network cards were using the same MAC addresses. With ISBNs, that kind of thing could happen to you at any time.
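For context on why a shared MAC address matters: a version 1 UUID embeds a node value (normally the MAC) alongside a timestamp and a clock sequence. The sketch below uses Python's standard `uuid` module with a made-up MAC to show that the node field is carried over verbatim; it does not reproduce a collision, which would also need the timestamp and clock sequence to coincide.

```python
import uuid

# Made-up MAC address standing in for two cards that share one.
SHARED_MAC = 0x001122334455

a = uuid.uuid1(node=SHARED_MAC)
b = uuid.uuid1(node=SHARED_MAC)

print(a.version)            # version 1 UUIDs are the time-plus-node kind
print(hex(a.node))          # the node field is the MAC that was passed in
print(a.node == b.node)     # both generators contribute the same node bits
```

With the node no longer distinguishing machines, uniqueness rests on the timestamp and the 14-bit clock sequence alone.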

Ekaros 16 hours ago [-]
I did some data collection on my cookbooks. Figured out Lidl had used the same ISBN for the same book in entirely different languages.
galkk 16 hours ago [-]
You feel my pain :)

Honestly, right now I probably wouldn't even try to code a complex book-matching algorithm; I'd feed all of the books' metadata, including book covers etc., to an LLM, and it would do better than what we had.

Our algorithm had tons of special cases coded, and in the results UI there was a “needs manual review” button that launched a review workflow (not a joke, the business people had a special support team in India, because we were matching not only books) for cases when the confidence score was low.

joemi 19 hours ago [-]
A simple search for books is an interesting problem because for some books it makes sense to search by title alone, while for others it doesn't.

Take To Kill A Mockingbird as an example... No matter what (English) edition of the book you read, you're likely reading the exact same content, even the exact same words, as any other English edition. There might be a different preface near the front or different blurbs on the back cover or a different number of words per page, but the actual story is word-for-word the same. A simple title lookup makes sense here in most cases.

Compare that to something like The Iliad, where the English versions are all translations and can vary greatly from translator to translator. While all telling ultimately the same story, a bad translation doesn't begin to compare to an elegantly beautiful translation, so you almost certainly don't want to treat all editions of The Iliad the same.

Translations aren't the only times that you wouldn't want to treat all editions of a title the same. Some books have undergone abridgments, revisions, or corrections, so the content won't be word-for-word the same between editions, but might or might not be close enough that it's worth considering them the same. Some books have heavily annotated editions, so while not changing the underlying content that all the editions are based on, the reading experience is quite different.

I could go on with differences, but I hope it's clear that there _are_ differences between books and movies when it comes to variations/releases. For books, I think the lookup issue is closer to how it is for board games. Board games, like books, have many editions and translations and often get updated/revised between editions. Sometimes the updates change the gameplay significantly, and other times they don't. Boardgamegeek.com is one of the best (if not _the_ best) catalogs of board games that there is, and it has regular discussions/arguments about whether a new edition of a game is different enough that it deserves its own page or if it should just be relegated to be an easy-to-ignore note in the Versions section of the previous version's page. I think a letterboxd-like lookup for books would have similar regularly-occurring debates, and, like with board games, ultimately have to be fairly hand-curated.

wise_blood 15 hours ago [-]
TMDB is the best metadata provider for my home media server, they just have everything.

Two great features are: season names and episode groups. The other day there was a thread about Babylon 5, where seasons have names and the watching order is different from the airing order. Perfect application of both

jiggawatts 1 days ago [-]
My state had a reading competition that listed books by ISBN, which was a real challenge for students to track down. Each library had different editions and even different cover art, so if you “found” the book you might not recognise it on the shelf, etc…

I worked on the library systems and one of my innovations was to use the ISBN mapping database of WorldCat to find books with identical content but different ISBNs to help kids find the books on the list.

Over ten years that one SQL join in the code made the kids read an extra million books they wouldn’t have otherwise.

My biggest “bang for buck” in my career!

DiggyJohnson 24 hours ago [-]
That is amazing. For odd reasons I had to get real familiar with ISBN as well. What did that sql command look like if you don’t mind me asking?
jiggawatts 14 hours ago [-]
Literally just a join against a lookup table of alternate ISBNs. So the ISBNs in the reading list were first “expanded” with all possible titles, then they were matched to the books actually in the library.

The ISBN alternatives table was just groups with an integer ID shared by each group. My import process synthesised this from worldcat data, which was more messy.
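A minimal sketch of that join, using Python's built-in sqlite3. The table names, column names, and the short placeholder "ISBNs" are all made up for illustration; the real schema and the WorldCat import are not shown in this thread.

```python
import sqlite3

# In-memory database with hypothetical tables; none of these names come from the real system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reading_list (isbn TEXT PRIMARY KEY);
    -- Every ISBN in the same group is treated as the same content.
    CREATE TABLE isbn_group (isbn TEXT PRIMARY KEY, group_id INTEGER NOT NULL);
    CREATE TABLE library_holding (isbn TEXT PRIMARY KEY, shelf TEXT NOT NULL);

    INSERT INTO reading_list VALUES ('A1'), ('B1');
    INSERT INTO isbn_group VALUES ('A1', 1), ('A2', 1), ('A3', 1), ('B1', 2), ('B2', 2);
    INSERT INTO library_holding VALUES ('A3', 'Fiction 12'), ('C1', 'Fiction 40');
""")


def find_held_alternates(conn):
    """Expand each listed ISBN to its whole group, then match against holdings."""
    return conn.execute("""
        SELECT rl.isbn, h.isbn, h.shelf
        FROM reading_list AS rl
        JOIN isbn_group AS listed ON listed.isbn = rl.isbn
        JOIN isbn_group AS alt ON alt.group_id = listed.group_id
        JOIN library_holding AS h ON h.isbn = alt.isbn
        ORDER BY rl.isbn, h.isbn
    """).fetchall()


matches = find_held_alternates(conn)
```

Here the list asks for 'A1', the library only holds 'A3', and the shared group ID is what connects them. 'B1' has no held alternate, so it produces no row.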

gerdesj 23 hours ago [-]
When you delve into real domain specific knowledge, surprises often surface and it turns out that what you might think is a simple thing is actually rather complicated.

I'm mildly surprised at exactly how successful ISBNs are. I worked in a book wholesaler's warehouse 35 odd years ago and the ISBN was used as the product code by the "system". I'd get a series of picking lists for pallets on good old green "staved" fan fold. I'd whizz around the warehouse with my trolley and pick from paper packets of books. The product lines had the rack and bay, last four from the SBN, quantity, title and full SBN. The packets of books had the rack/bay/last four from SBN printed on a label in large print, with the other details in small print. I got very good at optimising my course around the warehouse and could pick at a right old rate, whilst listening to my mini cassette player. It's pretty boring work so you might as well game it!

Sometimes an individual book might fall off my trolley and be dumped in the big cardboard "skip" for rejects. For some reason casualties around me generally involved subjects like maths, material sciences, geology, surveying, hydrology. Oh and fractals!

I graduated in civil engineering.

Anyway. Surely all of us here know that really getting to grips with defining what it is that you are cataloguing/indexing/numbering/whatever and why can be quite tricky.

Both Dewey and SBNs catalogue "books" but for very different reasons. Both systems are extremely successful. You might think that in our world of LLMs n that, that books, Dewey and SBNs will go the way of the dodo.

Perhaps, but I doubt it.

Right, bugger all this old school nonsense. I've got a C64 (it rocks an SD card interface and an HDMI out (via SCART - must sort that out)) blinking away on my telly in the sitting room and some mutant camels need a bloody good kicking.

toomuchtodo 1 days ago [-]
If the author sees this comment, https://news.ycombinator.com/item?id=43168838 might be relevant as it relates to catalogue completeness. OpenLibrary is very good, but Anna's Archive is potentially more complete.
CodesInChaos 1 days ago [-]
I read that it's much worse than that, and there are ISBNs that were reused for completely different books.
rmunn 17 hours ago [-]
I've been cataloguing my books using the ISBN to look them up, and I think I ran into that situation a few times, maybe about 0.2% of all the books I catalogued. (That is, the ISBN search on openlibrary.org returned multiple clearly-different books for the ISBN I searched for). I didn't pay much attention to it so I can't tell you which ISBNs were duplicates, but I've definitely seen it happen.

But there is at least one case where it was on purpose. There's a set of reading primers from the UK called the Biff, Chip and Kipper books. We acquired a whole set of them at a garage sale, and when I went to enter them into my catalogue, I discovered that the publisher had assigned just one ISBN to the whole series. Which quite annoyed me when I discovered it. (I ended up just not cataloguing those books, because I didn't want to type the titles, author, copyright date, etc. in by hand for 50+ tiny books).

joemi 19 hours ago [-]
In my experience this is very very rare. Rare enough that it's practically negligible.
CodesInChaos 1 hours ago [-]
Even if it's rare, it means a database like Goodreads can't assume that an ISBN is linked to only a single book.
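To make that concrete, here is a small sketch of a catalogue whose ISBN lookup returns a list instead of a single record. The class, the placeholder ISBN strings, and the titles are hypothetical and not taken from Goodreads or any real catalogue.

```python
from collections import defaultdict


class Catalogue:
    """Toy catalogue that allows several books to share one ISBN."""

    def __init__(self):
        # ISBN -> list of titles, rather than ISBN -> single title.
        self._by_isbn = defaultdict(list)

    def add(self, isbn, title):
        self._by_isbn[isbn].append(title)

    def lookup(self, isbn):
        # Always a list: empty, one entry, or several for a shared or reused ISBN.
        return list(self._by_isbn.get(isbn, []))


catalogue = Catalogue()
catalogue.add("SERIES-ISBN", "Primer, book 1")  # placeholder values
catalogue.add("SERIES-ISBN", "Primer, book 2")
catalogue.add("UNIQUE-ISBN", "Some novel")

shared = catalogue.lookup("SERIES-ISBN")
```

Callers then have to decide what to do when `lookup` returns more than one entry, which is exactly the case a one-to-one schema cannot represent.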
NoMoreNicksLeft 23 hours ago [-]
I've stumbled across 3 or 4 magazines that printed the wrong ISSN in more than one issue. One from the 80s did so in every single issue of its 20-some-issue run. It must be true that some books have done so as well, but I don't even check that those are correct.
mmooss 20 hours ago [-]
> there’s a distinction between the work (the book The Last Unicorn), the expression (a given edition of the book), a manifestation (a given physical format for an expression, such as paperback or hardcover), and an item (an individual object in a collection)

The author misunderstands 'work', as far as I know: A work is "intellectual or artistic content of a distinct creation. It refers to a very abstract idea of a creation e.g. Shakespeare's Romeo and Juliet and not a specific expression."[0]

In contrast, an "expression" is an "intellectual or artistic realization of a work. The realization may take the form of text, sound, image, object, movement, etc., or any combination of such forms."[0]

The Last Unicorn story is the work, "the book The Last Unicorn" is an expression as would be the film version or the computer game, etc.

[0] https://www.ifla.org/references/best-practice-for-national-b... (as of a few years ago)
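A rough sketch of the four levels as nested records, following the definitions quoted in this comment (so the film sits under the same work as the novel). The class and field names and the placeholder ISBN strings are my own assumptions, not taken from the IFLA document.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    location: str  # one concrete copy, e.g. on a particular shelf


@dataclass
class Manifestation:
    carrier: str  # hardcover, paperback, DVD, ...
    isbn: Optional[str] = None  # the level where an ISBN would attach
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    form: str  # a realization of the work
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str  # the abstract creation
    expressions: List[Expression] = field(default_factory=list)


def isbns_for_work(work: Work) -> List[str]:
    """Collect every ISBN found under a work, across all expressions."""
    return [
        m.isbn
        for e in work.expressions
        for m in e.manifestations
        if m.isbn is not None
    ]


unicorn = Work("The Last Unicorn", [
    Expression("novel text", [
        Manifestation("hardcover", "ISBN-HC", [Item("my shelf")]),  # placeholder ISBNs
        Manifestation("paperback", "ISBN-PB"),
    ]),
    Expression("film", [Manifestation("DVD")]),
])
work_isbns = isbns_for_work(unicorn)
```

Seen this way, a "title lookup" is a query at the work level, while an ISBN only ever identifies one manifestation underneath it.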

bell-cot 1 days ago [-]
The first few paragraphs of https://en.wikipedia.org/wiki/ISBN are a better summary of the issue.

tl;dr; - The ISBN is intended to be a physical Part Number, within the book business. Where "hardcover, or paperback, or trade paperback, or large print, or revised edition, or ..." very much matters.

KPGv2 23 hours ago [-]
> why isn't there a letterboxd for books

There is. https://hardcover.app

I used Letterboxd a lot before kids. I used Goodreads until the Trump inauguration when I de-Amazon'd myself as much as possible (Amazon owns Goodreads). I switched to Hardcover, which is a much better interface. There are ways to improve, but overall I prefer it over Goodreads.

ncfausti 22 hours ago [-]
What would you like to see improved?
lccerina 8 hours ago [-]
not OP, but sometimes I add the ebook version to my library even if I never use them. To find it again and fix it, sometimes I need to scroll through the years of read books to spot the tiny icon that distinguishes the version of the book. It would be nice to have a simple filter by book type (physical/ebook/audio) in the library page.
KPGv2 3 hours ago [-]
actually the #1 criticism i had has been fixed since the last time i made reading progress: one-click progress updates from Home. I launched the app on my phone and bam, it's right there!

I'm not sure there is anything I'd improve. I recall when I started creating a list of Akutagawa Prize winners in translation, it was a bit painful because I had to input a lot of ISBNs for rarer books. Also I struggled for a couple books with picking the right version so the page count was correct.

But I haven't done either of those in a while since I haven't read more rare stuff in a year. Possibly they've gotten better, too.

NoMoreNicksLeft 23 hours ago [-]
>Uh-oh. Why do we have so many distinct versions of The Last Unicorn? Well, each distinct format of a work has its own ISBN (so a hardcover, paperback, and eBook all have different ISBNs),

This isn't even the half of it. On some digital books, I'll find a dozen ISBNs in the front matter. Of course there's the hardback, the clothbound (not always the same as the hardback), the alk. paper variant, paperback, trade paperback, epub, pdf, "Adobe digital", and "master digital e-book" (no idea what that even is myself). And that's all just issued together. If they reprint, it won't get a new ISBN, but if the rights convey to another publisher, that one will get a whole 'nother set again. Some popular titles likely have low hundreds of ISBNs, and keep in mind that these have only been a thing since the late 1960s (9 digit ISBNs, technically just SBNs back then). Then with the now dead paperback trade, you could go through a dozen different covers for the most popular books (King, etc) but they'd all use the same ISBN.
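For anyone matching old 9-digit SBNs against modern records, here is a small sketch of the usual conversions: an SBN becomes an ISBN-10 by prefixing a 0, and an ISBN-10 becomes an ISBN-13 by prefixing 978 and recomputing the check digit. The function names are mine, and input validation is deliberately minimal.

```python
def _strip(code: str) -> str:
    """Drop hyphens and spaces from a printed identifier."""
    return code.replace("-", "").replace(" ", "")


def sbn_to_isbn10(sbn: str) -> str:
    """Prefix a 9-digit SBN with 0; the existing check digit stays valid."""
    digits = _strip(sbn)
    if len(digits) != 9:
        raise ValueError("an SBN has 9 characters")
    return "0" + digits


def isbn10_check_digit(first9: str) -> str:
    """Mod-11 check digit over the first nine digits (weights 10 down to 2)."""
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix 978, drop the old check digit, append the mod-10 check digit."""
    digits = _strip(isbn10)
    if len(digits) != 10:
        raise ValueError("an ISBN-10 has 10 characters")
    body = "978" + digits[:9]
    # Weights alternate 1, 3, 1, 3, ... across the twelve body digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(body))
    return body + str((10 - total % 10) % 10)


isbn13 = isbn10_to_isbn13(sbn_to_isbn10("340-01381-8"))
```

This only normalises one identifier into its later forms. It does nothing about the separate ISBNs issued per format or per publisher, which still need a grouping table of some kind.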

Then, and this one bites me the most... if archive.org scans in a hardback with its ISBN, what do I use for the scanned pdf? I've decided that for lack of a better alternative I have to use it, but if the publisher made their own pdf (even just scanning the hardback), then it is supposed to issue a new ISBN to it.

Cataloging my own library, I've had to use a hodgepodge of unique ids. ASINs, ISBNs, Worldcat's OCLC numbers, Open Library's, and a few others besides. And it still comes up short. The number of oddball publishers and pamphlets and so forth that have never been cataloged anywhere is enormous.

user205738 18 hours ago [-]
Your question has already been answered, but have you considered recording several ISBNs, a description of the book, a link to the website with this edition, the publisher, and a note with details of the book's format (hardcover, softcover, etc.)?

Personally, I have never had all these indicators match in any book. It also allows you to find a very specific publication using a semantic search, specifying a combination of tags/publisher/formats.

WorldMaker 6 hours ago [-]
> if archive.org scans in a hardback with its ISBN, what do I use for the scanned pdf?

Archive.org would recommend using the OpenLibrary IDs instead of ISBNs. (OpenLibrary is an Archive.org project.)

> The number of oddball publishers and pamphlets and so forth that have never been cataloged anywhere is enormous.

I think it's more the case that there are too many catalogs. At least with LibraryThing it always seems like somebody has cataloged everything, but we have such a hodgepodge of ID systems and catalog numbers in part because so rarely have all the catalogs been connected or have tried to be connected. It's only a relatively recent library phenomenon that so many small library catalogs can talk to each other on the same protocol, much less coexist in the same broader search tool.

> Cataloging my own library, I've had to use a hodgepodge of unique ids. ASINs, ISBNs, Worldcat's OCLC numbers, Open Library's, and a few others besides.

In part because most of my personal catalog is in LibraryThing, I've been impressed with LibraryThing's Works ID as a generally trustworthy unique ID for a book. LibraryThing benefits from an interesting mix of volunteer and professional librarian work (especially the work of a lot of tiny and interesting niche libraries across the world) in deduping and merging editions together into the same Work ID. StoryGraph and OpenLibrary are also doing interesting things in this space, but LibraryThing has the momentum of time (it's as old as GoodReads and not an Amazon side project) and the benefit of extra (nerdy) labor.

I also like the LibraryThing IDs because they are generally short, opaque (which is a weird feature sometimes), and don't look anything like an ISBN because they aren't intended for that. StoryGraph's IDs are GUIDs, which I will forever find ugly in their normal hyphen-delimited hexadecimal rendering. Open Library's look like ISBNs for reasons that I don't understand, but I do appreciate that you can use the last letter of the ID to distinguish between an edition ID (ends in M, for reasons I don't know) and a work ID (ends in W), and the OL prefix does help them stand out next to other catalogs' IDs.
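A tiny sketch of that suffix check. The example IDs are made up, and the regex only covers the two suffixes mentioned here, so anything else is rejected.

```python
import re

# "OL", some digits, then M (edition) or W (work).
_OLID = re.compile(r"^OL(\d+)([MW])$")


def classify_olid(olid: str) -> str:
    """Return "edition" or "work" based on the trailing letter of the ID."""
    match = _OLID.match(olid.strip().upper())
    if match is None:
        raise ValueError(f"not an edition or work ID: {olid!r}")
    return "edition" if match.group(2) == "M" else "work"


kinds = [classify_olid("OL123M"), classify_olid("OL456W")]  # hypothetical IDs
```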

I built a voting website for my current favorite book club and I thought I could do everything with just the LibraryThing Works ID but then I keep adding other IDs to the "database" (YAML frontmatter) as time goes on. LibraryThing doesn't have a Covers API because most of their edition covers come from Amazon and Amazon is restrictive on that. If I add the OpenLibrary Edition ID, I can use the OpenLibrary Covers API as Archive.org has very nice terms on that today. (Not the OpenLibrary Works ID, because covers are associated at the Edition level, which does make some sense, but the website UI shows a default cover from a random edition so I'm not sure why the API couldn't return that cover from the Works ID, but it is nice to pick and choose Edition covers anyway and I can't complain too much having a working cover image API from someone.) I started adding StoryGraph IDs because members of the club love StoryGraph right now and also because while StoryGraph doesn't have an Official API yet (it is on the Roadmap), I discovered StoryGraph's CWs section was amenable to easy scraping. I figured since an API for it is on the Roadmap a bit of light scraping (with attribution!) was fair. (My club wanted CW information to help decide on book voting. LibraryThing intentionally doesn't track CWs as too hot button and subjective, but StoryGraph has a rather nice "voting" experience for CWs and before I started to scrape StoryGraph's CWs we were already starting to copy and paste them by hand into the Markdown documents. The scraping provides better attribution and a unified display.)
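For reference, a sketch of building a cover URL from an Edition ID. It assumes the Covers API pattern as I understand it from the Open Library docs (`/b/olid/<ID>-<size>.jpg`, with sizes S, M, and L); the helper name and the example ID are hypothetical, and no request is actually made.

```python
COVERS_BASE = "https://covers.openlibrary.org/b/olid"  # assumed base URL
VALID_SIZES = {"S", "M", "L"}


def cover_url(edition_olid: str, size: str = "M") -> str:
    """Build a cover image URL for an Open Library Edition ID."""
    olid = edition_olid.strip().upper()
    if not (olid.startswith("OL") and olid.endswith("M")):
        # Covers hang off editions, so a Works ID (ending in W) is rejected here.
        raise ValueError(f"expected an edition ID ending in M, got {edition_olid!r}")
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {sorted(VALID_SIZES)}")
    return f"{COVERS_BASE}/{olid}-{size}.jpg"


url = cover_url("OL123M", "L")  # hypothetical edition ID
```

Storing the Edition ID in the frontmatter and deriving the URL at build time keeps the image link out of the data, so switching sizes later is a one-line change.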

Finnucane 21 hours ago [-]
>if archive.org scans in a hardback with its ISBN, what do I use for the scanned pdf?

The scanned pdf just doesn't have an ISBN. ISBNs are assigned by publishers to products for inventory management. That's it. If archive.org scans a book, it's not a product that needs inventory control.

NoMoreNicksLeft 18 hours ago [-]
I need unique identifiers. And I disagree. For me, the scan of a book keeps the same ISBN as the printed book that was scanned (when it has an ISBN at all). No other sensible alternative really exists. I also believe Open Library catalogs them the same (since they are archive.org too, and doing much of the scanning).
Finnucane 6 hours ago [-]
You may need a unique identifier. You may use the isbn if you like. No one will stop you from doing that. But no isbn-issuing entity has applied an isbn to that file.