technaut (technaut) wrote,

File Indexing Service.

There is an online media database called freedb that performs a very useful service. Almost anyone who has ever converted an Audio CD to MP3 format has used freedb, even if they weren't aware of it at the time.

The service stores metadata about the songs that can be found on audio CDs. Since the CDs themselves only store the raw song data, information about the names of the tracks, the musicians, even the name of the CD itself has to be sought elsewhere. Rather than forcing everyone who has ever 'ripped' a CD to enter this data manually, the freedb service lets you enter the information into a central repository.

That way, 99.9% of the time, when you put a CD into a ripping program, it will find an already-existing entry in the database for you to work with. Only in cases where the CD is so new, or so obscure, that no one has seen it before will you have to enter the data by hand. Once you've done so, you can then upload it to the central repository so that the next person won't have to do the same.

Now, CDs are not the only digital objects that could benefit from an online database of meta-information. Anyone who has ever used a P2P client has come across the problem of files that are misnamed, misidentified, corrupt or just plain not what you thought they were from the name. A similar service that all P2P services could hook into to store and retrieve metadata on arbitrary data files could be a boon to P2P users everywhere.
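To make the idea a little more concrete, here is a minimal sketch of how such a service might key its records. It assumes the key is a SHA-256 digest of the file's contents, so a misleading filename can't affect the lookup. The names content_key and MetadataIndex are hypothetical, and the in-memory dictionary is only a stand-in for the real central repository.

```python
import hashlib
import io


def content_key(stream, chunk_size=65536):
    """Return the SHA-256 hex digest of a binary stream's contents.

    The key depends only on the bytes, never on the file's name.
    (Hypothetical helper; the choice of SHA-256 is an assumption.)
    """
    digest = hashlib.sha256()
    # Read in chunks so large files don't have to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


class MetadataIndex:
    """In-memory stand-in for the central repository (illustration only)."""

    def __init__(self):
        self._records = {}

    def submit(self, key, record):
        """Store one user-submitted metadata record under a content key."""
        self._records.setdefault(key, []).append(dict(record))

    def lookup(self, key):
        """Return all records submitted for a key, or an empty list."""
        return [dict(r) for r in self._records.get(key, [])]
```

A client would hash a file it has just downloaded, call lookup with the result, and only ask the user to type in details (and then submit them) if nothing comes back, which is the same flow a CD ripper follows with freedb.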

And I'm not just talking about a boon for pirates and illegal downloaders. If the P2P system is ever going to be seen as a legitimate method of distributing software and music, then it will be necessary to have a way of distinguishing between content that is free for everyone and content that one needs to have purchased.

As a slight aside, this doesn't necessarily mean downloading copyrighted works should be forbidden. I have a large number of books that I have purchased. My understanding of my rights to that content is that I have the right to an electronic copy for backup purposes. I could, of course, use an OCR system to scan in the book, but it's usually just easier to download it off the net. Besides, it's slow and time-consuming to search a physical book for a particular fact you remember it having, while searching a text file is usually much faster and easier. Ultimately the only person who can actually know if they have the right to download a particular file is the person who is doing it, and I would like to make it easier for them to know where they stand. For the rest of this article I'll simply assume that it's a good idea to let people be better informed of the details of files they encounter on the Internet, including their copyright status, as a thorough debate of this issue could fill several books (and has).

Considering the acrimony of the debate on issues surrounding P2P distribution and the confrontational natures of the various parties involved, anyone who provided this service would face a number of technological hurdles. To deal with high traffic loads, denial-of-service attacks, database poisoning attacks, and a number of other dirty tricks, the system would need to be carefully designed.

It should be distributed across multiple servers in different geographical locations and use a collaborative filtering system to help maintain a high degree of data accuracy. At the same time, it needs to be able to accept information updates from large numbers of users who may well have legitimate reasons to do so anonymously due to the regulations of their home countries. These problems are all solvable, although they would require some hard work on the developers' part.
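As a rough illustration of how conflicting submissions might be reconciled, here is a sketch of a reputation-weighted vote. This is a much cruder stand-in for a real collaborative filtering system, and everything in it is an assumption: the consensus function, the reputation table, and the small default weight given to unknown or anonymous contributors are all hypothetical.

```python
from collections import defaultdict


def consensus(submissions, reputation, default_weight=0.1):
    """Pick the value backed by the most total contributor weight.

    submissions: list of (contributor_id, value) pairs for one field
        of one file.
    reputation: dict mapping contributor_id to a non-negative weight.
    default_weight: weight for contributors with no track record,
        such as anonymous ones (the value 0.1 is an arbitrary assumption).

    Returns (value, share), where share is the winning value's fraction
    of the total weight, or (None, 0.0) when there is nothing to count.
    """
    totals = defaultdict(float)
    for contributor, value in submissions:
        totals[value] += reputation.get(contributor, default_weight)

    total_weight = sum(totals.values())
    if total_weight == 0:
        return None, 0.0

    winner = max(totals, key=totals.get)
    return winner, totals[winner] / total_weight
```

Anonymous updates are still accepted here, they just count for less until the contributor builds a track record. That blunts a poisoning attempt from a handful of throwaway accounts, but a large enough flood of them could still outvote trusted users, so a real system would need more than this.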

The result would be a very useful service, and the owner of that service would find themselves in possession of a valuable database. The predecessor to freedb sold all rights to their database and made a fortune, albeit at the cost of taking the service offline. The real success of a P2P file metadata database would come when it becomes so useful that copyright owners are willing to pay to have authoritative data inserted into the database.