My suggestion: avoid re-downloading the same file

youseemi · June 29, 2006

Hi everyone...

I've been using BitComet for some time...

I really love it. Anyway, there is a feature I would like to see in a future version.

.................................................................................................................

It's unavoidable that we will end up downloading the same files again in the future.

If the program could warn us that we have already downloaded those files before, it would help us avoid re-downloading them and save us a lot of time.

Just take a look at this feature in FlashGet (a download manager).

It would be great if a future version could include this feature!! :D

Thanks !!!

The UnUsual Suspect · June 29, 2006

In theory this idea is good, but BitTorrent doesn't look at file names to determine whether a file is correct, the way your program does.

It would have to run a hash check on the file with the same name, and it would have to match exactly.

Two files can have the same name, but be different.

Imagine a letter named letter.doc, and a second letter also named letter.doc, except that the second has a P.S. at the bottom, so it would fail the hash check.

To incorporate this, a BitTorrent client would have to find the file with the same name and then hash check it, which would be a drain on system resources.
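To make the letter.doc case concrete, here is a minimal Python sketch. The letter contents and the helper name same_content are made up for illustration, and this is not how any particular client does it; it only shows that a matching name says nothing until the bytes are hashed.

```python
import hashlib

# Two hypothetical files that would both be saved as "letter.doc".
letter_v1 = b"Dear Sam,\nSee you on Friday.\n"
letter_v2 = letter_v1 + b"P.S. Bring the photos.\n"


def same_content(data_a: bytes, data_b: bytes) -> bool:
    """Compare two byte strings by size first, then by SHA-1 digest."""
    if len(data_a) != len(data_b):
        # Different sizes can never be the same file, so skip the hashing.
        return False
    return hashlib.sha1(data_a).digest() == hashlib.sha1(data_b).digest()


if __name__ == "__main__":
    print("same name, same content?", same_content(letter_v1, letter_v2))
```

The size comparison is free, while the hash has to read every byte of both files, which is where the drain on system resources comes from for large downloads.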

Plus, imagine all the "readme.txt" files you have on your hard drive :P

Now, with this said, I am not a BitComet developer. I'm sure the developers will consider your request, and perhaps there is something they can do.

Suspect

aimee · June 29, 2006

In fact, BitComet already has this feature. If you download a torrent with the same features as an existing one, you will get an alert. "Same features" means the same name, the same files and so on.

Const2k · June 29, 2006

A file can be identified by its filename, file size, URL, checksum(s), and probably in other ways I don't know of. I also don't know how FlashGet decides that two files are the same (other than comparing the file's URL with a logged one). .torrents are identified by their hashes (e.g. googling for [hash_was_skipped_by_some_filter_I_didn_know_of] will give you the pages this torrent is listed on). So, instead of using the unreliable filename and the unusable URL and checksum, BitComet could either

use a file size check on each file within the .torrent (the database for this would become quite large, but it would raise the chance of skipping an already downloaded file; it could also cause slowdowns with, say, 2k files within one .torrent), or

use the hash of the .torrent (like the search engines above) to check whether it has already been downloaded. In this case the database would be small and quick to search. Obviously, duplicate files within different torrents won't be found.
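The second option could look roughly like the Python sketch below. Everything here is an assumption for illustration: the DownloadHistory class, its method names and the SQLite table are hypothetical, and the info-hash is taken as an already computed hex string rather than being derived from a .torrent file.

```python
import sqlite3


class DownloadHistory:
    """Hypothetical history of finished torrents, keyed by info-hash."""

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS seen ("
            "info_hash TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )

    def remember(self, info_hash, name):
        # INSERT OR IGNORE keeps the first name recorded for a given hash.
        self.conn.execute(
            "INSERT OR IGNORE INTO seen (info_hash, name) VALUES (?, ?)",
            (info_hash.lower(), name),
        )
        self.conn.commit()

    def already_downloaded(self, info_hash):
        # Returns the stored name, or None if this torrent was never recorded.
        row = self.conn.execute(
            "SELECT name FROM seen WHERE info_hash = ?", (info_hash.lower(),)
        ).fetchone()
        return row[0] if row else None
```

A client would call remember() when a task completes and already_downloaded() when a new .torrent is added. One row per torrent keeps the table small, but as said above it cannot notice the same file appearing inside two different torrents.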

Moreover, the torrent file format as it stands doesn't allow storing a checksum for each file, just the file sizes (the hashes of the torrent's pieces are stored instead). So fully implementing this feature would require a revision of the BitTorrent protocol as a whole (IMO; maybe these checksums could be stored after all the other .torrent info so that older clients would simply ignore them). Though I'd personally like this feature a lot :)

Another way is to include checksums in filenames (like the [A1B2C3D4] tags in some video filenames). On the other hand, that would be a pain in the a** when writing them to CD/DVD, whose specifications (namely Joliet) are quite strict about filename and path length...

Sometimes asking a simple question makes other people think a lot about the answer (I don't think I've answered it) :)

Const2k · June 30, 2006

Answer to the post by izomiac located here:

You tell BitComet (or it determines based on hashes/filesize) which files are identical (in this case 3.avi in A & C and c.avi in B) and it downloads them to a single location without redundantly requesting blocks from both swarms.

To sum up, it's impossible right now to compare files across different .torrents, because the .torrent structure is based on hashing pieces, not files. In rare cases it might be workable, but several conditions must be met: the same piece size and the same order of files in all the .torrents, and none of them may have the "Private" flag set. Even then (as of now) only consecutive files from the beginning of all the .torrents could be used.

Example (A, a, 1 and 01 are all the same file; B, b, 2 and 02 are another group of identical files, and so on):

Torrent #1: A, B, C, D

Torrent #2: a, b, c, d, e, f, g

Torrent #3: 04, 05, 06

Torrent #4: 5, 6, 7, 8

Only the "A, B, C, D" and "a, b, c, d" files could be matched (and the .torrents must meet the conditions above), not "e, f", "5, 6" and "05, 06", because those are divided into pieces differently when the .torrent is made => these pieces have different hashes => the client considers them to be different.
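The effect is easy to reproduce with a small Python sketch. The file contents, sizes and the 32-byte piece size are invented for the demonstration (real torrents use much larger pieces), and fake_file, piece_hashes and common_prefix_len are hypothetical helpers, not part of any client.

```python
import hashlib


def fake_file(name, size):
    """Deterministic pseudo-random content standing in for a real file."""
    out = b""
    counter = 0
    while len(out) < size:
        out += hashlib.sha256(f"{name}:{counter}".encode()).digest()
        counter += 1
    return out[:size]


def piece_hashes(files, piece_size):
    """Hash a torrent the piece-wise way: concatenate files, cut into pieces."""
    data = b"".join(files)
    return [
        hashlib.sha1(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]


def common_prefix_len(a, b):
    """Number of leading pieces whose hashes agree in both torrents."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


PIECE = 32  # toy piece size, chosen so files do not end on piece boundaries
sizes = {"A": 100, "B": 70, "C": 50, "D": 90, "E": 60, "F": 80, "G": 40, "H": 30}
f = {name: fake_file(name, size) for name, size in sizes.items()}

t1 = piece_hashes([f[n] for n in "ABCD"], PIECE)     # torrent #1
t2 = piece_hashes([f[n] for n in "ABCDEFG"], PIECE)  # torrent #2
t3 = piece_hashes([f[n] for n in "DEF"], PIECE)      # torrent #3
t4 = piece_hashes([f[n] for n in "EFGH"], PIECE)     # torrent #4

if __name__ == "__main__":
    print("pieces shared by #1 and #2:", common_prefix_len(t1, t2))
    print("pieces shared by #3 and #4:", len(set(t3) & set(t4)))
```

With these made-up sizes, torrents #1 and #2 agree on their first nine pieces, which cover everything except the tail of the last shared file. Torrents #3 and #4 share no piece hash at all, even though two whole files are byte-for-byte identical, because those files start at different offsets within the piece grid.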

And even such an implementation (only theoretical) would be an improvement over the situation we have now...

You see, I've tried seeding two different completed torrents that share some of the same files at the same time. After some time BitComet gave me a "sharing violation" error against itself, even though all the files were opened read-only by BitComet (of course, they were all 100% complete; their torrents had been stopped, manually rehashed and started again %) )

WTF? Files opened for "read" can be accessed for reading by any number of programs!

(Try playing a long .mp3 in Winamp while archiving it with WinRAR and copying it to a flash drive and to another HDD at the same time. No problem.) So there is a long way to go...

Const2k · June 30, 2006

BTW, I've just found out that there's another idea in the BitTorrent WishList that's close to this subject:

Title : Multi Hash Info

Submitter : Mirco Romanato

Email : painlord2k@yahoo.it

Category : Feature - Request

Compatibility : Yes

Thread : -

Description : Over the usual metadata, insert in the .torrent file the metadata about the single files (SHA-1, TigerTree, CMD4, MD5). Would be better if this will be accomplished using magnet links, that can be extended, if needed, without changing the standard.

Pros : Sources of data can be located outside the torrent if the application supports them (Gnutella/eDonkey2000/etc.).

The program could detect whether files with the same hashes are present in shared/checked folders. This makes it possible to build a "virtual" torrent with no real seeder (one or more of the leechers have only a part of the torrent, or the original data comes from outside the torrent): the leechers cluster around the same data, give priority to exchanging the torrent's data, and can build a strong request for the missing data from the external sources (they will ask for the same data at the same time, so there is a greater chance that the data will leak from the sources and be served to the other leechers). This is like the "horde" system in Overnet. Trackers are needed only as entry points to the torrent, which greatly reduces the bandwidth required.

Cons : The last feature is dangerous, because it could produce a DDoS against the data sources if the application's behaviour is wrong.

I would add that Direct Connect clients (another P2P network; I have this one on my district's LAN too) use a database of TTH (Tiger Tree Hash) hashes of files. It could be shared with other applications in some way...

Well, it looks like identifying whole individual files within a .torrent is a highly demanded and very useful innovation with a great future ahead of it :)
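As a rough Python sketch of the "detect files with the same hashes in shared folders" idea: it assumes the torrent metadata carried one digest per file, which the current format does not, and it uses SHA-1 as a stand-in because TTH is not in Python's standard library. The function names and the shape of the wanted mapping are hypothetical.

```python
import hashlib
import os


def file_sha1(path, chunk_size=64 * 1024):
    """Hash a file in chunks so large files do not have to fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_hash_index(folder):
    """Map content digest -> list of paths for every file under folder."""
    index = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            index.setdefault(file_sha1(path), []).append(path)
    return index


def find_local_copies(wanted, index):
    """wanted maps a file name in the torrent to its expected digest.

    Returns only the entries that already exist locally, whatever the
    local copy happens to be called.
    """
    return {
        name: index[digest]
        for name, digest in wanted.items()
        if digest in index
    }
```

Building the index is the expensive part, since every shared file has to be read once. That is why a database of hashes that is kept up to date, like the one Direct Connect clients maintain, would be worth reusing instead of rehashing on every lookup.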
