For some time now, I've been anticipating the conversion of my megabrain and several secondary brains to TB9. For those who have already navigated the transition from TB8 to TB9, I have a question that has surfaced in several recent forum discussions. See examples here and here.

How do you recommend scanning a recently imported (converted) megabrain to determine the presence and extent of any data loss and/or corruption?

I can imagine all kinds of scenarios in which links, images, note text, attachments, thought types, pins, tags, etc. could be changed or lost, and I imagine that identifying if and where these problems occurred in any brain (especially a megabrain) could be quite daunting.

In light of this, I would welcome any hints/tips, report recommendations and/or suggestions for how best to evaluate the integrity of your data after migrating a large brain into TB9.

I'm listening.
Well, for one, TB8 has a statistics function. If an identical statistics function could be added to TB9, that would be the first point of identification. It's already pretty comprehensive in TB8 and would be perfect for comparing TB8 and TB9 conversions.

Sounds like a very good starting point for comparison, Jim. I've listed this as my first step for checking the data.

Thanks for the suggestion.
Jim, I just now compared the stats screens in TB8 & TB9 for 2 completely different brains, and they report almost the same data, but not quite. For example, events are not (yet) listed in the stats screen in TB9.

There may be other differences as well. I just haven't taken the time yet to do a line-by-line comparison.



I think I remember seeing, either in this forum or the Preview forum, that TheBrain 8 statistics were calculated differently; because of the database changes, they may not match exactly. The statistics in v. 9 are supposed to be more accurate. Just something to be aware of. If you search the forum, you'll probably find it.

Other Ideas:
Run reports and compare v8 to v9 (I would compare typed thought counts and tagged thought counts). Do the same for attachments (internal, link and URL). You can do this in TB9, and I believe in TB8 as well.
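If you copy the totals from each report into a small script, spotting mismatches becomes mechanical. This is only an illustrative sketch under stated assumptions: the category names and numbers are placeholders you would type in by hand (it does not read any TheBrain report format), and `compare_counts` is a hypothetical helper. A mismatch only tells you where to look, since v8 and v9 may count some things differently.

```python
from __future__ import annotations


def compare_counts(
    v8: dict[str, int], v9: dict[str, int]
) -> list[tuple[str, int | None, int | None]]:
    """Return (category, v8_count, v9_count) for each category whose counts
    differ, including categories present in only one report (shown as None)."""
    mismatches = []
    for category in sorted(v8.keys() | v9.keys()):
        v8_count, v9_count = v8.get(category), v9.get(category)
        if v8_count != v9_count:
            mismatches.append((category, v8_count, v9_count))
    return mismatches


if __name__ == "__main__":
    # Placeholder totals typed in by hand; NOT real report output.
    v8_totals = {"typed thoughts": 120, "tagged thoughts": 45, "URL attachments": 300}
    v9_totals = {"typed thoughts": 120, "tagged thoughts": 44, "URL attachments": 300}
    for category, old, new in compare_counts(v8_totals, v9_totals):
        print(f"{category}: v8={old} v9={new}")
```

With these placeholder numbers, only the "tagged thoughts" line is printed, pointing you at the tagged-thought report as the place to dig further.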

Thanks, ZenRain, I was planning on doing many of the reports you recommended.

However, I was completely unaware that stats are tracked differently in V8 and in V9, and I very much appreciate your heads up on this.

Based on the feedback in this forum thread, it looks like we'd be comparing apples to oranges in several different regards if we try to use the basic statistical data. In fact, Patrick even said: "The way these statistics are generated in TheBrain 8 and TheBrain 9 are completely different so most things will most likely not report the same. There is no easy way to compare."

In light of this, I'm wondering now:
> how TB staff recommends we check brains newly imported into TB9 for potential data loss, and
> which reports can be trusted to provide an accurate basis for comparison between V8 and V9?
I was going to chime in about the difference in how V9 calculates stats, but it looks like you found the post (which explains it better than I could have). In that post Harlan says "Links in TB9 are stored using a single row per link while TB8 used a double row per link scheme".
This was explained to me as "V8 looks at links from both ends, so if a Thought is deleted, TheBrain may still see the dangling link on the Thought that it was previously connected to."
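A toy model may help show why a double-row scheme can overcount after a delete. This is a hypothetical sketch, not TheBrain's actual database schema: each link is a pair of thought names, the "v8-like" table stores it once from each end, the "v9-like" table stores it once, and `leaky_delete` (a made-up helper) simulates a delete that removes only the rows stored on the deleted thought's own end.

```python
from __future__ import annotations


def double_row(links: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """v8-like toy scheme: store every link twice, once from each end."""
    return [row for a, b in links for row in ((a, b), (b, a))]


def single_row(links: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """v9-like toy scheme: store every link once."""
    return list(links)


def leaky_delete(rows: list[tuple[str, str]], thought: str) -> list[tuple[str, str]]:
    """Hypothetical incomplete delete: drop only rows owned by the deleted
    thought, leaving the row stored on the other end dangling."""
    return [(owner, other) for owner, other in rows if owner != thought]


def clean_delete(rows: list[tuple[str, str]], thought: str) -> list[tuple[str, str]]:
    """Drop every row that mentions the deleted thought."""
    return [row for row in rows if thought not in row]


def count_links(rows: list[tuple[str, str]]) -> int:
    """Count distinct links, treating (A, B) and (B, A) as the same link."""
    return len({frozenset(row) for row in rows})


if __name__ == "__main__":
    links = [("A", "B"), ("B", "C")]
    v8_rows = leaky_delete(double_row(links), "C")
    v9_rows = clean_delete(single_row(links), "C")
    print(count_links(v8_rows), count_links(v9_rows))  # 2 1
```

Deleting thought C leaves a dangling (B, C) row in the double-row table, so it still reports two links while the single-row table reports one. Nothing the user cares about was lost; only the count differs.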

Cheers, Sean
Thank you, Sean, for sharing this comment from Harlan about link "rows". When I first read it, I had no idea what he was referring to, but your explanation proved very helpful. Thanks.

Thank you also for the additional feedback you shared here about:
> Alex's recommended reports (see above) and
> the new built-in alert for attachment file paths that are too long.

In the context of discussing Alex's recommended reports, I am wondering:
(1) Are events treated identically (as thoughts?) in TB8 & TB9?
-- If not, how can we compare typed and tagged thought counts between the 2 versions?

(2) Since notes are treated as a special kind of file attachment in TB8, but not in TB9, how could any attachment count comparison between TB8 & TB9 be meaningful?

(3) Also, since links are handled differently in each version of TB, I don't see how link reports would provide a meaningful comparison either -- or am I missing something here?
Hi metta,

Here is an idea you can use to check for missing files. Because TheBrain uses the file system to store files, we can use a duplicate checker program to look for duplicates: if a file from one brain has a duplicate in the other brain, then it was imported.

Here is what you need.
1. A duplicate checker program (I use Duplicate File Detective, but other duplicate checkers can be used)
2. Two copies of your TB8 data folder (one copy for backup)
3. Two copies of your TB9 data folder (one copy for backup)

Here is what you do:
1. Secure the backups first, in case something unexpected happens.
2. Make sure the backups are secured.
3. Add additional backups if required.

Here is what you do after the backups (I will use the features in Duplicate File Detective as an example):
1. Get your TB8 and TB9 data folders ready.
2. Open the duplicate checker program.
3. Add the TB8 and TB9 data paths.
3.a Lock the TB9 path, so the program cannot delete any files in the TB9 data folder.
4. Set the program to compare using SHA-1 hashing. You can also use byte-by-byte comparison, but that will take longer.
5. Search for duplicates.
6. After the search completes, select all duplicate files in the TB8 data folder (you can use smart mark / mark by path and select the TB8 path).
7. Delete the marked files (or use Move files instead).
8. Once those files are out of the TB8 data folder, check the folder's properties to see whether any files are left. If the properties still show some bytes, use "*" or "*.*" to show all files in the folder.
9. The remaining files are the ones that were not imported into TB9. These are the files you need to check manually in TB8 (I think by using the GUID).
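The same check can be sketched as a small, read-only Python script, as an alternative to a GUI duplicate checker. This is a sketch under assumptions, not galactic_star's exact procedure: it hashes every file under a copy of the TB8 data folder and a copy of the TB9 data folder with SHA-1 and lists TB8 files whose contents appear nowhere under TB9, without deleting or moving anything. The script name `find_missing.py` is hypothetical. Files that TB8 keeps for its own use rather than as attachments would presumably show up in the list too, so the output still needs a manual review.

```python
from __future__ import annotations

import hashlib
import sys
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-1 hex digest of a file's contents, read in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashes_under(root: Path) -> set[str]:
    """Hash every regular file anywhere below root."""
    return {sha1_of(p) for p in root.rglob("*") if p.is_file()}


def missing_from(v8_root: Path, v9_root: Path) -> list[Path]:
    """List files under v8_root whose contents appear nowhere under v9_root.
    Read-only: nothing is deleted or moved."""
    v9_hashes = hashes_under(v9_root)
    return sorted(
        p for p in v8_root.rglob("*") if p.is_file() and sha1_of(p) not in v9_hashes
    )


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: find_missing.py <TB8 data folder copy> <TB9 data folder copy>")
    else:
        for path in missing_from(Path(sys.argv[1]), Path(sys.argv[2])):
            print(path)
```

Because it compares contents rather than names, a file that TB9 stored under a different folder or file name still counts as imported.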

The downside is that the duplicate program can only check for missing file attachments. For features that are only accessible inside the brain, such as notes, thought types, and links, you can check the log file for any errors.

Hope this helps!
Thank you, galactic_star!

I very much appreciate your detailed suggestion for identifying missing attachments, with an appropriate emphasis on creating a good set of back-ups before beginning. ;-)

I will definitely keep this option in mind.

In the meantime, with a program like Duplicate File Detective, are there any limitations on the types of "files" it identifies?

For example, in my megabrain, the vast majority of attachments are actually links:
  • Thoughts: 29,147
  • Internal File Attachments: 4,634
  • External File Attachments: 155
  • URL Attachments: 24,931

In this context, will these URL shortcuts be identified as "files" by a program like Duplicate File Detective? Or will this kind of program only recognize my brain's internal file attachments?

As for checking my log files for additional internal errors, I may have to rely on support from TB staff on this since I don't have much experience (yet) reading and deciphering log files.

Finally, have you had any experience importing a large brain into TB9? If so, how did the process go for you?
I learned that backups are very important, as I have lost files before.

Quote: In the meantime, with a program like Duplicate File Detective, are there any limitations on the types of "files" it identifies?

I don't think any duplicate checker program has limitations on file types. The method I use for duplicate checking compares hashes. You can test this by copying a file and removing or changing the extension of the copy; the two files will still have the same hash.
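The copy-and-rename test described above can be reproduced in a few lines of Python. A minimal sketch: `content_sha1` is a hypothetical helper that hashes only a file's bytes, so a copy with a different name and extension gets the same digest.

```python
import hashlib
import tempfile
from pathlib import Path


def content_sha1(path: Path) -> str:
    """SHA-1 of the file's bytes only; the name and extension are never used."""
    return hashlib.sha1(path.read_bytes()).hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "report.pdf"
        original.write_bytes(b"same bytes")
        copy = Path(tmp) / "report-copy.xyz"  # different name and extension
        copy.write_bytes(original.read_bytes())
        print(content_sha1(original) == content_sha1(copy))  # True
```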

I'm not sure whether the URL attachments are saved as files. Can you see them in Windows Explorer? If not, then you cannot check them with a duplicate checker program.

These programs can only see the internal file attachments; as long as Windows Explorer can see the files, you can use these programs to check them.

I don't have any experience importing into TB9, but I used this duplicate-checking process when transferring ebooks into my calibre library: I checked the calibre library folder for duplicates before importing books.

Thanks, galactic_star. Appreciate the additional feedback. :-)

Going back to your statistics from v8 vs. v9: please be assured that your v9 Brain Statistics are accurate, whereas the v8 statistics could potentially be 'off'. The v8 database would sometimes delete a Thought (when requested by the user) but not all of the links associated with it. This caused no harm to the data, but it meant the statistics in v8 were not 100% reliable. File links and attachments were sometimes affected in much the same way. This is just one example of why you shouldn't put too much weight on comparing your v8 vs. v9 statistics screens.

Thanks for confirming, Matt. From the feedback others have shared here, I had gathered that a statistical comparison between brains might not provide the necessary information.

In light of this, what do you and TB staff recommend as a strategy for determining potential data loss (thought links, notes, types, tags, pins and attachments) after a megabrain import into V9?

I imagine that with all but the smallest of brains, it may be exceedingly difficult to determine if and where key data might be missing.

Any suggestions or recommendations?

