Mark Epstein, brother of the late sex offender Jeffrey Epstein, alleged in an interview with NewsNation that the Epstein files are being edited to remove Republican names.
I’ve been running OCR on the images folder of the files since last week and just reached out to the creator to see if they want the data I’ve processed. Right now that entire graph is ONLY the “text” portion of the dump. There are 26k images, which are mostly pictures of emails and other documents. I’m like 80% through processing them (although I’ve had some hiccups in the past 24 hours).
Whoa, I hope they’re interested. I didn’t realize the pics info wasn’t included. Thanks for doing all that work. I looked through some of it and there’s a ton there.
Yeah its ridiculous how much is in there. I’m pulling their current repo to see how they are building their DB so if they don’t get back to me, I can at least combine the two databases.
And if any one reading this wants a copy of what I’ve processed so far, I’m more than happy to share.
But it looks to me like they dropped a couple hundred on just processing those text files. It would be north of 2.5k additional to process the data I’m creating.
That being said, mine only goes as far as extracting the contents and creating a sha256 hash to keep track of the documents themselves/ document tampering. It doesn’t take the next step to extract names, locations, dates, etc…
I’m working that out now but it seems like the way to do this would be so it fits into their DB seamlessly.
Maybe he should release the originals?
He did.
https://oversight.house.gov/release/oversight-committee-releases-additional-epstein-estate-documents/
Helpful tool to research the documents:
https://epstein-doc-explorer-1.onrender.com/
That’s not all of it.
Thanks for sharing this.
I’ve been running OCR on the images folder of the files since last week and just reached out to the creator to see if they want the data I’ve processed. Right now that entire graph is ONLY the “text” portion of the dump. There are 26k images, which are mostly pictures of emails and other documents. I’m like 80% through processing them (although I’ve had some hiccups in the past 24 hours).
https://codeberg.org/sillyhonu/Image_OCR_Processing_Epstein
Whoa, I hope they’re interested. I didn’t realize the pics info wasn’t included. Thanks for doing all that work. I looked through some of it and there’s a ton there.
Yeah its ridiculous how much is in there. I’m pulling their current repo to see how they are building their DB so if they don’t get back to me, I can at least combine the two databases.
And if any one reading this wants a copy of what I’ve processed so far, I’m more than happy to share.
But it looks to me like they dropped a couple hundred on just processing those text files. It would be north of 2.5k additional to process the data I’m creating.
That being said, mine only goes as far as extracting the contents and creating a sha256 hash to keep track of the documents themselves/ document tampering. It doesn’t take the next step to extract names, locations, dates, etc…
I’m working that out now but it seems like the way to do this would be so it fits into their DB seamlessly.