Topic: ShrineMaiden v2 archival organization & status

Lebon14

  • You can hear Momiji awoo in the distance
    • Twitch
    • Twitter
    • YouTube
    • Lebon14#1880
  • Gender: Male (He)
ShrineMaiden v2 archival organization & status
« on: February 19, 2020, 02:26:00 AM »
Hi all. I'm sure you guys have taken to your crawlers/scripts/etc to save v2's history. In fact, I'm pretty sure that the public version of the website will be extremely well archived.
However, on the Sea Of Tranquility Discord, everybody was off doing whatever they preferred. So, this thread will serve to track who archived what, and hopefully we can re-integrate the content either as part of v3 or as a read-only sister forum.

So, let us know what you got and what method you used.

---------------------------------

Lebon14: All public (including links leading to files) using HTTrack, some attachments (using Wget files output and manually downloading them) + As A *blank* Archive (manual Firefox screenshot + Page Save As)
Niektory: All boards/threads accessible to registered users, all forum attachments linked in them (and their original filenames), all on-site images included in them, all avatars, profiles (new!) + some other random on-site files. No offsite files and images. Wget + Python scripts.
nav: All threads in print form + attachments.
asie: A Discord user who joined Sea Of Tranquility and is from an underground website preservation group called "ArchiveTeam". They archived the public version of the website. Theirs will most likely be the version found on archive.org.
« Last Edit: February 19, 2020, 01:53:47 PM by Lebon14 »

Re: ShrineMaiden v2 archival organization & status
« Reply #1 on: February 19, 2020, 03:07:56 AM »
So far I have all boards/threads accessible to registered users, all forum attachments linked in them (and their original filenames), all on-site images included in them, all avatars and some other random on-site files. Got the user profile summaries as well. I didn't get any off-site files/images.

I used Wget for downloading and Python scripts for parsing and generating download lists.
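
For anyone wanting to try a similar split between Wget and Python, here is a minimal sketch of the "generate a download list" step. It is not the actual script used here. It assumes attachment links contain `action=dlattach` (the pattern stock SMF boards use), and the base URL, function names and output filename are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AttachmentLinkParser(HTMLParser):
    """Collects hrefs that look like forum attachment downloads."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: attachment links contain "action=dlattach".
        if href and "action=dlattach" in href:
            self.links.append(urljoin(self.base_url, href))


def attachment_urls(html_text, base_url):
    """Return absolute attachment URLs found in one saved page."""
    parser = AttachmentLinkParser(base_url)
    parser.feed(html_text)
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(parser.links))


def write_download_list(urls, path):
    """Write one URL per line, the format Wget reads with -i."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(urls) + "\n")
```

The resulting text file can then be handed to `wget -i <file>`, which keeps the crawling logic in Python and the actual downloading in Wget.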

I can parse/convert this to whatever format is desired.

EDIT: Got the user profile summaries as well.

There's also Tom's archive on lunarcast.net.
« Last Edit: February 19, 2020, 01:08:17 PM by Niektory »

Re: ShrineMaiden v2 archival organization & status
« Reply #2 on: February 19, 2020, 03:23:27 AM »
I've actually been trying to figure out how to convert that shrinemaiden.csv file to SQL so it can be brought into a DBMS like MySQL/MariaDB or PostgreSQL. I'm currently just testing with Microsoft SQL Server to see if it's feasible, and so far it's coming along just fine! A DBMS is useful because it can manage a LOT of data more efficiently (600MB of records makes for an absolutely massive CSV file). It also provides search functionality and makes finding stuff a whole lot easier than browsing. It's probably possible to use it to implement searching of an HTML archive.
Cirno the Ice Fairy~
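
As a rough illustration of the CSV-to-DBMS idea in the post above, here is a minimal sketch that loads a CSV with a header row into a database table. It swaps in SQLite (bundled with Python) instead of the servers named in the post, stores every column as TEXT, and takes its column names from whatever the header row says. The layout of shrinemaiden.csv is not known here, so the function and table names are hypothetical.

```python
import csv
import sqlite3


def import_csv(csv_path, db_path, table="posts"):
    """Load a CSV with a header row into a SQLite table of TEXT columns.

    Returns the number of rows in the table afterwards.
    """
    # Forum posts can be long; raise the csv module's default field limit.
    csv.field_size_limit(10**8)
    conn = sqlite3.connect(db_path)
    try:
        with open(csv_path, newline="", encoding="utf-8") as fh:
            reader = csv.reader(fh)
            header = next(reader)
            # Column names come straight from the header row (trusted input).
            columns = ", ".join(f'"{name}" TEXT' for name in header)
            placeholders = ", ".join("?" for _ in header)
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
            # Rows are streamed, so the whole file is never held in memory.
            conn.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})', reader
            )
        conn.commit()
        return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    finally:
        conn.close()
```

Streaming the rows matters for a file in the 600MB range. A real import would probably also want proper column types and indexes, which depend on the actual CSV layout.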

nav

  • only a poor old duck
  • still a poor old duck
    • Twitch
    • YouTube
    • nav#5828
  • Gender: I said poor old duck
Re: ShrineMaiden v2 archival organization & status
« Reply #3 on: February 19, 2020, 07:43:26 AM »
I just have all threads I could access in print form plus attachments. It's an unorganized and imperfect collection, but a rather lean one (approx. 150 megabytes when heavily compressed).

Re: ShrineMaiden v2 archival organization & status
« Reply #4 on: February 20, 2020, 06:11:00 AM »
So far so good on the development front with the CSV file.
I am able to parse through it, get all the posts from a single thread, and even output to HTML using simple Python code.
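
To give an idea of what "simple Python code" for this could look like, here is a minimal sketch, not the code actually written for this project. The column names `thread_id`, `author` and `body` are assumptions, since the real CSV header is not shown in this thread, and post bodies are assumed to be plain text that needs escaping.

```python
import csv
import html


def load_rows(csv_path):
    """Read the whole CSV into a list of dicts keyed by the header row."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def thread_posts(rows, thread_id):
    """Return the rows belonging to one thread, in file order."""
    wanted = str(thread_id)  # CSV values are always strings
    return [row for row in rows if row["thread_id"] == wanted]


def render_thread(posts, title):
    """Build a bare-bones HTML page from a list of post rows."""
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{html.escape(title)}</title></head><body>",
    ]
    for post in posts:
        parts.append('<div class="post">')
        parts.append(f"<h4>{html.escape(post['author'])}</h4>")
        # Assumption: bodies are plain text, so they are escaped here.
        parts.append(f"<p>{html.escape(post['body'])}</p>")
        parts.append("</div>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

If the CSV already stores post bodies as HTML, the escaping of `body` would need to go, at the cost of trusting that markup.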


Got a lot more work ahead cuz I still need to import it into a true DBMS, then we can write a front end for the database to do some advanced searching.
Cirno the Ice Fairy~

Alpha272

    • Alpha272#6053
    • Steam
  • Gender: Male
Re: ShrineMaiden v2 archival organization & status
« Reply #5 on: February 20, 2020, 07:36:40 PM »
Oh.. that might be a bit late by now, but I already have the entire CSV file imported into a MySQL database.

It currently is on a MySQL instance on my PC but if someone needs it for prototyping (or just to get the sql file), I can push this database to an Azure Instance.

Re: ShrineMaiden v2 archival organization & status
« Reply #6 on: February 20, 2020, 08:07:46 PM »
Oh.. that might be a bit late by now, but I already have the entire CSV file imported into a MySQL database.

It currently is on a MySQL instance on my PC but if someone needs it for prototyping (or just to get the sql file), I can push this database to an Azure Instance.

The plan is to get the SQL file into a DBMS and do a few more things than just import the CSV file. Essentially, I would like to spin up a thing where you'd be able to search for threads, usernames and attachments. I'm mainly doing this to learn more about databases and application development. Glad you got it working though! The more people who have something working, the better we can preserve everything. It's not too late for anything imo
Cirno the Ice Fairy~
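
A search like the one described above could start out as a single query. This is only a sketch under assumptions: it uses SQLite rather than MySQL, and the `posts` table with `thread_id`, `author` and `subject` columns is hypothetical. It does a plain substring match with LIKE, not a real full-text index.

```python
import sqlite3


def search_posts(conn, term, limit=20):
    """Case-insensitive (ASCII) substring search over subject and author."""
    # Escape LIKE wildcards so the search term is matched literally.
    escaped = (
        term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    pattern = f"%{escaped}%"
    cur = conn.execute(
        "SELECT thread_id, author, subject FROM posts "
        "WHERE subject LIKE ? ESCAPE '\\' OR author LIKE ? ESCAPE '\\' "
        "ORDER BY thread_id LIMIT ?",
        (pattern, pattern, limit),
    )
    return cur.fetchall()
```

Passing the term as a bound parameter keeps user input out of the SQL text, which matters once a web frontend sits on top of this.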

Alpha272

    • Alpha272#6053
    • Steam
  • Gender: Male
Re: ShrineMaiden v2 archival organization & status
« Reply #7 on: February 20, 2020, 09:07:46 PM »
(I am currently in a Discord chat with HTFCirno)

Actually.. if everything works out as expected, we'll have a DB with the entire archive of v2 (besides the attachments, which I have sorted in folders by threadid) and a frontend system which has solid search functionality and is able to provide links for the attachments.

Now the problem is.. where do we host all of this? Well... MOTK v3 needs to have a DB anyway so... our current idea is that we could add the table which holds the entire v2 to the DBMS which MOTK uses (provided that MOTK uses either MySQL or PostgreSQL). That would solve the largest problem. The frontend could be a standalone download, or be hosted as a subsite on MOTK v3 or on an inexpensive host or something like that, and I could drop the attachments into Azure Files or something similar.

For the DB thingy we need an answer from a MOTK admin. If that isn't possible, we might need to upload an SQLite file for download or find an inexpensive DB provider; all the providers I know are too expensive for me for long-term use.

EDIT:
Okay, never mind... I'm unable to download all the attachments... It seems like my account is too "new" (well.. new is relative, but I wrote too few things) to access some subforums.. and I can't download the attachments from those subforums.. so yeee.. someone else needs to back up the attachments.
« Last Edit: February 20, 2020, 10:20:16 PM by Alpha272 »

Barrakketh

  • You're suddenly Director of Fixing That Shit!
  • Vice President of It's Your Problem Now.
Re: ShrineMaiden v2 archival organization & status
« Reply #8 on: February 21, 2020, 12:30:38 AM »
Okay, never mind... I'm unable to download all the attachments... It seems like my account is too "new" (well.. new is relative, but I wrote too few things) to access some subforums.. and I can't download the attachments from those subforums.. so yeee.. someone else needs to back up the attachments.
I've been working on backing everything up, and various other users have been doing so as well. RPG and MSG have quite a few attachments in them (progress thus far is nearly 1700 in those two alone since I enabled logging), and the forum's slow response time is dragging things out substantially.

Re: ShrineMaiden v2 archival organization & status
« Reply #9 on: February 21, 2020, 01:12:45 AM »
I downloaded them all, 3,962,075,911 bytes in 13,419 files (a lot of them are just thumbnails of other attachments though). I tried to be thorough, parsed both threads and user profiles to find them. A table matching attachment IDs and topics to filenames is attached.
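
For anyone who wants to combine a table like this with the per-thread folder layout mentioned earlier in the thread, here is a minimal sketch. The tuple format `(attachment_id, topic_id, filename)`, the assumption that raw downloads are stored under their attachment ID, and all function names are hypothetical. The real table's format may differ.

```python
import shutil
from pathlib import Path


def plan_layout(mapping, src_dir, dst_dir):
    """Turn (attachment_id, topic_id, filename) rows into (source, target) paths.

    Assumption: each raw download is stored in src_dir under its attachment ID.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    moves = []
    for attach_id, topic_id, filename in mapping:
        # Keep only the last path component so a stored name cannot
        # point outside dst_dir.
        safe_name = Path(filename).name
        target = dst_dir / str(topic_id) / f"{attach_id}_{safe_name}"
        moves.append((src_dir / str(attach_id), target))
    return moves


def apply_layout(moves):
    """Copy every file to its planned location, creating topic folders."""
    for src, dst in moves:
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
```

Prefixing the original filename with the attachment ID avoids collisions when two attachments in the same topic share a name. Planning the moves separately from applying them makes it easy to review the result before touching 13,419 files.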
« Last Edit: February 21, 2020, 01:14:55 AM by Niektory »

WindyKitsune

  • Lumin(ifer)ous Touhouism
Re: ShrineMaiden v2 archival organization & status
« Reply #10 on: February 21, 2020, 01:37:51 PM »
Currently I'm downloading all public (non-registered) posts without attachments by means of a custom Python script.
I estimate I'll have about 700-800 MB of uncompressed HTML pages after filtering empty pages.
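
One possible way to do the "filtering empty pages" step is sketched below. It is not the script used here. It treats a page as empty when it contains no element carrying a given CSS class, and the default class name `post` is an assumption about the forum theme's markup that would need checking against real saved pages.

```python
from html.parser import HTMLParser


class PostCounter(HTMLParser):
    """Counts start tags whose class attribute includes post_class."""

    def __init__(self, post_class):
        super().__init__()
        self.post_class = post_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.post_class in classes:
            self.count += 1


def is_empty_page(html_text, post_class="post"):
    """True when the page has no element marked with post_class."""
    counter = PostCounter(post_class)
    counter.feed(html_text)
    return counter.count == 0
```

A crawler could call `is_empty_page` on each response and skip writing the file when it returns True, for example for topic IDs that were deleted or are not visible to guests.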

I also have an "images" folder with 45 MB of uncompressed images, which contains forum user avatars with user IDs, downloaded using Wget.

Yeah, also a funny easter egg in attachments (from "edible" folder).

Lebon14

  • You can hear Momiji awoo in the distance
    • Twitch
    • Twitter
    • YouTube
    • Lebon14#1880
  • Gender: Male (He)
Re: ShrineMaiden v2 archival organization & status
« Reply #11 on: February 23, 2020, 12:41:52 AM »
Alright, so, my suggestion was to rebuild v2 on a subdomain from the HTML (posts, profiles, etc.), pictures, attachments, etc. Not using a database; only from a hierarchy of HTML files. It is less likely to break in the long run.
As an additional feature, being logged in here would allow you to see the logged-in version of v2.

Also, having everything locally on a server would strengthen the long history of the board, although it will take much more local space. I have such a backup, but only for the public side.
I also need a place to upload that and MEGA is definitely NOT suitable.

Karisa

  • "Resurrection Butterfly -113% Reflowering-"
  • *
    • Twitch
    • YouTube
    • Karisa#5432
  • Gender: Female
Re: ShrineMaiden v2 archival organization & status
« Reply #12 on: February 26, 2020, 10:08:07 PM »
Looks like the archival turned out to be a moot point, with Seventh Holy Scripture's export of v2 that's now being hosted.

Thanks anyway for all your efforts.

Re: ShrineMaiden v2 archival organization & status
« Reply #13 on: February 26, 2020, 10:30:23 PM »
Is there a way that we could make the "members only" parts accessible to people who never had accounts on the old site? I recall quite a few interesting fanfics that were only posted on this site and whose creators never went anywhere else/just disappeared off the face of the earth. I feel that it would be a crime if they all became exclusive to anyone who was lucky enough to be one of our members.

Lebon14

  • You can hear Momiji awoo in the distance
    • Twitch
    • Twitter
    • YouTube
    • Lebon14#1880
  • Gender: Male (He)
Re: ShrineMaiden v2 archival organization & status
« Reply #14 on: February 26, 2020, 10:44:33 PM »
Is there a way that we could make the "members only" parts accessible to people who never had accounts on the old site? I recall quite a few interesting fanfics that were only posted on this site and whose creators never went anywhere else/just disappeared off the face of the earth. I feel that it would be a crime if they all became exclusive to anyone who was lucky enough to be one of our members.

I've raised that point in the Discord, as there are quite a few more boards hidden from view, even from a regular member. However, v3 admins don't want to make decisions that are up to v2 admins to make. Personally, I'd make what's currently members-only available publicly, minus Letty Journal. Besides that, except for some Moderation boards, I'd make the hidden boards available to members.

Re: ShrineMaiden v2 archival organization & status
« Reply #15 on: February 27, 2020, 08:36:31 AM »
How about just re-enabling registration there? That would solve the problem while preserving the status quo ante.

Kilgamayan

  • False Administrator
  • *
  • The Real Treasure is You
    • Twitch
    • Twitter
    • YouTube
    • Let's Play Super Marisa World
Re: ShrineMaiden v2 archival organization & status
« Reply #16 on: February 27, 2020, 12:07:12 PM »
Given there are still seven million bot guests targeting the v2 forum at any given moment in time, it is unlikely that registration will be opened again for the foreseeable future. I can look into opening some of the non-guest forums some time this weekend, though.