# reddit-lemmy-importer
Turn JSON files downloaded from https://the-eye.eu/redarcs/ into Lemmy comms :D

This is effectively https://github.com/mesmere/RedditLemmyImporter, but written in JS and for a different type of archive.

The posts/comments dump is read as a stream, so handling bigger subreddits is less RAM-intensive (though the final tree will still take up a good amount of RAM, so consider creating a big swapfile if you're processing large subreddits).
**You must create the community and user in Lemmy before you run the SQL script, since the script looks up their IDs from the names you provide.** You can still build the SQL script before creating the comm/user, though.
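To illustrate why both rows must exist first, here is a hypothetical sketch of generating an insert that resolves the community and user IDs by name at import time. The function name and exact statement shape are illustrative, not the importer's real output; table/column names follow lemmy's schema (`community.name`, `person.name`).

```javascript
// Hypothetical sketch (not the importer's actual output): generate an INSERT
// that resolves the community and user ids by name, which is why both rows
// must already exist when the script runs.
function insertPostSql(commName, userName, title) {
  const esc = (s) => s.replace(/'/g, "''"); // basic SQL string-literal escaping
  return (
    `INSERT INTO post (name, community_id, creator_id)\n` +
    `SELECT '${esc(title)}', c.id, p.id\n` +
    `FROM community c, person p\n` +
    `WHERE c.name = '${esc(commName)}' AND p.name = '${esc(userName)}';`
  );
}
```

If either name doesn't match an existing row, the `SELECT` returns no rows and nothing is inserted, which is exactly the failure mode the bold warning above is about.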
## usage:
Install dependencies:

`yarn install`

Run the importer:

`yarn run importer --posts example-submissions.json --comments example-comments.json -c example_archive -u archive_bot -o example.sql`

Option 1: import your SQL remotely

`psql --username=lemmy --dbname=lemmy --port=[lemmy-port] --host=[ip/host] --file=example.sql`

Option 2: import your SQL on the server

`psql --username=lemmy --dbname=lemmy --file=example.sql`

Option 3: import your SQL into a Docker container

`<example.sql docker exec -i [container-id] psql --username=lemmy --dbname=lemmy -`
## TODO:
- set URL embed titles/descriptions, `url_content_type`, and `embed_video_url` in posts
- FIX `ap_id`!!!!!
  - this could be done by taking the federated URL as an argument, then updating the `ap_id` using [the URL + /type/ + SQL id from the post]
- handle removal by self/mod in posts and comments properly
- maybe modify the downvotes in `comment_aggregates` when the score is negative (this depends on whether that column exists in production hexbear (i'm on some strange branch idk))
  - since right now it just sets the upvotes to whatever the score is, even when it's negative
- Remove the JSON fields from comments and posts that aren't used for importing, so that the final array of post trees with nested comments takes up less memory. ✔
- Save the final JSON to a file and read it back through `StreamArray`, so that memory is freed once the archive has finished processing ✔
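The `ap_id` fix from the TODO above could be sketched as a small generator for the corresponding UPDATE statements. This assumes lemmy's `/post/<id>` and `/comment/<id>` URL shape; scoping the update by the import user's name is an assumption about how to avoid touching unrelated rows.

```javascript
// Hypothetical sketch of the ap_id fix: given the instance's federated base
// URL, emit UPDATEs that rebuild ap_id from each row's SQL id, following
// lemmy's /post/<id> and /comment/<id> URL shape. Filtering on the import
// user's name is an assumption about how to scope the update.
function apIdFixSql(baseUrl, botName) {
  const statements = [];
  for (const table of ['post', 'comment']) {
    statements.push(
      `UPDATE ${table} ` +
      `SET ap_id = '${baseUrl}/${table}/' || id ` +
      `WHERE creator_id = (SELECT id FROM person WHERE name = '${botName}');`
    );
  }
  return statements.join('\n');
}
```

Since the row IDs only exist after the inserts run, these UPDATEs would have to be appended at the end of the generated script (or run as a second pass).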
## references
- https://github.com/mesmere/RedditLemmyImporter (basically stole the SQL stuff from there)
- https://www.w3schools.com/sql/
- https://linux.die.net/man/1/psql
- me kinda just messing with test posts to see what they look like in the db when you make a real post
- https://github.com/hexbear-collective/lemmy/tree/hexbear-0.19.5
- https://github.com/hexbear-collective/lemmy/blob/hexbear-0.19.5/crates/db_schema/src/schema.rs
- https://www.geeksforgeeks.org/returning-in-postgresql/
- https://tanishiking.github.io/posts/count-unicode-codepoint/
- https://exploringjs.com/js/book/ch_unicode.html
- https://www.reddit.com/dev/api/