safebooru 2021.03 rip + addons from yande-re, gelbooru, chan-sankakucomplex, zerochan etc

Category: Pictures - Graphics
Date: 2021-04-23 08:26
Submitter: AlexPUA
Information: No information.
File size: 199.9 GiB
Info hash: bedfcde62aff2bb3f1207ca35b3a2fccc3f7e1b4

Another volume, V2021A, covering 12.2020 - 03.2021, in the series of composite safebooru-based rips. Previous volumes:
09.2020 - 12.2020 volume V2020D (its boring description is mostly replicated here)
06.2020 - 09.2020 volume V2020C
02.2020 - 05.2020 volume V2020B
08.2019 - 01.2020 volume V2020A
11.2018 - 08.2019 volume V2019
and some earlier releases of the BOORU-CHARS OPEN DATASET

These rips are not intended to be “complete and maximum quality” but rather a “representative best of”, to help users
not lose track of an interesting fandom, artist or even a single prominent picture, and get it all in a few clicks.

Sources used (priorities high to low when deduplicating):

safebooru.org (IDs 33xxxxx): letter S in archive/folder names
yande.re (some questionable images in separate Q folders): Y
gelbooru.com (a little NSFW, in Q): G
anime-pictures.net: A
konachan.com (with Q): K
zerochan.net: Z
chan.sankakucomplex.com (with Q): C
e-shuushuu.net: E
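The priority order above works as a tie-breaker: when the same picture turns up on several sources, the copy from the highest-ranked source is the one kept. A minimal sketch of that rule, assuming the groups of duplicates have already been found by some other means; the function names and the sample IDs are made up for illustration:

```python
# Archive letters in deduplication priority order, high -> low (from the list above).
PRIORITY = ["s", "y", "g", "a", "k", "z", "c", "e"]
RANK = {letter: i for i, letter in enumerate(PRIORITY)}

def pick_keeper(group):
    """group: list of (source_letter, post_id) tuples judged to be the same picture.
    Returns the entry from the highest-priority source (first one wins on ties)."""
    return min(group, key=lambda post: RANK[post[0]])

def deduplicate(groups):
    """Keep exactly one post per duplicate group."""
    return [pick_keeper(group) for group in groups]

if __name__ == "__main__":
    # Made-up example: one picture found on zerochan, yande.re and gelbooru,
    # another found only on e-shuushuu.
    groups = [[("z", 3161234), ("y", 735001), ("g", 5801234)], [("e", 1040000)]]
    print(deduplicate(groups))
```

Here the yande.re copy wins the first group because Y ranks above G and Z.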

134,773 images, sorted and zipped by aspect ratio (one folder per ratio class), priorities high to low:

44,236 “artbook pages” 7x10 (+/- 4%)
19,862 “wide pages” 3x4 (+/- 10%)
24,276 “squares” 1x1 (+/- 20%)
28,426 “wallpapers and computer screens” 3x2 (+/- 40%)
17,973 “tall pages” 2x3 (+/- 40%), folder/archive names use 1x2
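One way to read the bucketing rule, as a sketch under two assumptions not stated explicitly: the ratio is width / height, and each tolerance is relative to the target ratio. With that reading the outermost classes span exactly the 0.4 … 2.1 range given under “Transformations and filters” (2/3 × 0.6 = 0.4, 1.5 × 1.4 = 2.1). The function name is hypothetical:

```python
# (folder prefix, target width/height, relative tolerance), in priority order.
BUCKETS = [
    ("7x10", 7 / 10, 0.04),
    ("3x4", 3 / 4, 0.10),
    ("1x1", 1.0, 0.20),
    ("3x2", 3 / 2, 0.40),
    ("1x2", 2 / 3, 0.40),  # the 2x3 "tall pages" class is stored under 1x2
]

def classify(width, height):
    """Return the folder prefix for an image, or None if no class fits."""
    ratio = width / height
    for prefix, target, tolerance in BUCKETS:  # first match wins
        if abs(ratio / target - 1) <= tolerance:
            return prefix
    return None  # outside every class: would have to be cropped or dropped
```

Because the classes overlap, the order matters: a 750 x 1000 page misses 7x10 by about 7% and lands in 3x4, while a 600 x 1000 page falls through to the 1x2 class.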

Archives are further split by source and (sometimes) by ID range, both indicated in the folder/archive name.
You can browse pictures directly inside the archives with FastStone MaxView or a similar viewer.
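The archive names in the file list follow a regular pattern: ratio class, lowercase source letter, an optional `q` for questionable content, and an optional numeric ID-range marker (apparently the leading digits of the post IDs inside, e.g. `s.330` for safebooru IDs 330xxxx; this reading is inferred from the file list, not documented). A sketch of a decoder; the regex and function name are hypothetical:

```python
import re

# Source letters from the list above.
SOURCES = {"s": "safebooru.org", "y": "yande.re", "g": "gelbooru.com",
           "a": "anime-pictures.net", "k": "konachan.com", "z": "zerochan.net",
           "c": "chan.sankakucomplex.com", "e": "e-shuushuu.net"}

# <ratio>.<source>[.q][.<id range>].zip, e.g. "7x10.y.q.72.zip"
ARCHIVE_RE = re.compile(r"^(\d+x\d+)\.([a-z])(\.q)?(?:\.(\d+))?\.zip$")

def parse_archive_name(name):
    """Split an archive name into ratio class, source, Q flag and ID-range marker."""
    m = ARCHIVE_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected archive name: {name}")
    ratio, letter, q, id_range = m.groups()
    return {"ratio": ratio, "source": SOURCES[letter],
            "questionable": q is not None, "id_range": id_range}
```

For example, `1x1.e.zip` decodes to the e-shuushuu.net squares with no ID range.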

File name structure: %website% - %id% - %up_to_3_copyrights% ~ %up_to_5_characters% (%up_to_2_artists%).%ext% where

%copyright%, %character% and %artist% can be used as search filters on the source booru
%website% + %id% is unique and can also be used to build the direct booru URL
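A sketch of splitting a file name back into its fields and turning %website% + %id% into a post URL. Assumptions, all hypothetical: the separators appear exactly as in the template above; tag groups are returned as raw strings because the separator between several tags is not specified; %website% is taken to be the site's host name; and the URL patterns below are the usual post-page addresses for five of the sources, not something this release documents. The sample file name is made up:

```python
import re

# Greedy "characters" group: the LAST " (" starts the artist part, so character
# tags with parentheses such as "asuna_(sao)" stay intact.
NAME_RE = re.compile(
    r"^(?P<website>.+?) - (?P<id>\d+) - (?P<copyrights>.*?) ~ "
    r"(?P<characters>.*) \((?P<artists>.*)\)\.(?P<ext>\w+)$"
)

# Assumed post-page URL patterns, keyed by host name.
URL_TEMPLATES = {
    "safebooru.org": "https://safebooru.org/index.php?page=post&s=view&id={id}",
    "gelbooru.com": "https://gelbooru.com/index.php?page=post&s=view&id={id}",
    "yande.re": "https://yande.re/post/show/{id}",
    "konachan.com": "https://konachan.com/post/show/{id}",
    "zerochan.net": "https://www.zerochan.net/{id}",
}

def parse_file_name(name):
    """Return the template fields of a file name as a dict of strings."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"name does not follow the template: {name}")
    return m.groupdict()

def post_url(parts):
    """Direct post URL for a parsed name, or None for sources not covered above."""
    template = URL_TEMPLATES.get(parts["website"])
    return template.format(id=parts["id"]) if template else None

if __name__ == "__main__":
    parts = parse_file_name(
        "yande.re - 735001 - sword_art_online ~ asuna_(sao) kirito (artist_name).jpg")
    print(parts["characters"], post_url(parts))
```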

so you can extract subsets of interest with xcopy (from already unzipped images) or by unzipping on the fly (straight from the release archives), e.g.:

REM batch-file syntax; type %F instead of %%F when running the loop directly at the command prompt
for %%F in ("d:\Safebooru 2021a\*.zip") do 7z x -r -o"e:\sortarea" "%%F" *sword*art*online*
xcopy /s "d:\Safebooru 2021a\*sword*art*online*" e:\sortarea\
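Without 7-Zip, roughly the same on-the-fly extraction can be done with Python's standard `zipfile` and `fnmatch` modules. This is a sketch, not the uploader's method; the function name is hypothetical, and unlike `7z x` it drops the folders stored inside the archives (closer to `7z e`), which is safe here because every file name is unique:

```python
import fnmatch
import zipfile
from pathlib import Path

def extract_matching(release_dir, pattern, dest_dir):
    """Extract every member of every *.zip in release_dir whose file name matches
    the shell-style `pattern` (case-insensitive, as on Windows) into dest_dir.
    Returns the names that were extracted."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(release_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                name = Path(info.filename).name  # drop folders inside the zip
                if fnmatch.fnmatchcase(name.lower(), pattern.lower()):
                    (dest / name).write_bytes(zf.read(info))
                    extracted.append(name)
    return extracted
```

Usage mirroring the 7z loop: `extract_matching(r"d:\Safebooru 2021a", "*sword*art*online*", r"e:\sortarea")`.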

Transformations and filters:

initially filtered: megapixels >= 1.2, width >= 900, height >= 900
PNG converted to JPG (quality 94%), no animations
downsized to at most 60 MPix and/or 9000 px on the longer side; strips (very elongated images) dropped or adjusted to an aspect ratio of 0.4 … 2.1
manual work (yep, plenty of ~~hand~~job behind this release):
- comics and 4koma, segmented scans and covers overloaded with text filtered out
- real-life photos, landscapes without characters, most line art and primitive chibi thrown away
- overly explicit images (insufficiently censored nipples or vulva, obvious hints of sexual activity, etc.) excluded from the “questionable” downloads
- cropped (sometimes split frame by frame) where the background is large and plain or dirty; most artbook scans de-bordered
- occasionally gamma correction, denoising and other nontrivial improvements applied
carefully deduplicated (with AntiDupl.NET, up to 4% difference) within this release and against past releases
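The size limits above come down to simple arithmetic. A minimal sketch, assuming “maxsize 9000 px” means the longer side and that whichever of the two downsizing limits is stricter applies; the function names are hypothetical and no actual resampling is done:

```python
import math

MIN_PIXELS = 1_200_000   # 1.2 MPix
MIN_SIDE = 900
MAX_PIXELS = 60_000_000  # 60 MPix
MAX_SIDE = 9000          # assumed: longer side

def passes_initial_filter(width, height):
    """True if the image meets the minimum resolution requirements."""
    return width * height >= MIN_PIXELS and width >= MIN_SIDE and height >= MIN_SIDE

def target_size(width, height):
    """Largest proportional size within both limits (images already within
    the limits are left unchanged)."""
    scale = min(1.0,
                math.sqrt(MAX_PIXELS / (width * height)),
                MAX_SIDE / max(width, height))
    # floor() so rounding can never push the result over a limit
    return math.floor(width * scale), math.floor(height * scale)
```

For a 12000 x 6000 image the 9000 px limit is the stricter one (scale 0.75 vs. about 0.91 for 60 MPix), so `target_size(12000, 6000)` gives `(9000, 4500)`.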

Some metadata is included in tab-delimited files:

V2021A_files.tsv: post info (size, resolution, MD5, etc.) with concatenated copyright / character / artist tags (opens in Excel)
V2021A_tags.tsv: all tags (including general and meta), one tag per line (2,789,454 rows, more than Excel's 1,048,576-row limit)

Load them into any database and you can combine SQL and xcopy to pull out anything you want (from already unzipped images, by copy-pasting the query result), e.g.:

-- || concatenation works in SQLite, PostgreSQL and Oracle; use CONCAT() in MySQL and SQL Server
select 'xcopy "d:\'||torr_path||'\'||file_name||'" e:\sortarea\' xc
from files f
join tags t on t.booru=f.booru and t.fid=f.fid
where t.tag='cameltoe'; -- do not expect too much even with this tag
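The same query can be run without a database server by loading both TSV files into an in-memory SQLite database with Python's standard library. A sketch under loud assumptions: each TSV is taken to have a header row whose column names match the query (`booru`, `fid`, `torr_path`, `file_name`, `tag`); the real files may name or order their columns differently, so adjust the query to match. The helper names are hypothetical:

```python
import csv
import sqlite3

# The query above, parameterised on the tag. Raw string: backslashes are literal,
# and SQLite does not treat backslash as an escape character.
QUERY = r"""
select 'xcopy "d:\' || torr_path || '\' || file_name || '" e:\sortarea\' as xc
from files f
join tags t on t.booru = f.booru and t.fid = f.fid
where t.tag = ?
"""

def load_tsv(conn, table, path):
    """Create `table` from a tab-delimited file whose first row holds the column names."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)
        cols = ", ".join(f'"{c}"' for c in header)
        marks = ", ".join("?" for _ in header)
        conn.execute(f'create table "{table}" ({cols})')
        conn.executemany(f'insert into "{table}" values ({marks})', reader)

def xcopy_commands(files_tsv, tags_tsv, tag):
    """Return one xcopy command line per file carrying `tag`."""
    conn = sqlite3.connect(":memory:")
    try:
        load_tsv(conn, "files", files_tsv)
        load_tsv(conn, "tags", tags_tsv)
        return [row[0] for row in conn.execute(QUERY, (tag,))]
    finally:
        conn.close()
```

The returned lines can be pasted into a console or written to a .bat file, exactly like the copy-pasted query result.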

File list

Safebooru 2021a
- 1x1.a.zip (416.2 MiB)
- 1x1.c.q.zip (528.9 MiB)
- 1x1.c.zip (1.5 GiB)
- 1x1.e.zip (44.7 MiB)
- 1x1.g.q.zip (943.6 MiB)
- 1x1.g.zip (2.6 GiB)
- 1x1.k.zip (56.2 MiB)
- 1x1.s.330.zip (1.9 GiB)
- 1x1.s.332.zip (2.0 GiB)
- 1x1.s.334.zip (1.9 GiB)
- 1x1.s.336.zip (2.2 GiB)
- 1x1.s.338.zip (2.0 GiB)
- 1x1.y.q.zip (1.3 GiB)
- 1x1.y.zip (788.7 MiB)
- 1x1.z.316.zip (2.5 GiB)
- 1x1.z.319.zip (1.8 GiB)
- 1x1.z.321.zip (1.6 GiB)
- 1x1.z.323.zip (2.3 GiB)
- 1x2.a.zip (901.4 MiB)
- 1x2.c.q.zip (942.7 MiB)
- 1x2.c.zip (1002.8 MiB)
- 1x2.e.zip (42.3 MiB)
- 1x2.g.q.zip (995.8 MiB)
- 1x2.g.zip (2.1 GiB)
- 1x2.s.330.zip (2.0 GiB)
- 1x2.s.332.zip (2.2 GiB)
- 1x2.s.334.zip (2.4 GiB)
- 1x2.s.336.zip (2.6 GiB)
- 1x2.s.338.zip (2.7 GiB)
- 1x2.y.q.zip (3.2 GiB)
- 1x2.y.zip (1.1 GiB)
- 1x2.z.31.zip (2.4 GiB)
- 1x2.z.32.zip (3.4 GiB)
- 3x2.a.zip (1.4 GiB)
- 3x2.c.q.zip (913.9 MiB)
- 3x2.c.zip (1.8 GiB)
- 3x2.e.zip (123.2 MiB)
- 3x2.g.q.zip (1.2 GiB)
- 3x2.g.zip (3.1 GiB)
- 3x2.k.zip (463.2 MiB)
- 3x2.s.330.zip (3.1 GiB)
- 3x2.s.332.zip (3.2 GiB)
- 3x2.s.334.zip (3.3 GiB)
- 3x2.s.336.zip (3.4 GiB)
- 3x2.s.338.zip (3.7 GiB)
- 3x2.y.q.zip (3.4 GiB)
- 3x2.y.zip (2.7 GiB)
- 3x2.z.316.zip (2.5 GiB)
- 3x2.z.318.zip (2.9 GiB)
- 3x2.z.320.zip (3.3 GiB)
- 3x2.z.322.zip (2.1 GiB)
- 3x2.z.323.zip (3.6 GiB)
- 3x4.a.zip (511.7 MiB)
- 3x4.c.q.zip (728.6 MiB)
- 3x4.c.zip (1.5 GiB)
- 3x4.e.zip (71.9 MiB)
- 3x4.g.q.zip (1.1 GiB)
- 3x4.g.zip (2.5 GiB)
- 3x4.s.330.zip (1.7 GiB)
- 3x4.s.332.zip (1.8 GiB)
- 3x4.s.334.zip (1.7 GiB)
- 3x4.s.336.zip (1.9 GiB)
- 3x4.s.338.zip (2.2 GiB)
- 3x4.y.q.zip (1.6 GiB)
- 3x4.y.zip (1.0 GiB)
- 3x4.z.316.zip (2.8 GiB)
- 3x4.z.320.zip (2.2 GiB)
- 3x4.z.322.zip (1.8 GiB)
- 7x10.a.zip (2.2 GiB)
- 7x10.c.q.zip (1.7 GiB)
- 7x10.c.zip (2.8 GiB)
- 7x10.e.zip (348.0 MiB)
- 7x10.g.575.zip (3.1 GiB)
- 7x10.g.586.zip (3.3 GiB)
- 7x10.g.q.zip (2.4 GiB)
- 7x10.s.330.zip (2.7 GiB)
- 7x10.s.331.zip (2.9 GiB)
- 7x10.s.332.zip (2.5 GiB)
- 7x10.s.333.zip (2.8 GiB)
- 7x10.s.334.zip (2.6 GiB)
- 7x10.s.335.zip (3.0 GiB)
- 7x10.s.336.zip (2.6 GiB)
- 7x10.s.337.zip (3.1 GiB)
- 7x10.s.338.zip (3.0 GiB)
- 7x10.s.339.zip (2.7 GiB)
- 7x10.y.q.72.zip (2.2 GiB)
- 7x10.y.q.73.zip (2.8 GiB)
- 7x10.y.q.74.zip (2.7 GiB)
- 7x10.y.q.75.zip (1.5 GiB)
- 7x10.y.zip (3.7 GiB)
- 7x10.z.316.zip (1.6 GiB)
- 7x10.z.317.zip (1.9 GiB)
- 7x10.z.318.zip (2.0 GiB)
- 7x10.z.319.zip (2.2 GiB)
- 7x10.z.320.zip (1.7 GiB)
- 7x10.z.321.zip (1.9 GiB)
- 7x10.z.322.zip (2.0 GiB)
- 7x10.z.323.zip (1.9 GiB)
- 7x10.z.324.zip (2.2 GiB)
- V2021A_files.tsv (34.4 MiB)
- V2021A_tags.tsv (138.7 MiB)

Comments - 4

sebern

4 years 2 months 7 hours ago

Thank you for doing this! May I ask how you’re collecting all of these?

AlexPUA (uploader)

4 years 1 month 4 weeks ago (edited)

For grabbing I use “bionus imgbrd grabber” for most sources (safebooru, yande-re, gelbooru, anime-pictures, konachan), with rich file naming and verbose metadata logging.
For some sources I use hand-made Python scripts (very simple; you can find them in my zerochan and e-shuushuu releases) that also collect a lot of supporting info.
Grabbing sankaku was always a particular challenge, and for now I have lost that battle.
Then I load the logs into an SQL database and use it (tags etc.) for batch image conversion, file renaming and so on. It’s a sophisticated workflow, but it doesn’t take much time once scripted.
What takes a lot of time is visual control, semi-manual cleanup and deduplication (within the release and against past releases).

sebern

4 years 1 month 3 weeks ago

Thank you! I’ll try later and contact you if I have any problems ^^

SomaHeir

4 years 1 month 6 days ago

Thanks!!