Ross Finlayson
2024-03-09 18:01:52 UTC
Hello. I'd like to start by saying thanks
to Usenet administrators and originators.
Usenet has a lot of perceived value as a cultural
artifact, and also as a great experiment in free
speech, association, and press.
Here I'm mostly interested in text Usenet,
not binaries: text Usenet is a great
artifact and experiment in speech, association,
and press.
When I saw this example that may have a
lot of old Usenet, then it sort of aligned
with an idea that started as an idea of
vanity press, about an archive of a group.
Now though, I wonder how to define an
"archive any and all text usenet", AAATU,
filesystem convention, as a sort of "Library
Filesystem Format", LFF.
The idea is that each "message", or "post", has an ID
(as far as that's good), that each group
in the hierarchy has a name, and that each
message has a date. Then, the idea is to
make an LFF that makes a folder for a group,
and under it a folder for each date, holding its messages.
a.b.c/YYYY/MMDD/HHMM/
The idea is to make a dump or repo and
make it so that it can be filled, backfilled
as it were, with all text Usenet, so that
each group's YYYY/MMDD directory
has up to 1440 HHMM partitions, directories
(24 hours times 60 minutes),
and that in the course of backfilling,
the "corresponding" date of a message, a post,
is determined, and the post goes in
that folder.
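
As a minimal sketch of the group/date partitioning described above: the
function name lff_partition is hypothetical, and normalizing every date
to UTC is an assumption here, not something the convention has settled.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime
from pathlib import PurePosixPath


def lff_partition(group: str, date_header: str) -> PurePosixPath:
    """Map a group name and a Date header to group/YYYY/MMDD/HHMM."""
    when = parsedate_to_datetime(date_header)
    if when.tzinfo is None:
        # Assumption: a date with no usable zone is treated as UTC.
        when = when.replace(tzinfo=timezone.utc)
    # Assumption: partitions are keyed on UTC, so one post has one place.
    when = when.astimezone(timezone.utc)
    return PurePosixPath(group, f"{when:%Y}", f"{when:%m%d}", f"{when:%H%M}")


print(lff_partition("a.b.c", "Sat, 09 Mar 2024 18:01:52 +0000"))
```

Keying on UTC means a post dated late in the evening in a western zone
lands in the next day's MMDD folder, which may or may not be what a
"corresponding" date should mean.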
There are very useful notions of "mbox" and
"maildir", with the idea that LFF or "maillff",
and mbox and maildir variously have a great
affinity.
While the idea of a partition gives each message
a place, in context and in time,
each kind of filesystem has limits, on the
model of a root directory and its entries,
files and directories, for a volume of files
in a store, and a tape-archive of those files.
There are limits on filenames.
  length
  depth
  character-set
  character-encoding
There are limits on each directory's contents.
  count-all
  count-dirs
  count-files
There are limits on the volume.
  count
  size
  overhead
There are features of filesystems that would
be very useful but not be expected to be usual.
  sym-link
  modification-time
  directory-order
It would be possible to indicate those
in the names, under the limits.
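
A rough sketch of checking a candidate path against such limits, and of
deriving a portable filename from a Message-ID. The limit values, the
allowed character set, and the names message_filename and check_path
are all assumptions for illustration; real limits depend on the
filesystem and the archive format.

```python
import hashlib

# Assumed limits, for illustration only.
MAX_COMPONENT_BYTES = 255
MAX_DEPTH = 7
SAFE_CHARS = set("abcdefghijklmnopqrstuvwxyz0123456789.-_+")


def message_filename(message_id: str) -> str:
    """Derive a fixed-length filename from a Message-ID by hashing it."""
    digest = hashlib.sha256(message_id.encode("utf-8")).hexdigest()
    return digest[:32]


def check_path(relative_path: str) -> list[str]:
    """Return a list of problems; an empty list means within the limits."""
    problems = []
    parts = relative_path.strip("/").split("/")
    if len(parts) > MAX_DEPTH:
        problems.append(f"depth {len(parts)} exceeds {MAX_DEPTH}")
    for part in parts:
        if len(part.encode("utf-8")) > MAX_COMPONENT_BYTES:
            problems.append(f"component too long: {part[:20]}...")
        if not set(part) <= SAFE_CHARS:
            problems.append(f"unsafe characters in: {part}")
    return problems


name = message_filename("<example-id@host.invalid>")
print(check_path(f"a.b.c/2024/0309/1801/{name}"))
```

Hashing sidesteps the characters a Message-ID may contain ("<", ">",
"/", "@"), at the cost that the name no longer shows the ID itself.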
So, the idea or goal is how to arrive at
a filesystem convention for a group/year/day,
using the filesystem to organize posts,
and to build it out so that it results in a rather large
store of all these things.
Then, it's a matter of reaching limits, like "no, the
filesystem cannot be more than 7 deep",
and of how to store unboundedly many
messages.
(It's figured to store messages at rest,
after validating well-formedness,
compressed.)
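
A minimal sketch of "validate, then compress", assuming that
well-formedness means at least the headers RFC 5536 makes mandatory are
present, and that gzip is the compression. Both are assumptions, as is
the name pack_article; very old articles or mbox exports may lack some
of these headers and would need a looser check.

```python
import gzip
from email import message_from_bytes

# Headers that RFC 5536 requires in every netnews article.
REQUIRED = ("Date", "From", "Message-ID", "Newsgroups", "Path", "Subject")


def pack_article(raw: bytes) -> bytes:
    """Check that the required headers exist, then gzip the raw bytes."""
    msg = message_from_bytes(raw)
    missing = [h for h in REQUIRED if msg[h] is None]
    if missing:
        raise ValueError(f"missing headers: {', '.join(missing)}")
    # The original bytes are compressed unchanged, not the parsed form.
    return gzip.compress(raw)
```

Compressing the original bytes rather than a re-serialized message
keeps the stored article identical to what was received.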
Then the idea would be to make a sort
of way for people with either mbox files,
or running an NNTP server, to arrange
for uploading the mbox files, or provisioning
a "slurp" or "suck" feed login, with the idea
to start collecting a matrix of groups, on
a store intended to provide unlimited
and unbounded retention with redundant
stores, which results in a sort of "decentralized
culture archive" toward "Archive Any And All
Text Usenet".
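
For the mbox side, a small sketch using Python's standard mailbox
module to walk an uploaded mbox file and pull out the fields the layout
needs. The function name iter_mbox is hypothetical, and a real import
would also have to handle missing or malformed headers.

```python
import mailbox


def iter_mbox(path: str):
    """Yield (Message-ID, Newsgroups, Date) for each message in an mbox."""
    # create=False: fail rather than create an empty file if path is wrong.
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            # Any of these may be None if the header is absent.
            yield msg.get("Message-ID"), msg.get("Newsgroups"), msg.get("Date")
    finally:
        box.close()
```

Each tuple is enough to decide the group and date folder for a message
and to name it, before the message itself is stored.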
Then I wonder whether any group's year's worth
of posts would fit in a file of less than 2GB, so
that any number of those could just be unpacked
on a system up to its limits, for archival and
"Digital Preservation" purposes.
(The Library of Congress Digital Preservation project
has some entries for mbox and so on; with regard
to a sort of "most usually fungible Library
Filesystem Format LFF", I'm looking for something
like that.)
So thanks again to Usenet administrators and
originators. There's a real perceived value in
making a project to slurp it all together, or, at
least, for any given group over any given time,
with an organization that represents partitions
into group G and date as G/YYYY/MMDD.
Similar-seeming recent threads:
archived articles available at usenetarchives.com
Google Groups no longer supports new Usenet posts or subscriptions.
Historical content remains available
If you can point me to similar interests or efforts
with regard to digital preservation, I'd be
interested in your comments or details here.