Archive Any And All Text Usenet
Ross Finlayson
2024-03-09 18:01:52 UTC
Hello. I'd like to start by thanking
Usenet administrators and originators.
Usenet has a lot of perceived value as a cultural
artifact, and is also a great experiment in free
speech, association, and press.

Here I'm mostly interested in text Usenet,
not binaries; text Usenet is a great
artifact and experiment in speech, association,
and press.

When I saw this example, which may hold a
lot of old Usenet, it lined up with an idea
that started out as a sort of vanity press:
an archive of a single group.
Now, though, I wonder how to define an
"archive any and all text usenet" (AAAATU)
filesystem convention, as a sort of "Library
Filesystem Format" (LFF).

The idea is that each "message" or "post" has an ID,
that (as far as that holds) each group in the
hierarchy has a name, and that each message
has a date. Then the LFF makes a folder per
group, per date, for each of its messages:

a.b.c/YYYY/MMDD/HHMM/
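A minimal sketch of that layout, assuming the group name is used verbatim as the first path component and that partitions are keyed in UTC; `lff_partition` is a hypothetical helper name, not part of any existing tool.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def lff_partition(group: str, when: datetime) -> PurePosixPath:
    """Return group/YYYY/MMDD/HHMM for a post (hypothetical helper).

    `when` should be timezone-aware; keying partitions in UTC is an assumption.
    """
    utc = when.astimezone(timezone.utc)
    return PurePosixPath(group, f"{utc:%Y}", f"{utc:%m%d}", f"{utc:%H%M}")


print(lff_partition("sci.math", datetime(2024, 3, 9, 18, 1, 52, tzinfo=timezone.utc)))
# sci.math/2024/0309/1801
```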

The idea is to make a dump or repo that can be
filled, backfilled as it were, with all text Usenet,
so that each group's YYYY/MMDD directory has up
to 1440 HHMM partition directories (24 hours times
60 minutes), and in the course of backfilling, the
"corresponding" date of each message, or post, is
determined and the post goes in that folder.
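One way to determine that "corresponding" date, sketched under the assumption that the article's own Date header is trusted and normalized to UTC; it uses only the standard library's `email.utils.parsedate_to_datetime`, and `corresponding_date` is a hypothetical name.

```python
from datetime import datetime, timezone
from email import message_from_bytes
from email.utils import parsedate_to_datetime
from typing import Optional

PARTITIONS_PER_DAY = 24 * 60  # one HHMM directory per minute: 1440


def corresponding_date(raw: bytes) -> Optional[datetime]:
    """Best-effort UTC date of a raw article from its Date header, else None."""
    header = message_from_bytes(raw).get("Date")
    if header is None:
        return None  # no Date header: a backfill would need some other estimate
    try:
        dt = parsedate_to_datetime(str(header))
    except (TypeError, ValueError):  # older Pythons raise TypeError on bad dates
        return None
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # "-0000" zone treated as UTC (assumption)
    return dt.astimezone(timezone.utc)
```

The HHMM partition is then `f"{d:%H%M}"` for the returned datetime `d`.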


There are the very useful conventions "mbox" and
"maildir", and the idea is that LFF (or "maillff")
has a great affinity with both.

While a partition gives each message a place,
in context and in time, each kind of filesystem
has limits, on the model of a root directory and
its entries (files and directories), both for a
volume of files in a store and for a tape archive
of those files.

There are limits on filenames.

length
depth
character-set
character-encoding

There are limits on each directory contents.

count-all
count-dirs
count-files

There are limits on the volume.

count
size
overhead
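A sketch of checking a relative path against the name limits listed above. The concrete numbers are assumptions: the POSIX portable filename character set, a 255-unit component limit (ext4 counts bytes; NTFS and FAT long names count UTF-16 units, so checking bytes is the stricter choice), and the "7 deep" rule of thumb from this thread.

```python
import re

PORTABLE = re.compile(r"[A-Za-z0-9._-]+")  # POSIX portable filename character set
MAX_NAME_BYTES = 255                        # assumed per-component limit
MAX_DEPTH = 7                               # rule of thumb, not a filesystem constant


def name_problems(path: str) -> list:
    """List the ways a '/'-separated relative path breaks the assumed limits."""
    parts = path.strip("/").split("/")
    problems = []
    if len(parts) > MAX_DEPTH:
        problems.append(f"depth {len(parts)} > {MAX_DEPTH}")
    for part in parts:
        if len(part.encode("utf-8")) > MAX_NAME_BYTES:
            problems.append(f"name too long: {part[:20]}...")
        if not PORTABLE.fullmatch(part):
            problems.append(f"non-portable characters: {part!r}")
    return problems
```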

There are features of filesystems that would
be very useful but can't be expected to be universal.

sym-link
modification-time
directory-order

It would be possible to indicate those
in the names instead, within the limits.
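For example, directory order and modification time could be carried in the name itself. The scheme below (zero-padded sequence number, then a UTC timestamp, then a stem) is a hypothetical illustration, not a proposed standard.

```python
from datetime import datetime, timezone


def ordered_name(seq: int, received: datetime, stem: str) -> str:
    """Hypothetical naming scheme: sequence number and UTC timestamp up front,
    so a plain lexicographic sort recovers arrival order even where directory
    order and modification times aren't preserved."""
    stamp = received.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{seq:08d}-{stamp}-{stem}"


names = [ordered_name(n, datetime(2024, 3, 9, 18, n, tzinfo=timezone.utc), "post")
         for n in (10, 2, 7)]
print(sorted(names)[0])  # 00000002-20240309T180200Z-post
```

All the characters used stay inside the portable filename set, so the name survives any of the filesystems discussed.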

So the goal is to arrive at a filesystem
convention for group/year/day that uses the
filesystem itself to organize posts, and to
build it out into a rather large store of
all these things.

Then the question is how to store unboundedly
many messages while staying within limits like
"no, the filesystem cannot be more than 7 deep".

(The plan is to store messages at rest,
compressed, after validating that they're well-formed.)
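A minimal sketch of "validate, then compress at rest", assuming well-formedness means at least having the six header fields RFC 5536 makes mandatory for Netnews articles, and gzip as the compressor; `pack_at_rest` is a hypothetical name and real validation would check far more (header syntax, encoding).

```python
import gzip
from email import message_from_bytes

# RFC 5536 (Netnews Article Format) mandatory header fields.
REQUIRED = ("Date", "From", "Message-ID", "Newsgroups", "Path", "Subject")


def pack_at_rest(raw: bytes) -> bytes:
    """Reject articles missing mandatory headers, then gzip them for storage."""
    msg = message_from_bytes(raw)
    missing = [h for h in REQUIRED if msg.get(h) is None]  # lookup is case-insensitive
    if missing:
        raise ValueError(f"not well-formed, missing: {', '.join(missing)}")
    return gzip.compress(raw)
```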

Then the idea would be to provide a way for
people with mbox files, or running an NNTP
server, to upload those mbox files or provision
a "slurp" or "suck" feed login, so as to start
collecting a matrix of groups in a store that
provides unlimited, unbounded retention across
redundant stores, resulting in a sort of
"decentralized culture archive" toward
"Archive Any And All Text Usenet".
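As a first step with an uploaded mbox, one might just tally which cells of the group x date matrix it covers. This sketch uses the standard library's `mailbox.mbox`; `tally_mbox` is a hypothetical name, and bucketing by the UTC day of the Date header is an assumption.

```python
import mailbox
from collections import Counter
from datetime import timezone
from email.utils import parsedate_to_datetime


def tally_mbox(path: str) -> Counter:
    """Count an uploaded mbox's messages per (group, UTC day) cell.

    Undated or unparseable messages are counted under day None for a later pass.
    """
    counts: Counter = Counter()
    for msg in mailbox.mbox(path):
        try:
            dt = parsedate_to_datetime(msg["Date"])
            if dt.tzinfo is None:
                dt = dt.replace(tzinfo=timezone.utc)  # assumption: "-0000" means UTC
            day = dt.astimezone(timezone.utc).date()
        except (TypeError, ValueError):
            day = None
        for group in (msg["Newsgroups"] or "").split(","):
            if group.strip():
                counts[(group.strip(), day)] += 1
    return counts
```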

Then I wonder whether any group's year's worth
of posts would fit in a file smaller than 2GB, so
that any number of those could simply be unpacked
on a system, up to its limits, for archival and
"Digital Preservation" purposes.
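Some back-of-the-napkin arithmetic on that question, assuming about 4 KiB per article and roughly a tenfold reduction from compression (both rough figures, not measurements):

```python
LIMIT_BYTES = 2 * 2**30      # 2 GiB
ARTICLE_BYTES = 4 * 2**10    # assumed average article size
COMPRESSION = 10             # assumed compression ratio

uncompressed = LIMIT_BYTES // ARTICLE_BYTES   # 524288 articles
compressed = uncompressed * COMPRESSION       # 5242880 articles
per_day = compressed // 366                   # 14324 articles/day, even in a leap year
```

So under these assumptions, a group would need to average over fourteen thousand posts a day, all year, before its compressed year file outgrew 2 GiB.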

(The Library of Congress Digital Preservation
project has some format entries for mbox and so
on; here I'm looking for something like that for
a sort of "most usually fungible Library
Filesystem Format", LFF.)


So thanks again to Usenet administrators and
originators. There's real perceived value in a
project to slurp it all together, or at least
any given group over any given time, with an
organization that partitions by group G and
date as G/YYYY/MMDD.


Similar-seeming recent threads:

archived articles available at usenetarchives.com
Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains available


If you can point me to similar interests or efforts
with regard to digital preservation, I'd be
interested in your comments or details here.
Ross Finlayson
2024-03-10 17:42:51 UTC
Post by Ross Finlayson
Hello. I'd like to start with saying thanks
to Usenet administrators and originators,
Usenet has a lot of perceived value as a cultural
artifact, and also a great experiment in free
speech, association, and press.
...
Post by Ross Finlayson
The idea is that each "message", "post", has an ID,
then as far as that's good, that each group
in the hierarchy has a name, and that, each
message has a date. Then, the idea is to
make an LFF, that makes a folder for a group,
for a date, each its messages.
a.b.c/YYYY/MMDD/HHMM/
...
Post by Ross Finlayson
If you can point me to similar interests or efforts
with regards to digital preservation, I'd be
interested your comments or details here.
Hello, I've studied this for a while. Over on
sci.math, I've been tapping away on a thread
called "Meta: a usenet server just for sci.math".

There I've detailed some of the context and
surroundings: the specs, the usual program models,
and the models of the data.

What I hope to figure out is this "LFF" or
"Library Filesystem Format" convention, such that
"it's a sort of complete collection of a group's
dates' posts that is under 2GB and fits on all
file-systems, as long as it's no more than a few
levels deep from the root of the volume".

So the idea is specifically how to pack away
posts, not so much how to access them at
runtime, though that in turn bears quite directly
on how to implement the Usenet protocols.


The idea is, roughly: "on either Windows or
Linux, FAT/NTFS or ext2/3/..., minding the character
sets and encodings in group names, message IDs, and
file names in the file-systems, store all the groups'
dates' posts, partitioned by group and date".
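Message-IDs routinely contain characters such as `<`, `>`, and `/` that are illegal in Windows file names (and `/` everywhere). One reversible option, sketched here as an assumption rather than part of any spec, is to percent-encode everything outside letters, digits, and `._-~` using the standard `urllib.parse.quote`. It does not address case-insensitive filesystems, where IDs differing only in case would collide, nor Windows reserved names like `CON`.

```python
from urllib.parse import quote, unquote


def msgid_to_filename(message_id: str) -> str:
    """Hypothetical scheme: drop the angle brackets, percent-encode the rest."""
    return quote(message_id.strip("<>"), safe="")


def filename_to_msgid(name: str) -> str:
    """Inverse of msgid_to_filename."""
    return "<" + unquote(name) + ">"


print(msgid_to_filename("<abc/def@example.com>"))  # abc%2Fdef%40example.com
```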

One rule of thumb is that "a directory can't have
more than 32k sub-directories, and should have far
fewer; a directory might hold up to about 4 billion
files, and should hold far fewer; and directory
depth should be less than 7".
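Those rules of thumb can be checked against an unpacked tree with `os.walk`. The default limits below mirror the figures above (ext2/ext3 cap a directory at about 32,000 subdirectories), while the per-directory file ceiling of 100,000 is an arbitrary assumption standing in for "should be less"; `tree_violations` is a hypothetical name.

```python
import os


def tree_violations(root: str, max_subdirs: int = 32000,
                    max_files: int = 100_000, max_depth: int = 7) -> list:
    """Walk a tree and report directories exceeding the assumed limits."""
    problems = []
    base = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(base):
        rel = os.path.relpath(dirpath, base)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth > max_depth:
            problems.append(f"{rel}: depth {depth}")
        if len(dirnames) > max_subdirs:
            problems.append(f"{rel}: {len(dirnames)} subdirectories")
        if len(filenames) > max_files:
            problems.append(f"{rel}: {len(filenames)} files")
    return problems
```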

Then the idea, below a.b.c/YYYY/MMDD/HHMM, is to
store messages by Message-ID: take an MD5 hash of
the Message-ID, split it into four parts, and put
the message under H1/H2/H3/H4/MessageId/. Then
there's the question of whether each message gets
a directory or a file. The usual idea is a file,
because its contents are just the actual Internet
Message, but it could also be several files, in a
directory for them.
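The hash split can be sketched with `hashlib`: MD5 yields 32 hex digits, so four parts are 8 digits each. Hashing the UTF-8 bytes of the bracketed Message-ID is an assumption, and `msgid_subpath` is a hypothetical name. Note that an 8-digit segment admits up to 16^8 (about 4.3 billion) distinct names per level, far beyond the 32k-subdirectory rule of thumb, so in practice the fan-out relies on each HHMM directory holding only a handful of posts.

```python
import hashlib
from pathlib import PurePosixPath


def msgid_subpath(message_id: str) -> PurePosixPath:
    """H1/H2/H3/H4 from the MD5 of the Message-ID (8 hex digits per level)."""
    digest = hashlib.md5(message_id.encode("utf-8")).hexdigest()  # 32 hex digits
    return PurePosixPath(*(digest[i:i + 8] for i in range(0, 32, 8)))
```

Appended under a.b.c/YYYY/MMDD/HHMM, plus the message itself, that gives nine path components.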

Then the issue seems to be that this gets at least
8 deep, set against keeping any directory from having
too many sub-directories, too many files, or
out-of-range characters, while still partitioning
and storing each group's dates' posts.


Portable filesystem conventions seem the easiest way
to encourage fungible data this way. Then, whether it's
a tape archive or a zip file, they can all just be
unpacked together into one directory with all the
groups' dates' posts, from which one can make, for
example, a maildir representation, with symlinks or
whatever works on the destination.
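A sketch of deriving a maildir from an unpacked tree, using the standard `mailbox.Maildir`. It copies rather than symlinks, since symlinks aren't available on every destination, and it assumes each stored article is a single uncompressed file; `lff_to_maildir` is a hypothetical name.

```python
import mailbox
from pathlib import Path


def lff_to_maildir(lff_root: str, maildir_path: str) -> int:
    """Copy every article file under lff_root into a Maildir; return the count."""
    box = mailbox.Maildir(maildir_path, create=True)
    added = 0
    for article in sorted(p for p in Path(lff_root).rglob("*") if p.is_file()):
        box.add(article.read_bytes())
        added += 1
    box.close()
    return added
```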


So, anyway, most of the context behind this is
in "Meta: a usenet server just for sci.math"
over on sci.math. I think about it a lot because
I really think Usenet is a special thing.


"AAAATU: Archive Any And All Text Usenet"
immibis
2024-03-10 19:06:44 UTC
[snip] I wonder how to define an
"archive any and all text usenet", AAATU,
filesystem convention, as a sort of "Library
Filesystem Format", LFF.
The idea is that each "message", "post", has an ID,
then as far as that's good, that each group
in the hierarchy has a name, and that, each
message has a date.  Then, the idea is to
make an LFF, that makes a folder for a group,
for a date, each its messages.
a.b.c/YYYY/MMDD/HHMM/
A filesystem is not a good match for all possible problems. Have you
considered an SQL database, which IS a good match for a large number of
problems?
There are very useful notions of "mbox" and
"maildir", with the idea that LFF or "maillff",
and mbox and maildir variously have a great
affinity.
These were a good idea when they were invented. SQL is a good idea now
that it has also been invented. Most implementations suffer neither from
the limitations you talk about below nor from other limitations you
did not talk about below.
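To make the SQL suggestion concrete, here is a minimal SQLite sketch using Python's built-in `sqlite3`. The schema (an `article` table plus a `placement` table for cross-posting) is a hypothetical illustration, not a published design; storing UTC timestamps as ISO 8601 text lets a group-date partition become a simple range query.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS article (
    message_id TEXT PRIMARY KEY,
    posted_utc TEXT NOT NULL,  -- ISO 8601 in UTC, so text order is time order
    raw        BLOB NOT NULL
);
CREATE INDEX IF NOT EXISTS article_by_date ON article (posted_utc);
CREATE TABLE IF NOT EXISTS placement (
    newsgroup  TEXT NOT NULL,
    message_id TEXT NOT NULL REFERENCES article (message_id),
    PRIMARY KEY (newsgroup, message_id)
);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO article VALUES (?, ?, ?)",
           ("<a1@example.com>", "2024-03-09T18:01:52Z", b"raw article bytes"))
db.executemany("INSERT INTO placement VALUES (?, ?)",
               [("sci.math", "<a1@example.com>"), ("news.admin", "<a1@example.com>")])

# One group-date partition: all sci.math articles posted on 2024-03-09 (UTC).
rows = db.execute(
    "SELECT message_id FROM placement JOIN article USING (message_id) "
    "WHERE newsgroup = ? AND posted_utc >= ? AND posted_utc < ?",
    ("sci.math", "2024-03-09", "2024-03-10"),
).fetchall()
print(rows)  # [('<a1@example.com>',)]
```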

Another system you might be interested in is BitTorrent. I believe
Library Genesis (an illegal backup of all published books) uses this for
resilience. They divided the entire library into some number of torrents
and then told people to go and seed the torrents so that if the library
goes away, it can be reconstructed. Yours wouldn't be illegal, of course.
Schlomo Goldberg
2024-10-11 00:03:53 UTC
Post by immibis
[snip] I wonder how to define an
"archive any and all text usenet", AAATU,
filesystem convention, as a sort of "Library
Filesystem Format", LFF.
The idea is that each "message", "post", has an ID,
then as far as that's good, that each group
in the hierarchy has a name, and that, each
message has a date.  Then, the idea is to
make an LFF, that makes a folder for a group,
for a date, each its messages.
a.b.c/YYYY/MMDD/HHMM/
A filesystem is not a good match for all possible problems. Have you
considered an SQL database, which IS a good match for a large number
of problems?
Or NoSQL. Actually, if you think about it, an NNTP server is kind of a
front for a NoSQL database. You can request a "key" (Message-ID), you
can list "keys" in a "bucket" (newsgroup), you can do some simple search
(XHDR, XPAT), etc.
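A toy illustration of that key-value reading of NNTP; it is not a server, and `ToyNewsStore` is a hypothetical class. Real LISTGROUP returns article numbers rather than Message-IDs, and XPAT uses wildmat patterns, which `fnmatch` only approximates.

```python
import fnmatch
from collections import defaultdict


class ToyNewsStore:
    """Message-ID as key, newsgroup as bucket, XPAT as a header match."""

    def __init__(self):
        self.articles = {}               # key: Message-ID -> header dict
        self.groups = defaultdict(list)  # bucket: newsgroup -> Message-IDs

    def post(self, message_id, headers):
        self.articles[message_id] = headers
        for group in headers.get("Newsgroups", "").split(","):
            if group.strip():
                self.groups[group.strip()].append(message_id)

    def article(self, message_id):           # ~ ARTICLE <message-id>
        return self.articles.get(message_id)

    def listgroup(self, group):              # ~ LISTGROUP group
        return list(self.groups.get(group, []))

    def xpat(self, group, header, pattern):  # ~ XPAT header range pattern
        return [m for m in self.listgroup(group)
                if fnmatch.fnmatchcase(self.articles[m].get(header, ""), pattern)]
```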

Stefan Ram
2024-03-10 19:23:19 UTC
Post by Ross Finlayson
The idea is that each "message", "post", has an ID,
Special file systems for news storage, such as the
Cyclical News Filesystem (CNFS), have been developed.

But, as mentioned by immibis, SQL databases can be
very efficient today when used by someone with an
education in relational databases.

For example, I have a filesystem here that sometimes
starts to behave strangely or become slow once there
are several tens of thousands of files in a single directory.
Or maybe it's just the user interface, not the file system.
But you should run some tests to see whether the fs
can actually support your requirements.
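A rough probe for that kind of test, assuming you point `where` at the filesystem you actually care about (the system temp directory may be a different filesystem, such as tmpfs). It only times file creation and one directory listing, so treat the numbers as indicative, not as a benchmark; `time_big_directory` is a hypothetical name.

```python
import os
import tempfile
import time
from typing import Optional, Tuple


def time_big_directory(n: int, where: Optional[str] = None) -> Tuple[float, float]:
    """Create n empty files in one fresh directory under `where`, then return
    (seconds to create them, seconds to list the directory)."""
    with tempfile.TemporaryDirectory(dir=where) as d:
        start = time.perf_counter()
        for i in range(n):
            with open(os.path.join(d, f"{i:08d}"), "wb"):
                pass
        created = time.perf_counter() - start
        start = time.perf_counter()
        listed_count = len(os.listdir(d))
        listed = time.perf_counter() - start
    if listed_count != n:
        raise RuntimeError(f"expected {n} entries, listed {listed_count}")
    return created, listed
```

Running it at 1,000, 10,000, and 100,000 files and comparing the per-file times shows whether the directory degrades as it grows.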
Ross Finlayson
2024-03-10 21:42:54 UTC
Post by Stefan Ram
Post by Ross Finlayson
The idea is that each "message", "post", has an ID,
Special file systems for news storage, such as the
Cyclical News Filesystem (CNFS), have been developed.
But, as mentioned by immibis, SQL databases can be
very efficient today when used by someone with an
education in relational databases.
For example, I have a filesystem here that sometimes
starts to behave strangely or become slow once there
are several 10,000 files in a single directory. Or,
maybe it's just the user interface not the file system.
But you should make some tests to see whether the fs
can actually support your requirements.
Hey, thanks, that's very practical. The idea that a
database takes care of normalization, the maintenance
of indices, and implementing its own access patterns
speaks to a really great in-between idea: "a file
system's contents in a file", like a tape archive or
zip file, with serial access and random access, the
latter usually by memory-mapping the file, with access
patterns according to the organization.


Of course, one might aver that any such organization
of the coordinates of messages, partitioned by group,
date, and Message-Id (or, for example, Content-Id in
the world of external references and Internet Messages),
has a sort of equi-interpretable normal form, between
what one might call "the physical interface" and "the
logical interface".

Access most usually involves an index which, built
on either a hash code or a sort, gives hashed or
binary-tree / phonebook (alphabetical, lexicographic)
lookup. Both the file-system and the database
implement this, as do usual index files like "the
groups file", "the overview file", and these kinds
of things. The idea is that groups and dates
partition this naturally.
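The two index styles side by side, as a small sketch: a sorted "phonebook" searched with the standard `bisect` module (O(log n), and it keeps order, so range scans work) versus a hash table (O(1) on average, no ordering). The Message-IDs and partition paths are made-up examples.

```python
import bisect

index = sorted([
    ("<a1@example.com>", "sci.math/2024/0309/1801"),
    ("<b7@example.org>", "news.admin/2024/0310/0923"),
    ("<c3@example.net>", "sci.math/2024/0310/1742"),
])
keys = [k for k, _ in index]


def lookup(message_id: str):
    """Phonebook lookup: binary search over the sorted keys."""
    i = bisect.bisect_left(keys, message_id)
    if i < len(keys) and keys[i] == message_id:
        return index[i][1]
    return None


by_hash = dict(index)  # the hash-code alternative
```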


Here the idea for AAAATU is to have a physical form
that's very fungible. Files are fungible; it's as simple
as that. Databases like SQLite define quite exactly
how the data have access patterns according to their
coordinates, and a SQL interpreter and executor
implementing those access patterns sure is a great thing.

The great thing here is basically posterity: the
notion of "digital preservation", and the curation
of a library of AAAATU, with the goal of filling in
all the coordinates and being able to reference and
access the posts by Message-Id, according to the
partitions of group and date.


The text-based Internet protocols have a great
affinity with each other, NNTP and IMAP, HTTP
resources, SMTP and POP3, along with great
conventions like mbox and maildir, or, for example,
SQLite files or otherwise, for "the store" of the
files, as opposed to the ephemeral or discardable
access patterns of the runtime.


It certainly makes sense for the runtime to have
both a monolithic maintained store and fungible,
composable, much-much-slower file accesses. This
is where the filesystems have their limits, and the
runtime has limits on file handles, with regard to
the file system's guarantees, like "atomic rename",
and the consistency and coherency of the access
patterns and the data.
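The usual "atomic rename" pattern, sketched with the standard library: write to a temporary file in the same directory, flush it to disk, then `os.replace` it over the target. On POSIX, when both paths are on the same filesystem, readers see either the old file or the complete new one; other platforms may give weaker guarantees. `write_atomically` is a hypothetical name.

```python
import os
import tempfile


def write_atomically(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # only reached if the write or the rename failed
        raise
```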


One of the main goals here seems to be
"write-once-read-many", in a world that's mostly
"write-once-read-never". I.e., the goal is archival,
as opposed to convenience and the ephemeral.
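A small sketch of write-once semantics with ordinary file calls: open in exclusive-create mode (`"xb"`, which raises `FileExistsError` if the file is already there), then drop write permission. Permissions are only advisory here, since the owner or root can change them back; `write_once` is a hypothetical name.

```python
import os
import stat


def write_once(path: str, data: bytes) -> None:
    """Create `path` only if absent, then mark it read-only (0o444)."""
    with open(path, "xb") as f:
        f.write(data)
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
```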


What I'd like to think is that these days multiple
terabytes of data is not an outrageous fortune,
given "on-line, warm-line, and cold-line" tiers,
"data lakes" and "data glaciers", and these kinds
of ideas, which represent simply enough the locality
of the data and the trade-offs of time versus size;
here I care less about when than about if, as it were.

Then the effort seems to be this: each message
declares exactly what groups it's in; then there's
best-reckoning what date it was, along with
X-No-Archive, control messages, cancel messages,
Supersedes, and otherwise the semantics of control
or site policy; and the key idea is to establish,
for any post that existed and still exists, that it
exists at exactly one date in any number of groups.
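A sketch of "exactly one date in any number of groups": read the Newsgroups header for the groups and the Date header for the single UTC day. The policy choices are assumptions, not rules from any spec: `X-No-Archive: yes` and control messages yield no coordinates here, Supersedes is not handled, and a parseable Date header is assumed. `coordinates` is a hypothetical name.

```python
from datetime import timezone
from email import message_from_bytes
from email.utils import parsedate_to_datetime


def coordinates(raw: bytes) -> list:
    """(group, 'YYYY-MM-DD') pairs for one article."""
    msg = message_from_bytes(raw)
    if (msg.get("X-No-Archive") or "").strip().lower() == "yes":
        return []  # site policy (assumed): honor the poster's opt-out
    if msg.get("Control") is not None:
        return []  # cancels, newgroups, etc. would be processed separately
    dt = parsedate_to_datetime(msg["Date"])
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    day = dt.astimezone(timezone.utc).date().isoformat()
    groups = {g.strip() for g in (msg.get("Newsgroups") or "").split(",") if g.strip()}
    return [(group, day) for group in sorted(groups)]
```

A post dated 01:30 at +0200 lands on the previous UTC day, which is why a single normalization rule matters for "exactly one date".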



So with this in mind, I surely find it agreeable that
a "database file format" has underneath it the idea
of "a filesystem representation", and the aim is to make
the usual sort of logical interface and physical interface
into a convention that has a spec and is fungible with
bog-standard tools, the most usual built-ins of file
semantics.


Then the idea is that anybody who has regular
hierarchical newsgroups can funnel them all together
into an archival, archaeological effort, making sort
of curated collections for digital preservation and
sorting out, for each message, whether its uniqueness
and integrity hold, from a world of mbox files (which
are figured to be linear in time), maildirs, or
otherwise, most usually by the date attribute. Then
anybody can indicate the range of those coordinates,
groups and dates, and thus derive a given edition's
world of the posts.
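A sketch of sorting out uniqueness and integrity across sources: group the copies by Message-ID and flag IDs whose copies differ. The input shape `(source, message_id, raw_bytes)` is a hypothetical assumption. Byte-for-byte comparison is deliberately strict; headers like Path and Xref legitimately differ from server to server, so a real pass would compare a normalized form instead.

```python
import hashlib
from collections import defaultdict


def reconcile(copies):
    """Return (unique, conflicts): Message-ID -> agreeing digest, and
    Message-ID -> {digest: [sources]} for IDs whose copies disagree."""
    seen = defaultdict(dict)  # message_id -> {sha256 hex: [sources]}
    for source, message_id, raw in copies:
        digest = hashlib.sha256(raw).hexdigest()
        seen[message_id].setdefault(digest, []).append(source)
    unique = {m: next(iter(v)) for m, v in seen.items() if len(v) == 1}
    conflicts = {m: v for m, v in seen.items() if len(v) > 1}
    return unique, conflicts
```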

Then anybody can just use that as data,
while at the same time, of course each post
is fundamentally the poster's, in the public
space, not the public domain.
David Chmelik
2024-03-11 04:12:08 UTC
[...]
Now though, I wonder how to define an "archive any and all text usenet",
AAATU,
filesystem convention, as a sort of "Library Filesystem Format", LFF.
[...]
Sounds good; I'm interested in a full archive of the text newsgroups
I use (1300+) but don't know whether free Usenet servers even go back to
when I started (1996, though I tried the Internet in a museum before
Eternal September). I'm aware I could use commercial ones that may, but
don't know which, nor the cost or space. Is Google Groups the only one
going back to 1981? I hope other servers managed to save that before
Google disconnected from peers, or that some archive might turn up going
back to 1979.

Accessing some old binary ones would be nice also, but these days people
use commercial servers for those, which probably didn't save even back to
the '90s... an archive of those (even though I'm uninterested in most of them,
apart from a few relating to history of science, some types of art/graphics
& music) would presumably be too large except for data centres.
Ross Finlayson
2024-03-11 05:48:02 UTC
Post by David Chmelik
[...]
Sounds good; I'm interested in full archive of text newsgroups I use
(1300+) but don't know free Usenet servers even go back to when I started
(1996, though tried Internet in museum before Eternal September). I'm
aware I could use commercial ones that may, but don't know which nor cost/
space. Is Google Groups the only going back to 1981? I hope other
servers managed to save that before Google disconnected from peers or some
might turn up back to 1979.
Accessing some old binary ones would be nice also, but these days people
use commercial servers for those, which probably didn't save even back to
'90s... an archive of those (even though I'm uninterested in most rather
than a few relating to history of science, some types of art/graphics &
music) would presumably be too large except for data centres.
Hey, thanks for writing.

Estimates, and, you know, reliable estimates,
would help a lot in gauging the scope and the
order of magnitude of things.

For example, in units of dollars per message stored
per month: if it's about 25 dollars per month per
million messages, then what's needed is an estimate
of how many millions of messages there are. The
original economics of the system have since seen
exponential growth in the availability of storage
and an exponential decrease in its cost, more or less;
these sorts of figures are traditionally euphemized as
"back-of-the-napkin", "ball-park", out to "wild-ass-guess".
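The napkin arithmetic at that example rate, where the 25 dollars per month per million messages is the illustrative figure above, not a quoted price:

```python
DOLLARS_PER_MILLION_PER_MONTH = 25.0  # illustrative rate from the paragraph above


def monthly_cost(messages: int) -> float:
    """Storage cost per month for `messages` messages at the assumed rate."""
    return messages / 1_000_000 * DOLLARS_PER_MILLION_PER_MONTH
```

So the open question really is the message count: at this rate a billion messages would be 25,000 dollars a month, and a hundred million would be 2,500.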

First, then, is "how many groups are in the Big 8",
then "how many text groups are in alt and elsewhere,
minus alt.binaries and other effectively-binary groups"
(alt is not part of the Big 8), then along the lines
of "how many national, corp, or institutional groups
are in the public space", to get an idea of the order
of the number of groups.

(The order of things is usually taken in log base 10,
base 2, or base e, written log, lg, or ln.)

Once upon a time, an outfit called DejaNews did
Usenet a real solid favor: for quite some years it
had the best archives, and served them up. Its value
proposition came across as so great that a giant
behemoth bought it up; apocryphally, the "DejaNews
CD's" were compact discs that had all the contents
of DejaNews.

Then, several commercial providers today have
Big 8 text going back about 10 years, more or
less. These are just online and can be slowly,
gently, and thoroughly suck-fed, or, you know,
leeched, where the old ratio of downloads to
uploads is called the leech ratio, as in "that
lurker has an infinite leech ratio", these kinds
of cultural contexts. The point is that before
that it's the middle ages and the land before
time: if one could get the DejaNews CD's, one
might think of the eras as "land before time",
"DejaNews CD's", "middle ages", and "right about
now", basically 1980-something to date.

We might be able to get from Usenet admins
something like "here is the list of groups,
and maybe here's all the groups there ever
were", besides local and site policy and these
kinds of things.

So, these days, storage is available: basically
a volume will store about 4BB named items
(MM = millions, BB = billions), and Usenet text
messages are pretty small, on the order of a 4KB
buffer or a few, with compression giving about one
order of magnitude less. The idea is usually that a
computer can mount multiple volumes, as opposed to
whatever it can fit in memory.
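Rough volume sizing under those assumptions, taking "4BB named items" as 2**32, about 4 KiB per article, and compression as one order of magnitude:

```python
ITEMS = 2**32                  # ~4.3 billion named items per volume (assumed)
ARTICLE_BYTES = 4 * 2**10      # ~4 KiB per article (assumed)

raw_bytes = ITEMS * ARTICLE_BYTES   # 2**44 bytes = 16 TiB
compressed_bytes = raw_bytes / 10   # ~1.6 TiB after ~10x compression
```

So a volume that fills its item count with articles holds on the order of 16 TiB raw, which is why the item count, not the byte capacity, tends to be the binding limit.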

One reason the filesystem representation is so
great is that, though it's slow and subject to these
sorts of limits and planning factors, it needs to
occupy pretty much no memory at all, which helps a
lot when the most metered costs of the runtime are
1) network I/O egress, 2) RAM, 3) CPU, 4) object
store or various services, and 5) disk.

One thing about this kind of data is that it's
"write-once-read-many", or, you know,
"write-once-read-never": because there are natural
coordinates, group and date, once all of those have
been found, it can live in a filesystem with everything
packed up as files, the idea being that "LFF's only
purpose is to serve as a place to store packed-up
files; then you can load them however you want".

Then the idea is to make a system that has, more or
less, a plan, which is basically a list of groups and
a matrix, group x date. The goal is to fill in, for
each group x date, all its posts in the file system;
once it has more or less reached a consensus, they're
figured to have all landed there and to live there.
The result is that the LFF basically has an edition
each day, whose contents are the counts in that
matrix, then its lineage (what the sources were and
what the quality of the data was), then behind that,
the data.
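One possible shape for such an edition summary, serialized as JSON with the standard library. The layout (an `edition` date, a `counts` list for the group x date matrix, and a free-form `lineage` mapping of source to quality note) is a hypothetical sketch, not a defined format.

```python
import json
from collections import Counter
from datetime import date


def edition_manifest(edition: date, placements, sources: dict) -> str:
    """placements: iterable of (group, 'YYYY-MM-DD'); sources: name -> note."""
    matrix = Counter(placements)
    return json.dumps({
        "edition": edition.isoformat(),
        "counts": [{"group": g, "date": d, "posts": n}
                   for (g, d), n in sorted(matrix.items())],
        "lineage": sources,
    }, indent=2, sort_keys=True)
```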


Then, for something like "well, we can pretty much
fit 4BB entries on one volume, can hire any number
of volumes, and they have plenty of space": if the
inputs are the count of groups times all the days,
the group-day coordinates, and each post sits fewer
than 8 levels deep at its message-ID (its post-depth),
then heuristically the posts' entries far outnumber
the group-days, so a usual sort of volume can store
more than 4BB/post-depth posts.
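The same bound as arithmetic, assuming 2**32 named entries per volume and at most 8 entries consumed per post (its file plus any hash directories it doesn't share); the shared group/date/time directories are treated as negligible by comparison.

```python
ENTRIES_PER_VOLUME = 2**32   # assumed named-entry (inode) budget
POST_DEPTH = 8               # assumed worst-case entries per post

min_posts_per_volume = ENTRIES_PER_VOLUME // POST_DEPTH
print(min_posts_per_volume)  # 536870912
```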

The usual idea of an "object store" is "hey, as long
as you give it a unique name and don't expect to walk
the entire file tree, an object store will gladly
store the object with its path segmented in a binary
tree, giving log 2 or better lookup", so that the
group-date coordinates, keyed off the Message-Id, will
look up the message. The idea is that an LFF edition
is a list of message IDs for each group-date, for
example for checking that they each exist, are
well-formed, and validate.
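Checking an edition's Message-ID list against a store might look like this sketch, where `stored` is any mapping of Message-ID to raw bytes (a dict here; an object store or the LFF tree in practice). The well-formedness test is deliberately minimal, just requiring a blank line between headers and body; `check_edition` is a hypothetical name.

```python
def check_edition(expected_ids, stored):
    """Return (missing, malformed) Message-IDs for one group-date edition."""
    missing, malformed = [], []
    for message_id in expected_ids:
        raw = stored.get(message_id)
        if raw is None:
            missing.append(message_id)
        elif b"\n\n" not in raw.replace(b"\r\n", b"\n"):
            malformed.append(message_id)
    return missing, malformed
```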

The date Jan 1 1970 is called "the epoch", and
Internet time is often counted "since the epoch".
For example, Jan 1 2020 = Jan 1 1970 + 18262 days.
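The day count checks out with the standard `datetime.date` arithmetic: 50 years of 365 days plus 12 leap days (1972 through 2016).

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def days_since_epoch(d: date) -> int:
    """Whole days from the Unix epoch to `d`."""
    return (d - EPOCH).days


print(days_since_epoch(date(2020, 1, 1)))  # 18262
```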

So, for fifty years of daily retention, group-days
is about groups * days, though groups come and go,
and some group-date coordinates will of course be
empty: the "dense" versus the "sparse".
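Given a birth date per group (hypothetical input), the group-day count can start each group at its own birth instead of using the dense groups * days bound:

```python
from datetime import date


def group_days(births: dict, until: date) -> int:
    """Group-date coordinates from each group's birth through `until`, inclusive."""
    return sum(max(0, (until - born).days + 1) for born in births.values())
```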


Another thing about data is backing it up or moving it:
at the time, something like the DejaNews CD's was a
pretty monumental amount of data.


So it was with great happiness that the other day
I heard it suggested that even some of the "land before
time" survives in great archives, something like 3 or 4
terabytes (TB) uncompressed. That helps with building
out estimates, and mostly with having a _design_ after
a sort of charter, "LFF: library filesystem format
conventions for AAAATU: archive any and all text
usenet", which is that "it works on any kind of
filesystem, and any old file tools work on it".

If anyone cares to say "hey, here's what you should do"
and this kind of thing, I'll thank you. Basically I
wonder how many groups there are: under each given
hierarchy, like "rec", "soc", "comp", "sci", "news",
"alt minus binaries", ..., and also the national, corp,
and institutional ones, how many newsgroups are under
those, and are there any limits on those?

If, for each group on Usenet, there were its name,
which looks like a.b.c, and the date it was born or
its first post was made, that would basically give the
origins of the coordinates, from which to estimate the
order of the coordinates, the order of the items, and
the order of their sizes.
Ross Finlayson
2024-03-12 17:25:42 UTC
Post by Ross Finlayson
[...]
Hey, thanks for writing.
Estimates and, you know, reliable estimates,
would help a lot to estimate the scope of
the scale of the order of, the things.
It seems perhaps the best, or simplest, way to make
a group-date file contain the file entries is to take
the above and store it in a zip format file. The zip
format supports random access to the files within it,
given random access to the zip file itself, for example
by memory-mapping the file, seeking to the end, reading
back through the central directory to a given path,
and accessing that entry with the usual compression
algorithm, deflate.
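A sketch with the standard `zipfile` module: pack a day's articles with deflate, then read back a single entry. `ZipFile` locates the central directory at the end of the archive and seeks straight to the requested entry, so the other entries aren't decompressed. The entry names are hypothetical; for a file on disk, passing the path, or an `mmap.mmap` of it, should work the same way.

```python
import io
import zipfile


def build_day_zip(articles: dict) -> bytes:
    """Pack {entry path: raw article} into a deflate-compressed zip in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, raw in articles.items():
            zf.writestr(name, raw)
    return buf.getvalue()


def read_entry(zip_bytes: bytes, name: str) -> bytes:
    """Random access to one entry via the central directory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.read(name)
```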

The idea then is a "group-date coordinate, hour-minute
granular message list", figuring that each message either
has a more granular date in it or gets a synthesized,
estimated date header, which should fit on any file
system; then zip files of those, as a "virtual filesystem"
or "synthetic filesystem", with an a.b.c.yyyymmdd.zip for
each day and an a.b.c.yyyy.zip combining those, figuring
that group names, or mailbox names, are legal filenames.
These are the most fungible way to produce files that
aren't growing files and that can be validated to have
well-formed messages in the coordinate of group and date,
as both an archival format and an interchange format, and
then to load and dump these into and out of the useful and
usual backing stores, either filesystem or DB.
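Combining daily zips into a yearly one is better done entry by entry than byte by byte: appending one zip file to another doesn't yield a single archive listing both sets of entries. This sketch assumes the dailies are keyed by MMDD and prefixes each entry with it so the day partition survives; `roll_up_year` is a hypothetical name.

```python
import io
import zipfile


def roll_up_year(daily: dict) -> bytes:
    """Combine {'MMDD': daily zip bytes} into one yearly zip under MMDD/ prefixes."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as year:
        for mmdd in sorted(daily):
            with zipfile.ZipFile(io.BytesIO(daily[mmdd])) as day:
                for info in day.infolist():
                    year.writestr(f"{mmdd}/{info.filename}", day.read(info))
    return out.getvalue()
```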


So what "specifying the LFF" involves is working
within the limits of the filesystem and the limits
of the packaging file, or zip file, so that "any and
all text Usenet" messages can be in files this way,
with "reference routines" and "reference algorithms"
that give, for NNTP, complete instructions for
downloading the LFF files and generating from them
groups files, overview files, and so on. Those are
"write-append-growing" files, whereas these are
"write-once-read-many" files, so as to accommodate
both an archival form, with an entirely open
specification for digital preservation, and reference
routines into and out of the backing stores of usual
server implementations.
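As one such reference routine, here is a sketch that generates a single overview line in the RFC 3977 OVER field order (article number, Subject, From, Date, Message-ID, References, :bytes, :lines), tab-separated. Collapsing whitespace runs in header values to single spaces is a simplification of the RFC's tab and line-break rules; `overview_line` is a hypothetical name.

```python
from email import message_from_bytes


def overview_line(number: int, raw: bytes) -> str:
    """One tab-separated NNTP overview line for a raw article."""
    msg = message_from_bytes(raw)

    def field(name: str) -> str:
        return " ".join(str(msg.get(name) or "").split())

    _, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        _, _, body = raw.partition(b"\n\n")
    return "\t".join([
        str(number), field("Subject"), field("From"), field("Date"),
        field("Message-ID"), field("References"),
        str(len(raw)),           # :bytes, octets in the whole article
        str(body.count(b"\n")),  # :lines, newline-terminated body lines
    ])
```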

Is it sort of the same thing for any old kind of
Internet message, not just Usenet NNTP Internet
messages? Yeah, sort of.

Here, though, is what I'd especially hope to find
out; here are my questions:

1) how many Usenet groups are there?
text-only, Big 8, then national, institutional, corp

2) what's the most messages a group ever had in one day?

3) is there a list of the birth-dates of the groups?

4) what was it like before the Great Renaming? Can you describe that?



Well, thanks for reading. I've been tapping away
at more of this sort of idea on sci.math in "Meta:
a usenet server just for sci.math", about "BFF
backing file format", "SFF summary/search file formats",
and the runtime and protocols; here the idea is
about "LFF library/lifetime file formats".
immibis
2024-03-13 02:57:24 UTC
Post by David Chmelik
[...]
Accessing some old binary ones would be nice also, but these days people
use commercial servers for those, which probably didn't save even back to
'90s... an archive of those (even though I'm uninterested in most rather
than a few relating to history of science, some types of art/graphics &
music) would presumably be too large except for data centres.
Giganews published the number on Reddit: 20 gigabits per second of new
data. This is approximately one new server full of hard drives every few
days. If your servers are some of those dedicated to holding as many
hard drives as possible, then one a week.
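Converting that quoted rate into storage per day, in decimal units; how many servers that amounts to depends on drive counts and sizes, which aren't given here.

```python
BITS_PER_SECOND = 20 * 10**9   # the 20 Gbit/s figure quoted above
SECONDS_PER_DAY = 86_400

bytes_per_day = BITS_PER_SECOND // 8 * SECONDS_PER_DAY
print(bytes_per_day // 10**12, "TB/day")  # 216 TB/day
```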
Ross Finlayson
2024-03-13 19:43:33 UTC
Post by immibis
[...]
Giganews, on Reddit, published the number: 20 gigabits per second of new
data. That is approximately one new server full of hard drives every few
days, or, if your servers are the kind built to hold as many hard drives
as possible, one a week.
Right... Once upon a time a major retail website did a study and found
that 99% of its traffic was JPEG, and over 50% of its CPU went to
compression and encryption.

These days encryption and compression are usually a very significant
load on web servers, which are also often designed simply to consume
huge amounts of RAM.

It doesn't really have to be that way, since Internet Messages, here
Usenet posts, are "static assets" of a sort once they have arrived,
however very many of them there are. As for their size, most text
Usenet messages are on the order of 4 KiB, header plus body, so the
total grows roughly linearly with the number of messages.
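To put "linear in about 4 KiB per message" in numbers, here is a minimal Python sketch, assuming the ~4 KiB average above and a purely hypothetical count of one billion posts. That comes to about 3.7 TiB before compression, which is less than half an hour of the 20 Gbit/s binary feed quoted earlier in the thread.

```python
# Rough sizing of a text-only archive: size grows linearly with message count.
# ASSUMPTION: ~4 KiB per message (header + body), per the estimate above;
# the message count in the demo is hypothetical.

KIB = 1024
TIB = 1024 ** 4


def archive_bytes(message_count: int, avg_message_bytes: int = 4 * KIB) -> int:
    """Uncompressed bytes for `message_count` messages of the given average size."""
    return message_count * avg_message_bytes


if __name__ == "__main__":
    n = 1_000_000_000                    # hypothetical: one billion posts
    print(f"{archive_bytes(n) / TIB:.2f} TiB before compression")
```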

So one way to look at the facilities of the system is DB FS MQ WS:
database, filesystem, message-queue, and web-services, on nodes on
hosts, in connection-oriented architectures and message-passing
systems, arranged in distributed topologies of mostly point-to-point
protocols. Then nodes have CPU, RAM, and network and storage I/O;
these are the resources, "space" and "time".

Our model of Usenet operation is "INN" or innd, and the related
tools, protocols, and conventions, for example cleanfeed, NoCeM,
cancels (or what was Cancelmoose), and otherwise the control and
junk groups, and site policies, which include rejection and
retention. INN is at the center of it: there's an ecosystem of INN,
derivative projects, and innovative projects.

NNTP, IMAP, POP3, and SMTP have a very high affinity: they are all
protocols for exchanging Internet Messages, text-based and
connection-oriented, with layers of protocol on top for DEFLATE
compression, SASL authentication, and TLS encryption.
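The DEFLATE layer these protocols share is raw DEFLATE (RFC 1951) with no zlib header; the IMAP COMPRESS extension (RFC 4978) and the NNTP COMPRESS extension (RFC 8054) both negotiate it. Here is a minimal Python sketch of just the compression, round-tripping a hypothetical text article with the standard zlib module (negative wbits selects the raw format). It does not implement the protocol negotiation or the flushing rules.

```python
import zlib

# Raw DEFLATE (RFC 1951): wbits=-15 means no zlib header or trailer.


def deflate_raw(data: bytes, level: int = 9) -> bytes:
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def inflate_raw(data: bytes) -> bytes:
    decomp = zlib.decompressobj(-15)
    return decomp.decompress(data) + decomp.flush()


# A hypothetical text article in Internet Message format (CRLF line endings).
ARTICLE = (
    b"From: poster@example.invalid\r\n"
    b"Newsgroups: news.misc\r\n"
    b"Subject: Archive Any And All Text Usenet\r\n"
    b"Message-ID: <example-1@example.invalid>\r\n"
    b"\r\n"
    + b"Text Usenet compresses well because it is mostly plain prose.\r\n" * 20
)

if __name__ == "__main__":
    packed = deflate_raw(ARTICLE)
    assert inflate_raw(packed) == ARTICLE
    print(len(ARTICLE), "->", len(packed), "bytes")
```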

So, the idea of LFF is basically that Usenet posts, or Internet
Messages, are each distinct and unique, and may be crossposted between
groups and to email, yet mostly live within and among groups, and are
threaded by their References headers.
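Here is a minimal Python sketch of the two relationships just mentioned, crossposting and threading, using only the standard email parser. The Newsgroups header lists every group an article was posted to. By convention, the last Message-ID in References is the direct parent and the first is usually the thread root (long References lines may be trimmed, so that is not guaranteed). The sample articles and function names are hypothetical.

```python
from email import message_from_string
from email.message import Message
from typing import Dict, List, Optional


def groups_of(msg: Message) -> List[str]:
    """A crossposted article names several groups, comma-separated, in Newsgroups."""
    return [g.strip() for g in msg.get("Newsgroups", "").split(",") if g.strip()]


def parent_of(msg: Message) -> Optional[str]:
    """By convention the last Message-ID in References is the direct parent."""
    refs = msg.get("References", "").split()
    return refs[-1] if refs else None


def thread_roots(msgs: List[Message]) -> Dict[str, str]:
    """Map each Message-ID to the first ID in References, usually the thread root."""
    roots = {}
    for m in msgs:
        refs = m.get("References", "").split()
        roots[m["Message-ID"]] = refs[0] if refs else m["Message-ID"]
    return roots


# Hypothetical sample articles; only the headers matter here.
SAMPLES = [
    "Message-ID: <a@example.invalid>\nNewsgroups: news.misc\nSubject: AAATU\n\nbody\n",
    "Message-ID: <b@example.invalid>\nNewsgroups: news.misc, news.admin.misc\n"
    "References: <a@example.invalid>\nSubject: Re: AAATU\n\nbody\n",
    "Message-ID: <c@example.invalid>\nNewsgroups: news.misc\n"
    "References: <a@example.invalid> <b@example.invalid>\nSubject: Re: AAATU\n\nbody\n",
]

if __name__ == "__main__":
    msgs = [message_from_string(s) for s in SAMPLES]
    for m in msgs:
        print(m["Message-ID"], groups_of(m), parent_of(m))
    print(thread_roots(msgs))
```

A crossposted article like the second sample is one message with one Message-ID, so an archive can store it once and list it under each of its groups.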

So, the idea of LFF is just that the FS, the filesystem, is ubiquitous
for hierarchical storage, its tools are commonplace and very well
understood, and the limits of modern filesystems (meaning those of at
least the last 20 years) are reasonably well understood too, with respect
to the identifiers of groups and posts in character sets and encodings,
and to the headers and bodies of the messages at rest, filed by group
and date.
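Here is a Python sketch of filing a post under the a.b.c/YYYY/MMDD/HHMM/ layout proposed at the start of the thread. Note that a day has at most 24 * 60 = 1440 HHMM partitions. These assumptions are illustrative and are not part of any LFF spec:
- The Date header is normalized to UTC.
- The Message-ID is percent-encoded so that characters like "/" can't create stray directories.
- A 255-byte name limit (common, e.g. on ext4) triggers a SHA-256 fallback.

Old articles with non-RFC Date formats would need more lenient parsing than parsedate_to_datetime provides.

```python
import hashlib
from datetime import timezone
from email.utils import parsedate_to_datetime
from pathlib import PurePosixPath
from urllib.parse import quote

NAME_MAX_BYTES = 255  # common per-name limit (e.g. ext4); an assumption, not an LFF rule


def message_filename(message_id: str) -> str:
    """Filesystem-safe name for a Message-ID (hypothetical escaping scheme)."""
    # Percent-encode '/', '%', and anything else outside a conservative set,
    # so the name is reversible and never creates stray subdirectories.
    name = quote(message_id.strip().strip("<>"), safe="@+=")
    if len(name.encode("ascii")) > NAME_MAX_BYTES:
        # Fall back to a fixed-length digest for pathologically long IDs.
        name = hashlib.sha256(message_id.encode("utf-8")).hexdigest()
    return name


def lff_path(group: str, date_header: str, message_id: str) -> PurePosixPath:
    """group/YYYY/MMDD/HHMM/<name>, with the Date header normalized to UTC."""
    when = parsedate_to_datetime(date_header)
    if when.tzinfo is None:          # "-0000" means zone unknown; treat as UTC
        when = when.replace(tzinfo=timezone.utc)
    when = when.astimezone(timezone.utc)
    return PurePosixPath(group, f"{when:%Y}", f"{when:%m%d}", f"{when:%H%M}",
                         message_filename(message_id))


if __name__ == "__main__":
    print(lff_path("news.misc", "Sat, 09 Mar 2024 18:01:52 +0000",
                   "<abc/def@example.invalid>"))
```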

Then the idea is to gather these, to forage the posts, into a directory
structure. Once they have been found, some informative headers may be
added about their archival, as a sort of terminus of delivery. Then we
call that the archive for the group+date, zip it up for posterity, and
put it in a hierarchical filesystem or object store. This is all for the
declared purpose here of "archive any and all text usenet", of course
with respect to observing or honoring such directives as X-No-Archive
and cancel or Supersedes, or otherwise what "is" or "isn't", what was.
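Here is a minimal Python sketch of the "zip it up per group+date" step, assuming one batch of raw articles for one group and date. It skips:
- articles marked "X-No-Archive: yes",
- control messages themselves,
- any Message-ID named by a "Control: cancel" or a Supersedes header in the same batch.

It also appends a hypothetical X-LFF-Archived header as the informative archival header. That header name and the skip policy are illustrative choices, not an established convention.

```python
import io
import zipfile
from email import message_from_string
from urllib.parse import quote

# Hypothetical informative header recording the archival step; not a standard header.
ARCHIVE_HEADER = "X-LFF-Archived"


def withdrawn_ids(raw_messages):
    """Message-IDs withdrawn by a cancel control message or a Supersedes header."""
    gone = set()
    for raw in raw_messages:
        msg = message_from_string(raw)
        control = msg.get("Control", "").split()
        if len(control) >= 2 and control[0].lower() == "cancel":
            gone.add(control[1])
        if msg.get("Supersedes"):
            gone.add(msg["Supersedes"].strip())
    return gone


def build_group_date_zip(raw_messages, archived_on: str) -> bytes:
    """Zip one group+date's articles, skipping opted-out, control, and withdrawn ones."""
    gone = withdrawn_ids(raw_messages)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for raw in raw_messages:
            msg = message_from_string(raw)
            mid = msg.get("Message-ID", "").strip()
            if msg.get("X-No-Archive", "").strip().lower() == "yes":
                continue                        # author opted out of archiving
            if msg.get("Control") or mid in gone:
                continue                        # control message, or withdrawn
            msg[ARCHIVE_HEADER] = archived_on   # appended; original headers kept
            zf.writestr(quote(mid.strip("<>"), safe="@+=") + ".eml", msg.as_string())
    return buf.getvalue()
```

Cancels and supersedes that arrive in a later batch would need a lookup across already-written archives, which this sketch leaves out.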

So I'm really happy when I think about it: Usenet, and stuff like INN
and its ecosystem, and the originators of these parts of the ecosystem,
and then the administrators and the innovators, and the _belles lettres_
of text Usenet, and only secondarily the bells and whistles of the
binaries or larger objects, which are not for this; this is for "text
LFF". (Binaries as Internet Messages have quite variously structured
contents: their bodies, references, external references, and body parts,
which are not relevant here.)

So, especially if someone rather emeritus among the originators reads
this, your opinions and estimates are highly valued, with regard and
respect to what you want to see for "AAATU: belles lettres", and
basically for making it so that the protocols of URLs, URIs, and URNs
for Usenet posts can even result in Dublin Core records and DOIs, from
Message-IDs.
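For the URI side, RFC 5538 defines the news: URI scheme, where an article is named by its Message-ID without angle brackets (news:id@host). Here is a Python sketch that derives such a URI and a small Dublin Core style record from an article's headers. Three caveats:
- The header-to-element mapping (Subject to title, From to creator, Newsgroups to subject, and so on) is an assumption for illustration.
- The percent-encoding is simplified rather than a full implementation of RFC 5538.
- DOIs are out of scope here.

```python
from datetime import timezone
from email import message_from_string
from email.utils import parsedate_to_datetime
from urllib.parse import quote


def news_uri(message_id: str) -> str:
    """news: URI for an article (simplified escaping, not full RFC 5538)."""
    core = message_id.strip().strip("<>")
    return "news:" + quote(core, safe="@!$&'()*+,;=:/")


def dublin_core(raw: str) -> dict:
    """Hypothetical mapping from article headers to Dublin Core elements."""
    msg = message_from_string(raw)
    when = parsedate_to_datetime(msg["Date"])
    if when.tzinfo is None:              # "-0000": zone unknown, treat as UTC
        when = when.replace(tzinfo=timezone.utc)
    return {
        "dc:title": msg.get("Subject", ""),
        "dc:creator": msg.get("From", ""),
        "dc:date": when.astimezone(timezone.utc).isoformat(),
        "dc:identifier": news_uri(msg["Message-ID"]),
        "dc:subject": msg.get("Newsgroups", ""),
        "dc:type": "Text",
        "dc:format": "message/rfc822",
    }


# Hypothetical sample article.
SAMPLE = (
    "From: Example Poster <poster@example.invalid>\n"
    "Newsgroups: news.misc\n"
    "Subject: Archive Any And All Text Usenet\n"
    "Date: Sat, 09 Mar 2024 18:01:52 +0000\n"
    "Message-ID: <example-1@example.invalid>\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    for key, value in dublin_core(SAMPLE).items():
        print(f"{key}: {value}")
```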

It's figured, then, that if posts are just data and LFF is ubiquitous,
the ecosystem can help develop the Museum Experience: archives,
search, tours, exhibits, browsing, and carrels, a living museum of
Usenet, its authors, their posts, and this culture.
CPMST
2024-03-14 04:00:03 UTC
Post by Ross Finlayson
It doesn't really have to be that way, in the case
that basically Internet Messages here Usenet
are "static assets" of a sort once arrived, if the
so very many of them and with regards to their
size, here that most text Usenet messages are
on the order of linear in 4KiB header + body,
while on the order of messages, each post.
So one way to look at the facilities, of the system,
is DB FS MQ WS, database filesystem message-queue
<snip>
A monster essay of google translate word salad.
--
we have to go back
Blue-Maned_Hawk
2024-03-14 14:23:45 UTC
I've seen much worse. This is parseäble.
--
Blue-Maned_Hawk│shortens to Hawk│/blu.mɛin.dʰak/│he/him/his/himself/Mr.
blue-maned_hawk.srht.site
Has anyone ever really been far even as decided to use even go so want to
look more like?
Ross Finlayson
2024-03-15 02:48:32 UTC
Post by CPMST
A monster essay of google translate word salad.
Hello, sorry if they are so full and airy that the machine translator
overflows on them; it's exactly this kind of idea: to design a
"Library/Lifetime filesystem-convention format", or "LFF
specification", for making inter-operable resources.

For example, consider two main systems in the ecosystem, INN and
Cyrus, and things like the mbox format and the maildir
filesystem-convention format.
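On the mbox/maildir side, Python's standard mailbox module reads and writes both formats, so converting between them is short. Here is a sketch, assuming one mbox file per group. The file and directory names are hypothetical, and INN's and Cyrus's own spool layouts (see the links below) are not used.

```python
import mailbox
import tempfile
from email import message_from_string
from pathlib import Path


def mbox_to_maildir(mbox_path: str, maildir_path: str) -> int:
    """Copy every message in an mbox file into a Maildir; return how many were copied."""
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for msg in src:
            dst.add(msg)        # mailbox converts mbox metadata to Maildir form
            copied += 1
    finally:
        src.close()
        dst.close()
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        mbox_file = str(Path(tmp) / "news.misc.mbox")   # hypothetical per-group mbox
        box = mailbox.mbox(mbox_file)
        for n in (1, 2):
            box.add(message_from_string(f"Message-ID: <{n}@example.invalid>\n\npost {n}\n"))
        box.close()
        print(mbox_to_maildir(mbox_file, str(Path(tmp) / "news.misc")))
```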


https://www.eyrie.org/~eagle/software/inn/docs-2.6/storage.conf.html

https://www.cyrusimap.org/imap/reference/admin/locations.html

https://en.wikipedia.org/wiki/Category:Archive_formats

https://en.wikipedia.org/wiki/Comparison_of_file_systems


The hope is to establish a specification whose partitions are big
enough to hold each group's files for a date, while small enough to fit
within each disk's filesystem limits.
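One way to pin down "big enough but small enough" is to choose the partition depth per group and day. Keep a day's posts in one MMDD directory while the count is modest, and split them into HHMM subdirectories (at most 24 * 60 = 1440 per day) when it isn't. Here is a Python sketch. The 10,000-entry threshold is a hypothetical comfort limit, and the 255-byte name limit is a common one (e.g. ext4), not a universal rule.

```python
from collections import Counter
from datetime import datetime
from typing import List

NAME_MAX_BYTES = 255     # common per-name limit (e.g. ext4); an assumption here
MAX_ENTRIES = 10_000     # hypothetical comfort limit on entries per directory


def choose_partitioning(timestamps: List[datetime]) -> str:
    """'day' if one MMDD directory can hold the day's posts, else 'minute' (HHMM subdirs).

    Assumes every timestamp belongs to the same group and the same UTC day.
    """
    if len(timestamps) <= MAX_ENTRIES:
        return "day"
    per_minute = Counter(t.strftime("%H%M") for t in timestamps)
    if max(per_minute.values()) > MAX_ENTRIES:
        raise ValueError("a single HHMM partition would still exceed MAX_ENTRIES")
    return "minute"


def name_fits(name: str) -> bool:
    """True if a path component fits the assumed per-name byte limit."""
    return 0 < len(name.encode("utf-8")) <= NAME_MAX_BYTES
```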


Somewhere in the middle, between too big and too small, is the target:
here, for a goal of hiring some space, collecting
any-and-all-text-usenet by each group+date+message, so that each
message has a date, then compressing those into zip files to save space,
and making a catalog.
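Here is a minimal Python sketch of the catalog step, assuming one zip per group+date whose members are single articles. It reads only the headers of each member and writes one CSV row per message. The column names and the UTF-8-with-replacement decoding are assumptions; old articles in other charsets would need better handling.

```python
import csv
import io
import zipfile
from email.parser import HeaderParser
from typing import Dict, Iterable, Iterator

FIELDS = ["group", "date", "message_id", "subject", "member"]   # hypothetical columns


def catalog_rows(zip_bytes: bytes, group: str, day: str) -> Iterator[Dict[str, str]]:
    """Yield one catalog row per archived article, parsing headers only."""
    parser = HeaderParser()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in sorted(zf.namelist()):
            text = zf.read(name).decode("utf-8", errors="replace")
            headers = parser.parsestr(text)
            yield {
                "group": group,
                "date": day,
                "message_id": headers.get("Message-ID", ""),
                "subject": headers.get("Subject", ""),
                "member": name,
            }


def write_catalog(rows: Iterable[Dict[str, str]], out) -> None:
    """Write catalog rows as CSV with a header line."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```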




Humor bit:

"Word Soup: lots of words like word salad, but nutritious
and filling, because "word salad" means not sensical, and
"word soup" means too sensical. If you are new to Word Soup,
it's suggested to start with Alphabet Soup. If you encounter
a comma in your word soup, chew that thoroughly to proceed."

Words in English that stack: that that that that that.
If you find too many expletives, consider replacing them
with that.