Discussion:
Newsgroups files
Nigel Reed
2025-03-03 19:30:17 UTC
Just a general moan about the state of the newsgroups files that I am
finding on my peers.


fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet. (Moderated)
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aide aux nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet. (Moderated)
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet. (Moderated)
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aide aux nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aide aux nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.


One sample group from 16 peers. The first thing: so many different
encodings. I've got ASCII, UTF-8, ISO-8859-1, WINDOWS-1252, even one
identifying as GB18030.

Next, 8 servers agree on one description, 3 on another, 2 more on yet
another, and finally 3 think the group is moderated.

How did things get in such a mixed up state?

What is even worse when trying to automate this is when the majority
of servers have the wrong description, or it's half and half.

I like the fact that some hierarchies have a published standard, but
I expect much of it is long lost.

Ok, moan over.
--
End Of The Line BBS - Plano, TX
telnet endofthelinebbs.com 23
Ray Banana
2025-03-03 20:30:14 UTC
Post by Nigel Reed
Just a general moan about the state of the newsgroups files that I am
finding on my peers.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet. (Moderated)
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aide aux nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet. (Moderated)
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet. (Moderated)
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aide aux nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aide aux nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
One sample group from 16 peers. the first thing, so many different
encodings. I've got ASCII, UTF-8, ISO-8859-1, WINDOWS-1252, even one
identifying as GB18030.
Next, 8 servers agree on one description, 3 on another, 2 more on yet
another, and finally 3 think the group is moderated.
How did things get in such a mixed up state?
On E-S (both reader and transit server):

fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.



I seem to remember that this started when the checkgroups messages for
fr.* were changed to UTF-8 (that doesn't account for the "Moderated"
flag, though). Julien Élie might shed some light on this, as he is the
current issuer of control messages for fr.*.
--
Пу́тін — хуйло́
https://www.eternal-september.org
Nigel Reed
2025-03-03 20:36:34 UTC
On Mon, 03 Mar 2025 21:32:25 +0100
Post by Ray Banana
Post by Nigel Reed
Just a general moan about the state of the newsgroups files that I
am finding on my peers.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet. (Moderated)
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aide aux nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet. (Moderated)
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet. (Moderated)
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aide aux nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aide aux nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
One sample group from 16 peers. the first thing, so many different
encodings. I've got ASCII, UTF-8, ISO-8859-1, WINDOWS-1252, even one
identifying as GB18030.
Next, 8 servers agree on one description, 3 on another, 2 more on yet
another, and finally 3 think the group is moderated.
How did things get in such a mixed up state?
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
I seem to remember that this started when the checkgroups messages for
fr.* were changed to UTF-8 (that doesn't account for the "Moderated"
flag, though). Julien Élie might shed some light on this, as he is the
current issuer of control messages for fr.*.
I'm probably just going to write a script to pull the most popular of the
descriptions for the list and ignore the moderated part, unless the
group has "moderated" in its name or a majority think it's moderated, in
which case I'll do a manual check on those.
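
A minimal sketch of that kind of majority vote, in Python. The function names and the tie handling are assumptions made for illustration, not an existing tool: the "(Moderated)" tag is stripped before counting, and a group is flagged for a manual check when its name contains "moderated", when most peers tag it as moderated, or when the top two descriptions are tied (the "half and half" case).

```python
from collections import Counter

MODERATED_TAG = "(Moderated)"


def split_description(desc):
    """Return (text, tagged) with a trailing "(Moderated)" tag removed."""
    desc = desc.strip()
    if desc.endswith(MODERATED_TAG):
        return desc[: -len(MODERATED_TAG)].rstrip(), True
    return desc, False


def pick_description(group, descriptions):
    """Pick the most common description among peers.

    Returns (description, needs_review). needs_review is True when the
    choice should be checked by hand.
    """
    parsed = [split_description(d) for d in descriptions]
    counts = Counter(text for text, _ in parsed).most_common()
    best, votes = counts[0]
    # A tie between the two most common wordings cannot be settled by voting.
    tied = len(counts) > 1 and counts[1][1] == votes
    moderated_votes = sum(1 for _, tagged in parsed if tagged)
    majority_moderated = moderated_votes * 2 > len(parsed)
    needs_review = tied or "moderated" in group or majority_moderated
    return best, needs_review
```

With the 16 fr.bienvenue samples from the first post, the vote lands on the "L'accueil des nouveaux venus..." wording without asking for a review. A majority can still be wrong, so this only helps where most peers are up to date.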
--
End Of The Line BBS - Plano, TX
telnet endofthelinebbs.com 23
DV
2025-03-03 21:34:06 UTC
Post by Nigel Reed
I'm probably just going to get a script to pull the most popular of the
descriptions for the list and ignore the moderated part unless the
group has moderated in its name or a majority think its moderated when
do a manual check on those.
On Julien Élie's website, the following changes can be seen for
fr.bienvenue:

2011-12-19 23:30:02 changegroup fr.bienvenue from m to y
2020-12-25 21:50:02 changedesc fr.bienvenue
2023-10-28 18:20:02 changedesc fr.bienvenue

The group is currently not moderated, and its description is as follows:

L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.

The only moderated groups in the fr.* hierarchy are:

fr.misc.bavardages.dinosaures
fr.usenet.abus.rapports
fr.usenet.forums.annonces
fr.usenet.stats

Source: <http://usenet.trigofacile.com/hierarchies/fr.html>

I keep an updated list of the fr.* groups, with their status and
description:

<http://usenet-fr.yakakwatik.org/groupes.html>
--
Denis

USENET FRANCOPHONE (serveurs, passerelles, groupes, lecteurs de news, docs) :
<http://usenet-fr.yakakwatik.org>
Julien ÉLIE
2025-03-03 22:00:19 UTC
Hi Denis,
Post by DV
I keep an updated list of the fr.* groups, with their status and
<http://usenet-fr.yakakwatik.org/groupes.html>
Thanks for this great list, with links to the charters! Very useful to
have.
The same goes for the list of news servers and webnews which provide access
to fr.* :)
--
Julien ÉLIE

« Audentes fortunat iuvat. » (Virgile)
Nigel Reed
2025-03-03 23:07:41 UTC
On Mon, 3 Mar 2025 21:34:06 -0000 (UTC)
Post by DV
Post by Nigel Reed
I'm probably just going to get a script to pull the most popular of
the descriptions for the list and ignore the moderated part unless
the group has moderated in its name or a majority think its
moderated when do a manual check on those.
On Julien Élie's website, the following changes can be seen for
2011-12-19 23:30:02 changegroup fr.bienvenue from m to y
2020-12-25 21:50:02 changedesc fr.bienvenue
2023-10-28 18:20:02 changedesc fr.bienvenue
L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.misc.bavardages.dinosaures
fr.usenet.abus.rapports
fr.usenet.forums.annonces
fr.usenet.stats
Source: <http://usenet.trigofacile.com/hierarchies/fr.html>
I keep an updated list of the fr.* groups, with their status and
<http://usenet-fr.yakakwatik.org/groupes.html>
Thanks but I'm not just talking about fr. I'm talking about every group
that is carried by all my peers.
--
End Of The Line BBS - Plano, TX
telnet endofthelinebbs.com 23
Julien ÉLIE
2025-03-03 21:40:57 UTC
Hi Nigel,
Post by Nigel Reed
I'm probably just going to get a script to pull the most popular of the
descriptions for the list and ignore the moderated part unless the
group has moderated in its name or a majority think its moderated when
do a manual check on those.
I would suggest instead just using the latest known descriptions (from
checkgroups when they are sent).
I maintain the list encoded in UTF-8 (the standard according to RFCs) here:

https://raw.githubusercontent.com/Julien-Elie/usenet-hierarchies/refs/heads/main/website/data/newsgroups.utf8

Also, FWIW, the same list in pure ASCII:

https://raw.githubusercontent.com/Julien-Elie/usenet-hierarchies/refs/heads/main/website/data/newsgroups.ascii


The usual master file for these descriptions has unfortunately mixed
charsets (like windows-1252 for some descriptions, UTF-8 for others,
ISO-8859-xx variants, etc.):
https://ftp.isc.org/pub/usenet/CONFIG/newsgroups

That's why I generate the first two lists above :)
Feel free to use!
--
Julien ÉLIE

« Ce qui est fait n'est plus à faire. »
Nigel Reed
2025-03-03 23:13:34 UTC
On Mon, 3 Mar 2025 22:40:57 +0100
Post by Julien ÉLIE
Hi Nigel,
Post by Nigel Reed
I'm probably just going to get a script to pull the most popular of
the descriptions for the list and ignore the moderated part unless
the group has moderated in its name or a majority think its
moderated when do a manual check on those.
I would suggest to instead just use the latest known descriptions
(from checkgroups when they are sent).
https://raw.githubusercontent.com/Julien-Elie/usenet-hierarchies/refs/heads/main/website/data/newsgroups.utf8
https://raw.githubusercontent.com/Julien-Elie/usenet-hierarchies/refs/heads/main/website/data/newsgroups.ascii
The usual master file for these descriptions has unfortunately mixed
charsets (like windows-1252 for some descriptions, UTF-8 for others,
https://ftp.isc.org/pub/usenet/CONFIG/newsgroups
That's why I generate the above first two lists :)
Feel free to use!
Yes, we've sort of had this discussion before about encoding. This one
is more about the inconsistency of the labeling of the groups.

In the newsgroups list above, pretty much every group that contains
letters outside the standard A-Z range is garbled.

Probably because it's ISO-8859 when I'm using UTF-8. The cn.* groups
are definitely garbled.

I'll just do my best to make a valid UTF-8 file for my server.
--
End Of The Line BBS - Plano, TX
telnet endofthelinebbs.com 23
Julien ÉLIE
2025-03-04 08:58:10 UTC
Hi Nigel,
Yes, we've sort of had this discussion before about encoding. This one
is more about the inconsistency of the labeling of the groups.
The inconsistency of the labeling can come from several causes. One of
them is the encoding, depending on how the news server interprets the
control articles (whether it follows the declared encoding or, in the
absence of a declared encoding in the control message, which local
encoding it uses).

Another cause can be that the news server no longer processes control
articles (a changed PGP key that was not updated, a problem in GnuPG or
the like), so it will not see possible description changes in
checkgroups. That probably explains some of the variants you saw on
fr.bienvenue.

Or the news server does not have the feature to automatically update the
descriptions, or it has been disabled by the newsmaster. Not all news
servers do that; as far as INN is concerned, I added that feature in
version 2.4.6 in 2009. I reworked the docheckgroups program at that
time, with a new -u flag that does the right magic of updating the
descriptions in the newsgroups file with a proper number of tabulations
and an alphabetical sort, removing obsolete descriptions and adding new
ones. Descriptions from newgroup and checkgroups control articles have
been properly reflected since then. Nonetheless, not all news software
does that.
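
For servers whose software does not do this, the kind of update described above can be sketched in Python. This is not the docheckgroups code. It is a hypothetical illustration in which the function names, the `prefix` argument and the tab layout (descriptions aligned to column 24 with 8-column tabs) are assumptions.

```python
def parse_newsgroups(lines):
    """Map group name to description, skipping blank and comment lines."""
    entries = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        entries[parts[0]] = parts[1].strip() if len(parts) == 2 else ""
    return entries


def format_line(name, desc):
    """Separate name and description with 1 to 3 tabs (assumed layout)."""
    tabs = max(1, 3 - len(name) // 8)
    return name + "\t" * tabs + desc


def merge_checkgroups(existing_lines, checkgroups_lines, prefix):
    """Replace every entry under `prefix` by the checkgroups entries.

    Obsolete groups of the hierarchy are dropped, new ones are added,
    other hierarchies are left alone, and the result is sorted by name.
    """
    current = parse_newsgroups(existing_lines)
    official = parse_newsgroups(checkgroups_lines)
    merged = {n: d for n, d in current.items() if not n.startswith(prefix)}
    merged.update({n: d for n, d in official.items() if n.startswith(prefix)})
    return [format_line(n, d) for n, d in sorted(merged.items())]
```

Calling `merge_checkgroups(old_lines, fr_checkgroups_lines, "fr.")` would return the new content of the newsgroups file as a list of lines. Writing it back and checking the authenticity of the checkgroups (PGP) are left out of the sketch.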

It would be interesting to know whether fr.bienvenue is still declared
moderated in the active file of the news servers which have "(Moderated)"
at the end of its description. It may just be that they processed
the newgroup control article once sent to unmoderate it, but did not
update the description.
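
That cross-check can be sketched in Python, assuming the INN-style active file layout (name, high mark, low mark, flag, with flag "m" meaning moderated). The function name is hypothetical.

```python
def moderation_mismatches(active_lines, newsgroups_lines):
    """List groups whose "(Moderated)" tag disagrees with the active flag.

    Returns (group, flag, tagged) tuples, where `tagged` tells whether the
    description ends with "(Moderated)".
    """
    flags = {}
    for line in active_lines:
        fields = line.split()
        if len(fields) == 4:  # assumed layout: name high low flag
            flags[fields[0]] = fields[3]

    mismatches = []
    for line in newsgroups_lines:
        parts = line.rstrip("\n").split(None, 1)
        if len(parts) != 2 or parts[0] not in flags:
            continue
        name, desc = parts
        tagged = desc.rstrip().endswith("(Moderated)")
        moderated = flags[name] == "m"
        if tagged != moderated:
            mismatches.append((name, flags[name], tagged))
    return mismatches
```

A server that unmoderated fr.bienvenue but kept the old description would show up as `("fr.bienvenue", "y", True)`.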
In the newsgroups list above, pretty much every group that contains
non-standard A-Z letters is garbled.
Probably because it's ISO-8859 when I'm using UTF-8. The cn.* groups
are definitely garbled.
I'll just do my best to make a valid UTF-8 file for my server.
In fact, the newsgroup list from GitHub was properly encoded in UTF-8,
but your browser did not use UTF-8 to render it, for a reason I do not
know. You may have to force the charset in your browser.
The HTTP headers correctly have:
Content-Type: text/plain; charset=utf-8

Does it appear better with this version?
http://usenet.trigofacile.com/hierarchies/data/newsgroups.utf8

Or maybe you downloaded the file and then opened it with an editor in
another charset?

% file newsgroups.utf8
newsgroups.utf8: UTF-8 Unicode text
--
Julien ÉLIE

« Love is blind but marriage is an eye-opener. »
Nigel Reed
2025-03-04 17:35:35 UTC
On Tue, 4 Mar 2025 09:58:10 +0100
Post by Julien ÉLIE
It would be interesting to know whether fr.bienvenue is still
declared moderated in the active file of the news server which have
"(Moderated)" at the end of its description. It may just happen that
they processed the newgroup control article once sent to unmoderate
it, but dit not update the description.
2 out of the 3 still have it as moderated.
Post by Julien ÉLIE
In fact, the newsgroup list from GitHub was properly encoded in UTF-8
but your navigator did not use UTF-8 to render it for a reason I do
not know. Might you have to force the charset in your navigator?
Content-Type: text/plain; charset=utf-8
Does it appear better with this version?
http://usenet.trigofacile.com/hierarchies/data/newsgroups.utf8
Yes, this one was better, thanks.
--
End Of The Line BBS - Plano, TX
telnet endofthelinebbs.com 23
Julien ÉLIE
2025-03-04 18:45:11 UTC
Hi Nigel,
Post by Nigel Reed
Post by Julien ÉLIE
It would be interesting to know whether fr.bienvenue is still
declared moderated in the active file of the news server which have
"(Moderated)" at the end of its description. It may just happen that
they processed the newgroup control article once sent to unmoderate
it, but dit not update the description.
2 out of the 3 still have it as moderated.
Do you happen to know whether they honour control articles?
Do they manage their newsgroup list by hand?

If you know how to contact these 2 news admins who still have
fr.bienvenue marked as moderated, could you ask them?

I bet this is not the only discrepancy in their servers... Did they
reflect the latest changes in the Big-8?

I could try to send a "booster" for the unmoderation of fr.bienvenue
(dating back to 2011!) but I doubt they have the current PGP key of fr.*
(which changed in 2020 as the previous one, unused for several years,
was lost).
--
Julien ÉLIE

« Videt non te diu. »
Nigel Reed
2025-03-04 23:02:37 UTC
On Tue, 4 Mar 2025 19:45:11 +0100
Post by Julien ÉLIE
Post by Nigel Reed
2 out of the 3 still have it as moderated.
Do you happen to know whether they honour control articles?
Do they manage their newsgroup list by hand?
No idea.
Post by Julien ÉLIE
If you know how to contact these 2 news admins who still have
fr.bienvenue marked as moderated, could you ask them?
I could but I don't care enough :) I've got too much other stuff to do
without administering other people's usenet servers.
Post by Julien ÉLIE
I bet this is not the only discrepancy in their servers... Did they
reflect the latest changes in the Big-8?
I expect there are many discrepancies. I couldn't tell you if they have
the Big-8 changes or not. I probably don't myself because I don't run
checkgroups or whatever.
Post by Julien ÉLIE
I could try to send a "booster" for the unmoderation of fr.bienvenue
(dating back to 2011!) but I doubt they have the current PGP key of
fr.* (which changed in 2020 as the previous one, unused during
several years, was lost).
I wouldn't bother. Like you say, there's probably more than just that
one group that's out of sync. Given that I have a number of servers
that have the correct name and moderated status for that group, I would
probably pick their reliability over the others.
--
End Of The Line BBS - Plano, TX
telnet endofthelinebbs.com 23
Ray Banana
2025-03-03 20:41:54 UTC
Post by Nigel Reed
Just a general moan about the state of the newsgroups files that I am
finding on my peers.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet. (Moderated)
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aide aux nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet. (Moderated)
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet. (Moderated)
fr.bienvenue Aider les nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aide aux nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue Aide aux nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
One sample group from 16 peers. the first thing, so many different
encodings. I've got ASCII, UTF-8, ISO-8859-1, WINDOWS-1252, even one
identifying as GB18030.
Next, 8 servers agree on one description, 3 on another, 2 more on yet
another, and finally 3 think the group is moderated.
How did things get in such a mixed up state?
On E-S (both reader and transit server):

fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.



I seem to remember that this started when the checkgroups messages for
fr.* were changed to UTF-8 (that doesn't account for the "Moderated"
flag, though). Julien Élie might shed some light on this, as he is the
current issuer of control messages for fr.*.

PS: Please refer to

http://usenet.trigofacile.com/hierarchies/fr.html

for the change from "Moderated" to unmoderated and the current content
of the official checkgroups file.

HTH
--
Пу́тін — хуйло́
https://www.eternal-september.org
Julien ÉLIE
2025-03-03 21:40:53 UTC
Hi Wolfgang,
Post by Ray Banana
fr.bienvenue L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.
This is indeed the right one :)
Post by Ray Banana
I seem to remember that this started when the checkgroups messages for
fr.* were changed to UTF-8 (that doesn't account for the "Moderated"
flag, though). Julien Élie might shed some light on this, as he is the
current issuer of control messages for fr.*.
There was a change of encoding in December 2011 when I convinced the
former sender of control articles, Ollivier Robert, to switch to UTF-8.
Control articles were previously sent using ISO-8859-15 (which differs
from ISO-8859-1 by a couple of special French characters like "œ", as in
œuf, an egg in French, or œil, an eye).

And looking back at my archives, I see that there was no encoding
declared at all before January 2009 in fr.* checkgroups.

It doesn't make me any younger, but it corresponds to when I was way
more active in INN development and greatly improved, among other things,
the handling of control articles, implementing most of the new RFCs.
According to the changelog:

Changes in 2.5.0 (2009-05-21)

controlchan recognizes the new application/news-groupinfo entity
described in USEPRO and can handle character set conversions of
newsgroup descriptions. The MIME::Parser and Encode modules are
used. Processing control messages has been greatly improved,
especially checkgroups: the active and newsgroups files are now
properly updated when they are processed, and all matching lines in
control.ctl for a given checkgroups are honoured (which for instance
allows using both drop and doit actions for the same checkgroups
message).

A new control.ctl.local file has also been added in pathetc. Rules
set in that file override rules in control.ctl, allowing
administrators to specify local rules for some control messages
without modifying the control.ctl configuration file that comes with
INN. It also specifies encodings to use for the newsgroups file.
By default, UTF-8 will be used for newsgroup descriptions, as
strongly recommended by RFC 3977.
Post by Ray Banana
PS: Please refer to
http://usenet.trigofacile.com/hierarchies/fr.html
for the change from "Moderated" to unmoderated and the current content
of the official checkgroups file.
The changes mentioned in the web page explain the 3 different
descriptions found by Nigel:

2011-12-19 23:30:02 changegroup fr.bienvenue from m to y (by control
article)
2020-12-25 21:50:02 changedesc fr.bienvenue (by checkgroups)
2023-10-28 18:20:02 changedesc fr.bienvenue (by checkgroups)

Before 2011-12-19, when it was moderated:
Aider les nouveaux venus dans leurs premiers pas sur Usenet. (Moderated)

Then from 2011-12-19 to 2020-12-25:
Aider les nouveaux venus dans leurs premiers pas sur Usenet.

Then from 2020-12-25 to 2023-10-28:
Aide aux nouveaux venus dans leurs premiers pas sur Usenet.

Finally, since 2023-10-28:
L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.



This description only has ASCII characters, thus was not affected by the
change from ISO-8859-15 encoding to UTF-8. There were just wording
changes. The last one in 2023 was to globally homogenize the style of
the descriptions in the whole checkgroups (notably with an article:
le/la/l').
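
The history above also gives a quick way to tell how far behind a server is. Here is a small Python sketch with a hypothetical function name, using only the four descriptions and dates listed above.

```python
# Descriptions of fr.bienvenue and the period in which each one was official.
FR_BIENVENUE_HISTORY = {
    "Aider les nouveaux venus dans leurs premiers pas sur Usenet. (Moderated)":
        "before 2011-12-19",
    "Aider les nouveaux venus dans leurs premiers pas sur Usenet.":
        "2011-12-19 to 2020-12-25",
    "Aide aux nouveaux venus dans leurs premiers pas sur Usenet.":
        "2020-12-25 to 2023-10-28",
    "L'accueil des nouveaux venus dans leurs premiers pas sur Usenet.":
        "since 2023-10-28",
}


def description_era(description):
    """Return the period matching a description, or None if it is unknown."""
    normalized = " ".join(description.split())  # collapse tabs and extra spaces
    return FR_BIENVENUE_HISTORY.get(normalized)
```

A peer still answering with the "Aide aux nouveaux venus" wording, for instance, has most likely not processed a fr.* checkgroups since 2023-10-28.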
--
Julien ÉLIE

« Felix qui potuit rerum cognoscere causas. » (Virgile)
Julien ÉLIE
2025-03-03 21:55:15 UTC
Hi Nigel,
Post by Nigel Reed
One sample group from 16 peers. the first thing, so many different
encodings. I've got ASCII, UTF-8, ISO-8859-1, WINDOWS-1252, even one
identifying as GB18030.
Next, 8 servers agree on one description, 3 on another, 2 more on yet
another, and finally 3 think the group is moderated.
How did things get in such a mixed up state?
Because there originally wasn't any standard for the encoding of control
articles. Most of them did not declare anything (the usual encoding
locally used by the sender was assumed - like gb18030 for cn.*, koi8-u
for ukr.* [my sympathy to them!], big5 for tw.*, iso-8859-15 for fr.*,
cp1252 for most of the others, etc.).
Only "recently" a new version of the standard recommended the use of UTF-8.

That's why you end up seeing mixed and incoherent encodings in existing
news servers. Not all of them run a version which implements the new
interoperable state of the art (UTF-8) to parse control articles. And if
the descriptions pre-date the receipt of new control articles, not all
the news administrators have manually homogenized the descriptions to
UTF-8. (No blame in my sentence, just a fact.)
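
For someone who has to homogenize such a file by hand, one possible approach is sketched below in Python: try UTF-8 first and, if that fails, fall back to the legacy encoding that was usual for the hierarchy. The mapping only contains the examples named above, and the function name is hypothetical. A short legacy string can occasionally pass as valid UTF-8, so the result still deserves a look.

```python
# Legacy encodings per hierarchy, taken from the examples named above.
LEGACY_ENCODINGS = {
    "cn.": "gb18030",
    "ukr.": "koi8-u",
    "tw.": "big5",
    "fr.": "iso-8859-15",
}
DEFAULT_LEGACY = "cp1252"  # "most of the others"


def decode_newsgroups_line(raw):
    """Decode one raw line of a newsgroups file (bytes) to str."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # The group name is the first whitespace-separated field.
    fields = raw.split(None, 1)
    name = fields[0].decode("ascii", errors="replace") if fields else ""
    encoding = DEFAULT_LEGACY
    for prefix, legacy in LEGACY_ENCODINGS.items():
        if name.startswith(prefix):
            encoding = legacy
            break
    # Bytes that are undefined in the legacy charset become U+FFFD.
    return raw.decode(encoding, errors="replace")
```

Re-encoding each decoded line with `.encode("utf-8")` then gives a file that is UTF-8 throughout.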
Post by Nigel Reed
What is even worse when trying to automate this, is when the majority
of servers have the wrong description or it's half and half.
Just use
https://raw.githubusercontent.com/Julien-Elie/usenet-hierarchies/refs/heads/main/website/data/newsgroups.utf8
:)
--
Julien ÉLIE

« Celui qui sait qu'il ne sait pas, éduque-le.
Celui qui sait qu'il sait, écoute-le.
Celui qui ne sait pas qu'il sait, éveille-le.
Celui qui ne sait pas qu'il ne sait pas, fuis-le. » (proverbe chinois)
Nigel Reed
2025-03-03 23:33:59 UTC
On Mon, 3 Mar 2025 22:55:15 +0100
Post by Julien ÉLIE
Hi Nigel,
Post by Nigel Reed
One sample group from 16 peers. the first thing, so many different
encodings. I've got ASCII, UTF-8, ISO-8859-1, WINDOWS-1252, even one
identifying as GB18030.
Next, 8 servers agree on one description, 3 on another, 2 more on
yet another, and finally 3 think the group is moderated.
How did things get in such a mixed up state?
Because there originally wasn't any standard for the encoding of
control articles. Most of them did not declare anything (the usual
encoding locally used by the sender was assumed - like gb18030 for
cn.*, koi8-u for ukr.* [my sympathy to them!], big5 for tw.*,
iso-8859-15 for fr.*, cp1252 for most of the others, etc.).
Only "recently" a new version of the standard recommended the use of UTF-8.
That why you end up seeing mixed and incoherent encodings in existing
news servers. Not all of them run a version which implements the new
interoperable state of art (UTF-8) to parse control articles. And if
the descriptions pre-date the receival of new control articles, not
all the news administrators have manually homogenized the
descriptions to UTF-8. (No blame in my sentence, just a fact.)
Post by Nigel Reed
What is even worse when trying to automate this, is when the
majority of servers have the wrong description or it's half and
half.
Just use
https://raw.githubusercontent.com/Julien-Elie/usenet-hierarchies/refs/heads/main/website/data/newsgroups.utf8
:)
That's a good start but I still have 36,519 groups in my active file
that aren't in your list.
--
End Of The Line BBS - Plano, TX
telnet endofthelinebbs.com 23
Julien ÉLIE
2025-03-04 08:58:08 UTC
Hi Nigel,
Post by Nigel Reed
That's a good start but I still have 36,519 groups in my active file
that aren't in your list.
Not all newsgroups have a description. Amongst these 36,519 groups, do
you already have a description in your own news server, or do you see a
valid description in another server?
--
Julien ÉLIE

« – Prends un peu de potion magique, Jolitorax ?
– Mais ça va être l'heure de l'eau chaude ! » (Astérix)
Nigel Reed
2025-03-04 17:39:14 UTC
On Tue, 4 Mar 2025 09:58:08 +0100
Post by Julien ÉLIE
Hi Nigel,
Post by Nigel Reed
That's a good start but I still have 36,519 groups in my active file
that aren't in your list.
Not all newsgroups have a description. Amongst these 36,519 groups,
do you already have a description in your own news server, or do you
see a valid description in another server?
I get 10,376 groups with "No description" but that doesn't account for
whether it has a description on another server. I've seen that happen
before.

No worries, I'll figure it out.
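
A possible starting point for that, sketched in Python: list the groups of the active file that have no usable description in a given newsgroups list. The function name and the exact placeholder text ("No description", with or without a final dot) are assumptions.

```python
def groups_without_description(active_lines, newsgroups_lines):
    """Return the sorted groups of the active file lacking a description."""
    active = {line.split()[0] for line in active_lines if line.strip()}

    described = set()
    for line in newsgroups_lines:
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # name only, or blank line
        desc = parts[1].strip()
        # Treat the placeholder as "no description" (assumed wording).
        if desc and desc.rstrip(".").lower() != "no description":
            described.add(parts[0])

    return sorted(active - described)
```

Running it once against the local newsgroups file and once against each peer's list would show which of the missing groups do have a description somewhere else.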
--
End Of The Line BBS - Plano, TX
telnet endofthelinebbs.com 23