• Rhaedas@fedia.io
    1 day ago

    My wife is a professional writer and uses em dashes a lot, usually typed as --, even in her casual messages; it’s just how she writes.

    It’s the formatting style of the whole thing that sounds AI to me. “Honestly” phrases really jump out at me now, as well as the “But…” fragments. Not that they’re bad; hell, I type things out that way too. But when it’s all together, it sounds AI after you’ve seen it a lot.

    The em dash is fine here, emphasizing the final point. Although I would have probably used a comma myself for a post and not a formal manuscript.

    Funny thing is, you can get AI to reduce a lot of these tells with a decent system prompt and staging of the writing process. So I’m surprised we’re still seeing it a lot and it hasn’t been weeded out of the latest versions.

    • Pommes_für_dein_Balg@feddit.org
      1 day ago

      It’s really hard to get rid of things caused by systematic bias in the training data.

      After inhaling the entire internet, LLMs started being trained on publicly available books.
      And due to copyright, those were older ones from a time when em-dashes were used more.
      The training results were tested by humans, who needed to be cheap but also native English speakers.
      So they used workers in English-speaking African countries, where the English taught in school is also more traditional, with a focus on older literature. As a result, answers that sounded like the old literature were rated higher by the testers.

      • stormdelay@sh.itjust.works
        1 day ago

        “Due to copyright”? Did they not all illegally download every book they could, copyrighted or not, to train their LLMs?

    • Sturgist@lemmy.ca
      1 day ago

      Fair enough on the em-dash, hadn’t actually considered that LLMs use it extensively because it’s actually used in the wild.

    • Cris_Citrus@piefed.zip
      1 day ago

      The use of “–” is interesting. I use dashes to convey pacing constantly because I type as I speak, so punctuation to me is largely about trying to write the delivery I want the reader to perceive. I always just use “-” knowing it’s incorrect, but I don’t exactly wanna make myself seem even more like AI by switching lol

      I may try using “–”, thanks for sharing that!

      • Rhaedas@fedia.io
        1 day ago

        I actually used to do the same thing with the -, and with the em dash becoming a thing I dove into the usage and history of it all, including ; and the en dash. And now I’m using - less. But I don’t use the em dash more, I just tend to throw a comma in.

        Another weird one I learned: em dash spacing. The spacing AI tends to use is not preferred by publishers now, but is more AP style, perhaps picked up from when it was more popular to have spaces around the dash. Europe tends to prefer spacing but with an en dash (and I kind of like how that looks too, but it doesn’t fly if you publish in the US).

        • Cris_Citrus@piefed.zip
          1 day ago

          Wait, so what spacing is generally preferred for the em dash? That’s interesting, I never formally learned how to use one so I’m curious (I’ve not been to college, I have no idea if it’s part of a typical higher education curriculum if you take any English courses)

          I abuse the fuck out of commas so I reach for dashes or ; when I want a longer pause that isn’t a logical end point for a thought. But a semicolon feels somehow a bit more formal to me, so I use it less for general online chatting