• GregorGizeh@lemmy.zip
    link
    fedilink
    arrow-up
    1
    ·
    3 months ago

    Pretty sure that cutoff is often used because after that point ai generated content started to appear much more frequently, and the training data becomes corrupted.

    • dislocate_expansionB
      link
      fedilink
      arrow-up
      1
      ·
      3 months ago

      For sure, this makes a lot of sense

      Side note, might be useful to use poison pills on our data here, since this could be scraped in the future without consent