• Danitos
    5 months ago

    Public data still have licenses. E.g., some open source licenses force you to open source the software you create using them, something OpenAI doesn’t do.

    • Lemminary@lemmy.world
      5 months ago

      If you’re using it as you found it, then yeah. But if I derive data from it, like word counts and word frequencies, that’s not exactly the same thing; we call that statistics. Now if I track how often certain words appear together, and then compound that with millions of other sources to create a map of related words and concepts, I’m no longer using the data as you described, because I’m doing something entirely different with it. What LLMs do is generate new information from their underlying sources.
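      A toy Python sketch of the derived statistics this comment describes: word counts, word frequencies, and how often pairs of words appear together. It assumes a crude tokenizer (lowercased runs of letters) and counts co-occurrence once per document; the function names and sample sentences are hypothetical, and this only illustrates the "statistics" step, not how an LLM is actually trained.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text: str) -> list[str]:
    # Hypothetical, deliberately crude tokenizer: lowercase, keep runs of letters/apostrophes
    return re.findall(r"[a-z']+", text.lower())


def word_stats(documents: list[str]):
    """Return raw word counts, relative frequencies, and per-document co-occurrence counts."""
    counts = Counter()
    cooccur = Counter()
    for doc in documents:
        words = tokenize(doc)
        counts.update(words)
        # Count each unordered pair of distinct words appearing in the same document once
        for a, b in combinations(sorted(set(words)), 2):
            cooccur[(a, b)] += 1
    total = sum(counts.values())
    freqs = {w: c / total for w, c in counts.items()}
    return counts, freqs, cooccur


# Usage with made-up sample text
docs = ["the cake contained eggs", "the cake was sweet", "eggs are not chicken"]
counts, freqs, co = word_stats(docs)
print(counts.most_common(3))
print(co[("cake", "the")])
```

None of the original sentences survive in these tables, only numbers about them, which is the distinction the comment is drawing.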

      • Danitos
        5 months ago

        In my example, they would still be using the source code to create new software that is not open source, no matter how many Markov chains are behind it.

        • Lemminary@lemmy.world
          5 months ago

          That’s really stretching it, tbh. You’re arguing that the cake is made of chicken because it contained whole eggs at some point.