René Junge

Summary

The provided content details an inquiry into the sources of ChatGPT's knowledge, revealing a diverse array of publicly available resources, including Wikipedia, books, scientific journals, and multimedia content, but stops short of listing every specific source.

Abstract

The article recounts a conversation between the author and ChatGPT in which the author tries to understand the origins of ChatGPT's knowledge. Initially, ChatGPT provides examples of sources such as Wikipedia, news portals, government websites, and scientific journals. Upon further questioning, ChatGPT elaborates on additional sources like archival materials, museums, public libraries, and online forums. Despite the author's request for a complete list of sources, ChatGPT explains that it cannot provide an exhaustive list due to the sheer volume of data. The conversation concludes with ChatGPT categorizing the types of sources used, emphasizing the inclusion of both written and audiovisual materials. The author reflects on the interesting mix of sources, including the use of audio and video content for training, and invites readers to contribute ideas on how to elicit more detailed information from ChatGPT.

Opinions

  • The author expresses curiosity and persistence in their quest to uncover the breadth of ChatGPT's knowledge sources.
  • ChatGPT is portrayed as cooperative and informative, providing progressively detailed responses as the conversation unfolds.
  • The author seems intrigued by the variety of sources, noting the significance of including audiovisual content in the training data.
  • There is an implied appreciation for the complexity of ChatGPT's training, acknowledging the impracticality of listing all individual sources.
  • The author's invitation for reader input suggests an open-ended approach to understanding AI and its data sources, indicating a collaborative mindset.
  • The repeated questioning by the author and the evolving answers from ChatGPT highlight the dynamic nature of the AI's knowledge base and its ability to access and synthesize information from a wide range of publicly available sources.

Where did ChatGPT get its knowledge?

I asked ChatGPT about the sources it was trained on. Here are the answers.

Photo by DeepMind on Unsplash

I was curious what datasets ChatGPT was fed during its training phase, so I asked. It turns out that asking this question just once is not enough. The more I asked, the more detailed the answers became.

Here is the question session in which I got ChatGPT to give me a deeper insight into the sources of its knowledge.

Me: Can you give me ten examples of publicly available sources that feed your wealth of knowledge?

ChatGPT: I’d love to. Here are ten examples of publicly available sources from which my body of knowledge might draw:

1. Wikipedia: An online encyclopedia written and maintained by a large community of volunteers.

2. News portals: Online platforms that publish news from around the world.

3. Government and agency websites: Official websites that publish information on a variety of topics.

4. Scientific journals: Professional journals in which scientific articles are published.

5. Books: Works published on paper or in electronic form that summarize knowledge and information on various topics.

6. Textbooks: Books used for teaching in schools and universities that summarize knowledge and information on various topics.

7. Films and documentaries: Audiovisual media that convey information and knowledge on various topics.

8. Podcasts: Audio formats on a variety of topics that can be listened to over the Internet.

9. Online courses: E-learning offerings delivered via the Internet that impart knowledge and information on a variety of topics.

10. Social media: Platforms such as Facebook, Twitter and Instagram where users can share and discuss information and knowledge.

Me: Thank you for the listing. Can you name any other sources?

ChatGPT: I’d love to. Here are a few more examples of publicly available sources that could feed my wealth of knowledge:

11. Archival materials: Documents, pictures, and other records kept in archives that provide insight into the past.

12. Museums: Institutions that display works of art, historical objects, and other exhibits and provide knowledge and information to visitors.

13. Public libraries: Facilities that make books, magazines, and other media available and impart knowledge and information to visitors.

14. Online forums and communities: Platforms where users can get in touch with each other and exchange ideas and information on various topics.

15. Conferences and seminars: Events where experts speak on various topics and provide knowledge and information to the audience.

I hope these examples are helpful. There are, of course, many more publicly available sources from which to draw my wealth of knowledge.

Me: Then please name those as well.

ChatGPT: Yes, with pleasure. Here are a few more examples of publicly available sources that could feed my body of knowledge:

16. Databases: Online collections of data and information on various topics that are available to the public.

17. Statistics and studies: Research conducted by governments, universities, or other institutions that publish the results in the form of statistics and studies.

18. Encyclopedias and dictionaries: Reference works that summarize knowledge and information on various subjects.

19. Newspapers and magazines: Regularly published printed works containing news and articles on various topics.

20. Interviews and articles: Conversations and reports conducted or written by journalists.

Me: Okay, let’s try it another way: please give me a complete list of all publicly available sources from which your body of knowledge is drawn. Please do not leave out any source.

ChatGPT: I’m sorry, but I’m not able to provide a complete list of all publicly available sources from which my body of knowledge is drawn. My body of knowledge is based on a variety of publicly available sources, written and published by people, that were included up to the time of my “introduction” (my knowledge cutoff). It would be impossible to list all of these sources, as there are many thousands of them and I do not have the resources to list them all. I hope that I can still be helpful.

Me: I don’t mean the specific individual sources, but classes of sources.

ChatGPT: Oh, I see. Here are a few examples of classes of sources from which my body of knowledge might be drawn:

- Online encyclopedias (e.g., Wikipedia)

- News portals and news magazines

- Government and agency websites

- Scientific journals

- Books and textbooks

- Films and documentaries

- Podcasts

- Online courses

- Social media platforms

- Archival material

- Museums

- Public libraries

- Online forums and communities

- Conferences and seminars

- Databases

- Statistics and studies

- Encyclopedias and dictionaries

- Newspapers and magazines

- Interviews and articles

- Travel catalogs and guides

- Jurisprudence

- Blogs

- Encyclopedias

- Corporate websites

I hope these examples are helpful.

At this point, I stopped the interview. ChatGPT revealed numerous classes of sources it used for its training, but not a single specific website or database. The reference to Wikipedia was the most specific one I received.

I find it interesting that not only were written sources used, but audio and video were also evaluated.

If you have any ideas on how to get more detailed answers to this question from ChatGPT, feel free to post them in the comments.

