OpenAI has introduced a web crawler named 'GPTBot' to help improve future GPT models.
The company says that data gathered by GPTBot could improve future models' accuracy and expand their capabilities, which may mark an important step in the evolution of AI-powered language models.
Web crawlers (also known as web spiders) play an important role in indexing content on the vast Internet. Popular search engines such as Google and Bing utilize these bots to add relevant web pages to their search results.
OpenAI's GPTBot has a clear purpose: collecting publicly available data while filtering out sources that require paywall access, are known to gather personally identifiable information, or contain text that violates OpenAI's policies.
Website owners can stop GPTBot from crawling their site by adding a “Disallow” rule for the GPTBot user agent to their site's robots.txt file. The same mechanism lets them control which parts of their content the crawler can access.
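OpenAI's documentation identifies the crawler by the user-agent token GPTBot and points site owners to robots.txt for opting out. As a minimal sketch, the two rule sets below block GPTBot site-wide or limit it to certain directories, and Python's standard urllib.robotparser module checks how those rules evaluate. The example.com URLs and the directory names are placeholders, and this only shows how the rules parse; it says nothing definitive about how GPTBot itself resolves them.

```python
from urllib.robotparser import RobotFileParser

# Rule set 1: block GPTBot from the entire site, leave other crawlers alone.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Rule set 2: let GPTBot into one directory but keep it out of another.
# The directory names are hypothetical placeholders.
PARTIAL = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "GPTBot") -> bool:
    """Return True if the robots.txt rules permit `agent` to fetch `url`.

    Python's parser applies the first matching rule within an entry;
    a real crawler may resolve overlapping rules differently.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    site = "https://example.com"  # placeholder domain
    print(is_allowed(BLOCK_ALL, f"{site}/articles/1"))               # False
    print(is_allowed(BLOCK_ALL, f"{site}/articles/1", "Googlebot"))  # True
    print(is_allowed(PARTIAL, f"{site}/directory-1/page"))           # True
    print(is_allowed(PARTIAL, f"{site}/directory-2/page"))           # False
```

Because the Python parser uses first-match semantics, overlapping Allow and Disallow rules are safest when ordered from most to least specific; the rules above do not overlap, so order does not affect the result here.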
OpenAI's announcement comes shortly after the company filed a trademark application for GPT-5, which is expected to be the successor to the current GPT-4 model.
The application, filed with the U.S. Patent and Trademark Office on July 18, covers the use of GPT-5 for AI-based generation of human speech and text, speech-to-text conversion, speech recognition, and speech synthesis.
But while the GPT-5 trademark filing sparked excitement among AI enthusiasts, OpenAI CEO Sam Altman cautioned against premature expectations. Altman revealed that the company is still a long way from starting GPT-5 training as it needs to conduct an extensive safety audit before embarking on the process.
OpenAI's recent efforts have been fraught with controversy. Concerns have been raised over the company's data collection practices, particularly around copyright and consent issues.
In June, Japan's privacy regulator issued a warning to OpenAI over unauthorized data collection. Earlier this year, Italy temporarily banned the use of ChatGPT for alleged violations of European Union privacy laws.
OpenAI and Microsoft are also currently facing a class action lawsuit filed by 16 plaintiffs alleging that personal information from ChatGPT user interactions was accessed without proper consent. The companies are also facing a lawsuit over GitHub Copilot, with plaintiffs claiming the code generation tool violates developer rights by scraping code without proper attribution.
If these allegations are proven, OpenAI and Microsoft could be found to have violated the Computer Fraud and Abuse Act, a U.S. federal statute that has featured prominently in web-scraping cases.
As OpenAI continues to push the boundaries of AI technology, it will need to address these challenges to ensure that AI is developed responsibly and ethically.
(Image credit: Gerd Altmann on Pixabay)