This plugin has not been updated in over 2 years. It may no longer be maintained or supported, and compatibility issues may occur when it is used with the latest version of WordPress.

WP Web Scraper

Description

An easy-to-implement web data extractor for WordPress. This plugin can display real-time data from any website directly in your posts, pages or sidebar. It temporarily caches the scraped content on your website. You can use it to include real-time stock quotes, cricket or soccer scores, or any other generic content from public domains.

Important Note:
Web scraping is the practice of extracting data from another website. When doing so, you need to ensure that you have permission to use the content, and you need to give due credit to the original source.
When scraping content, make sure you do not violate any copyright laws.

Features include:

  1. Scraped output can be displayed through a custom template tag or a shortcode in a page, post or sidebar (through a text widget).
  2. Configurable caching of scraped data. The cache timeout can be defined in minutes for each scrape.
  3. Configurable user agent for your scraper, which can be set for each scrape.
  4. Configurable default settings such as enabling, user agent, timeout, caching and error handling.
  5. Multiple ways to query content – CSS Selector, XPath or Regex.
  6. A wide range of arguments for parsing content.
  7. Option to pass POST arguments to a URL to be scraped.
  8. Dynamic conversion of scraped content to a specified character encoding, so that data can be scraped from a site that uses a different charset.
  9. Create scraped pages on the fly by dynamically generating the URLs to scrape or the POST arguments from your page’s GET or POST arguments.
  10. Callback function for advanced parsing of scraped data.
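
The caching and querying features listed above can be illustrated with a minimal Python sketch. It is not the plugin's code (the plugin itself is written in PHP): the names TimedCache, query_regex and fake_fetch are hypothetical, and the downloader is passed in as a plain function so the example runs without network access. It shows a per-URL cache whose timeout is given in minutes, followed by a regex query on the cached markup.

```python
import re
import time
from typing import Callable, Dict, List, Tuple


class TimedCache:
    """Keeps fetched page bodies for a fixed number of minutes (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str], str], timeout_minutes: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch                # assumed downloader: URL -> HTML text
        self._ttl = timeout_minutes * 60   # cache lifetime in seconds
        self._clock = clock                # injectable clock, handy for testing
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]                  # still fresh: skip the remote request
        body = self._fetch(url)            # missing or expired: fetch again
        self._store[url] = (now, body)
        return body


def query_regex(html: str, pattern: str) -> List[str]:
    """Return the first capture group of every match.

    The pattern must contain at least one capture group.
    """
    return [m.group(1) for m in re.finditer(pattern, html, flags=re.S)]


if __name__ == "__main__":
    calls: List[str] = []

    def fake_fetch(url: str) -> str:
        # Stand-in for a real HTTP request; records how often it is called.
        calls.append(url)
        return '<span class="score">IND 250/3</span>'

    cache = TimedCache(fake_fetch, timeout_minutes=5)
    cache.get("https://example.com/scores")
    html = cache.get("https://example.com/scores")  # served from the cache
    print(len(calls))                                             # 1
    print(query_regex(html, r'<span class="score">(.*?)</span>'))  # ['IND 250/3']
```

A short timeout keeps displayed data closer to real time at the cost of more requests to the source site; a longer one reduces load on both sides.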

Check the official website wp-ws.net for documentation, browse through examples, or try paid support for crafting a perfectly optimized web scraper.

Documentation

Examples

Example code for some common use cases of the plugin

Installation

  1. Upload folder wp-web-scrapper to the /wp-content/plugins/ directory
  2. Activate the plugin through the ‘Plugins’ menu in WordPress
  3. Usage instructions for WP Web Scraper

More details on this are available on the FAQs page.

Reviews

Only works sometimes

Only works on some pages. On other pages it gives an error no matter what you try to select. The author abandoned this plugin a couple years ago.

It’s Works fine until

The plugin works great for me until some things like this happen

Error Parsing: Query returned empty response

Versatile and Powerful

Great plugin. It’s a little tricky at first to get going with the arguments and API but you can precisely pick anything off other webpages with this thing!

See all 18 reviews

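Contributors & Developers

WP Web Scraper is open source software. The following people have contributed to this plugin.

Contributors

Translate “WP Web Scraper” into your language.

Interested in development?

Browse the code, check out the SVN repository, or subscribe to the development log by RSS.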

Changelog

3.5

  • Bug fix: Post request
  • Bug fix: gt, lt arguments

3.4

  • Added scrape importer
  • replace_query and replace_with now accept specially formatted array arguments

3.3

  • Basehref bug fix

3.2

  • Documentation website change

3.1

  • Bug fix: Minor bug fixes.

3.0

  • Enhancement: Complete code rewrite, uses PHP DOM directly for faster processing
  • Enhancement: Sandbox to test and debug
  • Deprecation: Dropped removetags
  • Changes: Changes in arguments

2.8

  • Enhancement: Migrated caching to the Transients API.
  • Enhancement: Clear and find / replace now supports selectors.
  • Enhancement: Cleaner code – faster processing.
  • Enhancement: More debugging data including processing time.
  • Deprecation: Modules are deprecated in support of callback functions.

2.7

  • Enhancement: Added callback for flexible as well as advanced parsing.
  • Bug fix: Fixed the issue of usage within widget.

2.6

  • Enhancement: Added removetags to remove certain tags and content from scrape.
  • Bug fix: Retains http-cache and modules on upgrade.

2.5

  • Bug fix: Patched a major security issue related to useragent string settings.

2.4

  • Bug fix: Added xpathdecode to handle complex xpath queries in shortcode.

2.3

  • Enhancement: Added support for xpaths.
  • Enhancement: Uses built-in WP_HTTP classes instead of raw cURL or Fopen.
  • Enhancement: Complete overhaul of code, architecture and documentation.
  • Enhancement: Reverted to file-based cache instead of MySQL tables.

2.2

  • Enhancement: Introduction of special variable ___QUERY_STRING___ for dynamic URLs.
  • Enhancement: Upgraded the underlying phpQuery library to single file version.

2.1

  • Enhancement: Option to turn off the debug information displayed as html comment.

2.0

  • Milestone release: Complete overhaul of code, architecture and documentation.
  • Bug fix: Multiple bug fixes addressed.