Programming Interface (API)

Using the API in Visual Studio


Visual Web Ripper contains two main API files. You need to include one or both of these files in your Visual Studio project:

  • WebRipper.DLL
  • WebRipperBrowser.DLL

If you want to use the API to run a project, you need to include both WebRipperBrowser.DLL and WebRipper.DLL in your Visual Studio project. If you only want to process extracted data, you need to include only WebRipper.DLL.

Include the API files for Visual Web Ripper by adding the files as references. Browse the Visual Web Ripper installation folder in order to find the API files.



You must also copy the following four files to your application's Bin folder.

  • SQLite.Interop.dll
  • msvcp100.dll
  • msvcr100.dll
  • AjaxHook.dll

After including the API files in your project, you will have access to the namespaces VisualWebRipper and VisualWebRipper.Processor. If you have included only WebRipper.DLL, you will have access only to the namespace VisualWebRipper. This is how to include a namespace in C#:

using VisualWebRipper;
using VisualWebRipper.Processor;

Platform Target

Visual Web Ripper is a 32-bit application. A 32-bit application can run on a 64-bit operating system, but it must run in 32-bit mode, so you must set the target platform to x86 as shown below.

[Screenshot: Visual Studio platform target set to x86]

Loading and Running a Project


The most common task when using the API is loading and running a data extraction project from within your own application.

The following classes are used when running a project:

  • WrProject defines an instance of a data extraction project.
  • WrAgent can be used to run a project with the WebCrawler agent or the WebBrowser agent.

The following two static methods can be used to load a data extraction project.

WrProject project = WrProject.Load(@"C:\projects\sequentum.rip");
WrProject project = WrProject.LoadByName("sequentum");

The following three static methods can be used to run a data extraction project in synchronous mode.

IAgent agent = WrAgent.RunProject(new WrProcessPars(project), true);
IAgent agent = WrAgent.RunProject(project, true);
IAgent agent = WrAgent.RunProject(@"C:\projects\sequentum.rip", true);

You can control certain aspects of the process by specifying additional parameters on WrProcessPars.

WrProcessPars(WrProject project, bool isResume, bool isRetryErrors, 
    bool isViewBrowser, WrProcessorTypeEnum defaultAgentType, int debugLevel)

For example:

WrAgent.RunProject(new WrProcessPars(project, false, false, true, 
    project.DefaultCollector, project.LogLevel), true);

Status information can be retrieved from the IAgent interface as follows.

string status = agent.Status;
int processedPages = agent.ProcessedPages;
int pageLoadErrors = agent.TimeoutPages;
int missedRequiredElements = agent.MissedRequiredElements;
bool isError = agent.IsError;

The following three static methods can be used to run a data extraction project in asynchronous mode.

IAgent agent = WrAgent.RunProject(new WrProcessPars(project), false);
IAgent agent = WrAgent.RunProject(project, false);
IAgent agent = WrAgent.RunProject(@"C:\projects\sequentum.rip", false);

If you are running a project asynchronously you can use the IsDone property of the IAgent interface to see whether a project has finished running.

if (agent.IsDone)   
{   
     //The project has finished running   
}
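
When a project runs asynchronously, the calling code usually has to check IsDone repeatedly. The following is a minimal sketch of such a polling loop, not part of the Visual Web Ripper API: the class AgentPolling and the method WaitUntil are hypothetical helper names, and the poll interval and timeout are arbitrary choices. The helper takes a delegate so that it compiles without the Visual Web Ripper DLLs; the usage comment shows how it might be combined with the IsDone property described above.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the Visual Web Ripper API.
static class AgentPolling
{
    // Calls isDone at the given interval until it returns true or the timeout elapses.
    // Returns true if the condition was met, false if the timeout was reached first.
    public static bool WaitUntil(Func<bool> isDone, TimeSpan pollInterval, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (!isDone())
        {
            if (DateTime.UtcNow >= deadline)
                return false;
            Thread.Sleep(pollInterval);
        }
        return true;
    }
}

// Possible usage with the API described above
// (requires references to the Visual Web Ripper DLLs):
//
//   IAgent agent = WrAgent.RunProject(project, false);
//   bool finished = AgentPolling.WaitUntil(() => agent.IsDone,
//       TimeSpan.FromSeconds(5), TimeSpan.FromHours(1));
```

A timeout is included so that a project that never finishes does not block the caller forever; whether one hour is sensible depends on the project.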

Manipulating a Project


You can use the API to manipulate a data extraction project before you run it. You must first load the project to get an instance of the WrProject class.

The following two static methods can be used to load a data extraction project.

WrProject project = WrProject.Load(@"C:\projects\sequentum.rip");
WrProject project = WrProject.LoadByName("sequentum");

After you have loaded the project, you can set any of its properties and then run the project.

WrProject project = WrProject.LoadByName( "sequentum" );   
project.StartUrls.Clear();   
project.StartUrls.Add( "http://www.sequentum.com" );   
IAgent agent = WrAgent.RunProject(project);

Setting Input Parameters

The best way to manipulate a project is to use input parameters. This allows you to keep all functionality in the project file and use the API to set the parameters.

WrProject project = WrProject.LoadByName( "sequentum" );   
project.InputParameters.SetParameter( "server" ,  "web" );   
project.InputParameters.SetParameter( "database" ,  "test" );   
project.InputParameters.SetParameter( "username" ,  "myUser" );   
project.InputParameters.SetParameter( "password" ,  "myPassword" );   
IAgent agent = WrAgent.RunProject(project);

You do not need to use the API in order to supply input parameters to a project. You can also use the command-line tool to run projects and specify input parameters.

  • Command-Line Utility

Setting the Output Folder

When exporting data to a file format, such as CSV or XML, the output folder can be set this way.

WrProject project = WrProject.LoadByName( "sequentum" );   
project.DataConfiguration.DataSource.OutputFolder = @"c:\output";
project.DataConfiguration.DataSource.IsDefaultOutputFolder = false;

You can export data programmatically to the export target configured in a project.

WrProject project = WrProject.LoadByName("sequentum");
WrExportData data = project.OpenExportData();
WrExport.Export(project, data);

Working With Export Data


After you have run a data extraction project, you may want to do some custom post-processing on the extracted data. You can configure a custom export script for the project, but if you are using the API to run a project, it may be easier and more appropriate to post-process the extracted data directly in your application using the API to access the extracted data.

The class WrExportData provides access to the exported data. You can get an instance of the WrExportData class by calling the method OpenExportData of the WrProject class.

WrProject project = WrProject.LoadByName("sequentum");
WrExportData data = project.OpenExportData();

Example

This example runs a project and then writes the extracted data to a text file.

using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
using VisualWebRipper;

namespace Export
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load the project and run it synchronously without showing the browser.
            WrProject project = WrProject.LoadByName("sequentum");
            project.ViewBrowserCollector = false;
            IAgent agent = WrAgent.RunProject(project, true);

            // Open the extracted data and read one table row by row.
            WrExportData data = project.OpenExportedData();
            StringBuilder content = new StringBuilder();
            WrExportTableReader reader = data.GetTableReader("table_name");
            while (reader.Read())
            {
                // Append each row as one comma-separated line.
                content.Append(reader.GetStringValue("productId"));
                content.Append(",");
                content.Append(reader.GetStringValue("productName"));
                content.Append(",");
                content.Append(reader.GetStringValue("price"));
                content.Append(Environment.NewLine);
            }
            reader.Close();
            data.Close();

            // Write the collected rows to a text file.
            File.WriteAllText("C:\\div\\output.txt", content.ToString());
        }
    }
}
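
The example above joins values with plain commas, so a value that itself contains a comma or a quotation mark would produce a malformed line. If the output is meant to be read as CSV, each value could be escaped first. The following is a minimal sketch under that assumption; the class Csv and the method EscapeField are hypothetical names and not part of the Visual Web Ripper API. It follows the common quoting convention described in RFC 4180.

```csharp
using System;

// Hypothetical helper, not part of the Visual Web Ripper API.
static class Csv
{
    // Wraps a field in double quotes if it contains a comma, a double quote,
    // or a line break, and doubles any embedded double quotes.
    // A null value is written as an empty field.
    public static string EscapeField(string value)
    {
        if (value == null)
            return "";
        bool needsQuotes = value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        if (!needsQuotes)
            return value;
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}

// Possible usage inside the read loop of the example above:
//
//   content.Append(Csv.EscapeField(reader.GetStringValue("productName")));
```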

Using the API from ASP.NET


You should never use the API directly from a web application, but instead build a command-line program that uses the API, and then call that program from your ASP.NET web application. When you start the command-line program, you can specify the user context in which the program should run.

A web application is likely to have insufficient privileges to run a website in IE. The required privileges may depend on the target website, so it's nearly impossible to configure a web server with the correct privileges. Notice that a web application is likely to have different privileges when run from within Visual Studio compared to when it is deployed to a web server.
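
One way to follow this advice is to start the command-line program with the standard .NET Process class. The sketch below is only an illustration: the class ExtractionRunner, the runner program path, and the convention of passing the project name as the single argument are all assumptions, not something Visual Web Ripper defines.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for launching a console program that wraps the API.
static class ExtractionRunner
{
    // Builds the start information for the runner program.
    // runnerExePath is the path to your own console program (an assumption here).
    public static ProcessStartInfo CreateStartInfo(string runnerExePath, string projectName)
    {
        return new ProcessStartInfo
        {
            FileName = runnerExePath,
            Arguments = "\"" + projectName + "\"",
            // UseShellExecute must be false to redirect output
            // or to set UserName, Password and Domain for a different user context.
            UseShellExecute = false,
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };
    }

    // Starts the program, captures its standard output, waits for it to exit,
    // and returns its exit code.
    public static int Run(ProcessStartInfo startInfo, out string output)
    {
        using (Process process = Process.Start(startInfo))
        {
            output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

To run the program under a specific Windows account, the UserName, Password and Domain properties of ProcessStartInfo can be set before calling Run.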

Project Owner Settings


Visual Web Ripper uses your Windows user settings to retrieve information about the default location of your Visual Web Ripper files.

When a project runs from your application, it may run in the context of a user that does not have any Visual Web Ripper settings. A data extraction project contains information about the user who owns the project, and Visual Web Ripper will use that information to locate the default Visual Web Ripper folders.

You can set the project owner in the Project menu in Visual Web Ripper.



If you copy a project from one computer to another, your application may be unable to run the project on the new computer until you set the project owner to a Windows user on the new computer.

If you run a data extraction project with the command-line utility, the project owner information is used automatically to locate the appropriate licensing information and default folders. If you use the API in a custom application, you must call the method VisualWebRipperPath.SetServiceDocumentPath yourself, as in this example.

WrProject project = WrProject.LoadByName( "Sequentum" );   
  
VisualWebRipperPath.SetServiceDocumentPath(project.Schedule.DocumentPath);

The project owner settings do not specify the Windows security context when running a data extraction project. The project will run in the security context of the user who started your program. You must make sure the user that runs your program has access to all the required resources on your computer. For example, if your data extraction project is using the WebBrowser agent, the user must be able to start an instance of Internet Explorer.

Internet Explorer Emulation Mode


Visual Web Ripper uses an embedded instance of Internet Explorer when running a project using the WebBrowser agent.

The embedded IE instance runs in IE7 emulation mode by default, but Visual Web Ripper itself is configured to use IE9 emulation mode. A project developed in Visual Web Ripper may therefore not work correctly in your own application, because websites can be displayed differently in IE7 and IE9.

The IE emulation mode is set in the registry for each executable, so if you want your application to run in IE9 emulation mode, you need to change your registry setting.

Please read this blog post for more information about changing your registry to specify IE9 emulation mode.

http://www.west-wind.com/weblog/posts/2011/May/21/Web-Browser-Control-Specifying-the-IE-Version
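
As a rough illustration of the registry change, the sketch below registers the current executable for IE9 emulation. The key name FEATURE_BROWSER_EMULATION and the value 9000 come from Microsoft's Internet feature controls, not from Visual Web Ripper. The class and method names are hypothetical, and writing under HKEY_CURRENT_USER (which avoids the need for administrator rights) is an assumption about your deployment. The code works only on Windows and should run before the embedded browser is first created.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using Microsoft.Win32;

// Hypothetical helper, not part of the Visual Web Ripper API.
static class BrowserEmulation
{
    // FEATURE_BROWSER_EMULATION value for IE9 mode: pages with a
    // standards-based doctype are displayed in IE9 standards mode.
    public const int IE9 = 9000;

    const string KeyPath =
        @"HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION";

    // Adds a DWORD value named after the current executable (for example MyApp.exe)
    // so that the embedded browser in this application uses IE9 emulation.
    public static void UseIE9ForCurrentProcess()
    {
        string exeName = Path.GetFileName(Process.GetCurrentProcess().MainModule.FileName);
        Registry.SetValue(KeyPath, exeName, IE9, RegistryValueKind.DWord);
    }
}
```

The setting is stored per executable name, so a renamed or separately deployed copy of the application needs its own entry.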

Export Plugins


Export plugins can be used to provide customized export functionality. Export plugins are similar to export scripts, but plugins can provide a user interface allowing a user to configure the export settings at design time.

The following screenshot shows a plugin user interface that allows a user to specify a database connection string at design time. The database connection string can then be used by the plugin at runtime when exporting data.

[Screenshot: plugin user interface for entering a database connection string]

The plugin export routine is run after the standard data export, so a plugin can be used to operate on exported data files. For example, a plugin export routine could use FTP to transfer an exported CSV file to a remote server, and the plugin user interface could be used to configure the FTP address and login details.

You can set the standard export target to None if you want a plugin export routine to completely replace the standard data export.

Export plugins should be placed in the Visual Web Ripper installation folder in the sub-folder Plugins\Export. Each plugin should be placed in a separate folder and the name of the folder becomes the plugin name displayed to the Visual Web Ripper user. A plugin named FTPExport should be placed in the following sub-folder.

Plugins\Export\FTPExport

Building a Plugin

Visual Web Ripper uses the .NET MEF plugin framework. All plugins must export an implementation of the interface IExportPlugin, which is declared in the WebRipper.dll assembly.

public interface IExportPlugin
{
    UserControl LoadUserInterface(WrProject project);
    bool SaveUserInterface(WrProject project);
    void Export(WrProject project, WrExportData data);
}

The method LoadUserInterface should return a standard .NET UserControl that displays the plugin user interface.

The method SaveUserInterface is called when the Visual Web Ripper user presses the Save button and the plugin should save any data the user has entered. The plugin should validate the entered data and return false if the data is invalid, or true if the data is valid.

The Export method is the plugin's export routine, and is called after the standard data export has completed.

The class below is an example of an exported class that implements the IExportPlugin interface.

[Export(typeof(IExportPlugin))]
public class ExportPlugin : IExportPlugin, IDisposable
{
    DatabaseConnection databaseConnectionControl;

    public UserControl LoadUserInterface(WrProject project)
    {
        databaseConnectionControl = new DatabaseConnection(project);
        return databaseConnectionControl;
    }
    
    public bool SaveUserInterface(WrProject project)
    {
        return databaseConnectionControl.Save(project);
    }
    
    public void Export(WrProject project, WrExportData data)
    {
        DataExport.Export(project, data);          
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }
    
    protected virtual void Dispose(bool disposing)
    {
        if (disposing)
            if (databaseConnectionControl != null)
            {
                databaseConnectionControl.Dispose();
                databaseConnectionControl = null;
            }
    }
    ~ExportPlugin()
    {
        Dispose(false);
    }
}

Saving User Data

A plugin can save user data in the project file using the project property PluginParameters. The following example saves a database connection string in the project.

project.PluginParameters["SimpleDatabaseExport_ConnectionString"] = connectionString.Text;

The following example opens a database connection using the stored connection string.

IConnection connection = new WrSqlServerConnection(project,
     project.PluginParameters["SimpleDatabaseExport_ConnectionString"]);

Examples

The following three plugin examples have been built using Visual Studio.

Example 1

This example shows how to build a plugin that can FTP an exported CSV file to a remote server. The plugin user interface is used to configure FTP address and login details. To use this plugin, copy the compiled assembly FTPExport.dll to the following sub-folder in the Visual Web Ripper installation folder.

Plugins\Export\FTPExport


Example 2

This example shows how to build a plugin that exports data to SQL Server.
The plugin user interface is used to configure the database connection string.
To use this plugin, copy the compiled assembly SimpleDatabaseExport.dll to the following sub-folder in the Visual Web Ripper installation folder.

Plugins\Export\SimpleDatabaseExport

This plugin is designed to replace the standard data export, so the standard export target should be set to None.


Example 3

This example shows how to build a plugin that can email an exported CSV file. The plugin user interface is used to configure email server and recipient details. To use this plugin, copy the compiled assembly EmailExport.dll to the following sub-folder in the Visual Web Ripper installation folder.

Plugins\Export\EmailExport

