Pavle's home page

Pavle Guduric
Serbia&Montenegro, Belgrade
pavle@guduric.info
         
Intelligent Stock ExChange Agency

<< Introduction >>

The idea was to create a tool for analysing the Serbian stock market. The daily stock exchange reports were published as plain HTML, packed with poorly organized tables of transaction data and company characteristics, so the data could not be processed automatically. The official stock exchange site offered no data analysis tools. Users had to sort and filter the data, make the calculations and interpret the results by hand.

The agency is designed to collect and analyse data from the official stock market web site. It is meant to do the work of managers, brokers and statistical analysts. It is completely automated, which is why it is built on an agent framework. Agents are in charge of the operational level: they collect, analyse and store data for further use. The end user can access the agency over the internet and use it as an online data analysis tool.

Picture 1 shows the parts of the system. Communication between the agents and their execution are based on the JADE environment.

Picture 1: Overview of the agency system

<< Process of collecting and analysing documents >>

The process of collecting and analysing documents is conducted in four phases:

  1. HTML parsing and table extraction
  2. Transforming HTML into object code
  3. Searching for patterns in the tables
  4. Using table patterns to extract data

The process is shown below (Picture 2):

Picture 2: Process of collecting and analysing documents

Two types of agents are involved in this process: Parsers and Analysers. A Parser is responsible for the first and second phases, and an Analyser for the other two. All agents must register their services with the Coordinator agent.

1. HTML parsing and table extraction

The Coordinator agent clones a number of Parsers. The Parsers then download the pages and extract the tables.

The input of this phase is the HTML code of a page, and the output is an array of tables.
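As a rough illustration of this phase, the sketch below turns the HTML code of a page into an array of tables using only Python's standard `html.parser` module. The class and function names (`TableExtractor`, `extract_tables`) are hypothetical and not taken from the agency's code, and the sketch assumes well-closed cell tags and no nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects every <table> in a page as a list of rows of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []      # finished tables
        self._rows = None     # rows of the table being read, or None
        self._row = None      # cells of the row being read, or None
        self._cell = None     # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = None


def extract_tables(html):
    """Input: HTML code of a page. Output: list of tables (lists of rows)."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Calling `extract_tables(page_html)` on a downloaded page returns one list of rows per table, which is the shape the next phase is assumed to consume.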

2. Transforming HTML into object code

In this phase an object is created for each table extracted in the previous step.

The input of this phase is an HTML table, and the output is an object in memory.
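What such an in-memory object might look like is not described on this page. The following is a minimal sketch, assuming that the first row of a table holds the column names; the `Table` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """In-memory form of one extracted table (hypothetical structure)."""
    header: list
    rows: list = field(default_factory=list)

    @classmethod
    def from_cells(cls, cells):
        """Treat the first row as the header and the rest as data rows."""
        if not cells:
            raise ValueError("table has no rows")
        return cls(header=list(cells[0]), rows=[list(r) for r in cells[1:]])

    def records(self):
        """Return each data row as a dict keyed by column name."""
        return [dict(zip(self.header, row)) for row in self.rows]
```

Keeping the header apart from the data rows makes the later phases simpler, because pattern search mostly looks at the fixed cells while data extraction looks at the variable ones.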

3. Searching for patterns in the tables

The Coordinator agent clones a number of Analysers. The Analysers look in the pattern database; if they find a previously stored pattern whose characteristics are the most similar to those of the table being analysed, the data extraction method associated with that pattern is used to extract data from the current table.

If no suitable pattern has been stored, an agent called Plowher comes into play. Its purpose is to find patterns in the data contained in the tables. First it loads a sample of, for example, 100 tables. Then it looks for similar table cells that sit at the same positions in all the tables. If it can establish that the same structures exist in all the tables, a pattern has been found. The pattern is stored as XML, together with the data extraction rules for the associated table. If no pattern can be found, the task is left to the administrator of the system. A sample of the administrator interface is shown below (Picture 3):

Picture 3: Administrator tool for extraction rule making

The input of this phase is a table object in memory, and the output is an XML pattern with rules for data extraction.
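The pattern search can be pictured with a small sketch. It makes a strong simplifying assumption: cells count as "similar" only when their text is identical at the same row and column in every sample table. The function names and the XML element names (`pattern`, `cell`) are illustrative and are not the format the agency actually stores.

```python
import xml.etree.ElementTree as ET


def find_pattern(tables):
    """Return {(row, col): text} for cells identical in every sample table."""
    if not tables:
        return {}
    first = tables[0]
    pattern = {}
    for r, row in enumerate(first):
        for c, text in enumerate(row):
            # Keep the cell only if all other tables hold the same text there.
            if all(r < len(t) and c < len(t[r]) and t[r][c] == text
                   for t in tables[1:]):
                pattern[(r, c)] = text
    return pattern


def pattern_to_xml(pattern):
    """Serialize the fixed cells as XML (element names are illustrative)."""
    root = ET.Element("pattern")
    for (r, c), text in sorted(pattern.items()):
        cell = ET.SubElement(root, "cell", row=str(r), col=str(c))
        cell.text = text
    return ET.tostring(root, encoding="unicode")
```

On a sample of tables that share a header row but differ in their data rows, `find_pattern` keeps only the header cells, which is the kind of fixed structure an extraction rule can be attached to.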

4. Using table patterns to extract data

Each pattern stored in the database is used to extract data from tables according to the rules it contains. When the table being analysed has a pattern similar to one in the database, a number expressing the percentage of confirmation is calculated using the Bernoulli criterion.

The input of this phase is an unknown table, and the output is an XML document with the data from the table.
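The page does not define how the Bernoulli criterion is applied. One simple reading, assumed here purely for illustration, treats each pattern cell as a pass/fail trial and reports the share of passes as the percentage of confirmation. The function name `match_percent` and the pattern layout (a mapping from cell position to expected text) are hypothetical.

```python
def match_percent(pattern, table):
    """Share of pattern cells found unchanged in the table, as a percentage.

    pattern: {(row, col): expected_text}; table: list of rows of strings.
    This is an assumed reading of the confirmation percentage, not the
    agency's actual formula.
    """
    if not pattern:
        return 0.0
    hits = sum(
        1 for (r, c), text in pattern.items()
        if r < len(table) and c < len(table[r]) and table[r][c] == text
    )
    return 100.0 * hits / len(pattern)
```

An Analyser could compute this value for every stored pattern and apply the extraction rules of the pattern with the highest score, provided the score passes some threshold chosen by the administrator.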

<< Example >>

At the beginning of the process we have a table like this:

Naziv i vrsta zemljišta-objekta | Namena zemljišta-objekta           | Površina zemljišta-objekta (m2)
--------------------------------+------------------------------------+--------------------------------
Građevinsko zemljište           | Za građevinske objekte             | 8.380
Proizvodna hala                 | Proizvodnja                        | 3.000
Upravna zgrada                  | Kanc. prostor I pomoćne prostorije | 1.400
Garaža                          | Smeštaj vozila                     | 350
Magacin gotove robe             | Smeštaj gotovih proizvoda          | 1.980


:: Sample table with data written in Serbian ::

At the end of the process we have XML like this:

<XML xmlns="http://tempuri.org/Header_zemljiste.xsd">
  <zemljiste>
    <instance>
      <Naziv_i_vrsta_zemljištaobjekta>Građevinsko zemljište</Naziv_i_vrsta_zemljištaobjekta>
      <Namena_zemljištaobjekta>Za građevinske objekte</Namena_zemljištaobjekta>
      <Površina_zemljištaobjekta__m2>8.380</Površina_zemljištaobjekta__m2>
    </instance>
    <instance>
      <Naziv_i_vrsta_zemljištaobjekta>Proizvodna hala</Naziv_i_vrsta_zemljištaobjekta>
      <Namena_zemljištaobjekta>Proizvodnja</Namena_zemljištaobjekta>
      <Površina_zemljištaobjekta__m2>3.000</Površina_zemljištaobjekta__m2>
    </instance>
    <instance>
      <Naziv_i_vrsta_zemljištaobjekta>Upravna zgrada</Naziv_i_vrsta_zemljištaobjekta>
      <Namena_zemljištaobjekta>Kanc. prostor I pomoćne prostorije</Namena_zemljištaobjekta>
      <Površina_zemljištaobjekta__m2>1.400</Površina_zemljištaobjekta__m2>
    </instance>
    <instance>
      <Naziv_i_vrsta_zemljištaobjekta>Garaža</Naziv_i_vrsta_zemljištaobjekta>
      <Namena_zemljištaobjekta>Smeštaj vozila</Namena_zemljištaobjekta>
      <Površina_zemljištaobjekta__m2>350</Površina_zemljištaobjekta__m2>
    </instance>
    <instance>
      <Naziv_i_vrsta_zemljištaobjekta>Magacin gotove robe</Naziv_i_vrsta_zemljištaobjekta>
      <Namena_zemljištaobjekta>Smeštaj gotovih proizvoda</Namena_zemljištaobjekta>
      <Površina_zemljištaobjekta__m2>1.980</Površina_zemljištaobjekta__m2>
    </instance>
  </zemljiste>
</XML>
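To make the step from the table to the XML more concrete, here is a minimal sketch that builds a document of the same shape with Python's `xml.etree.ElementTree`. The naming rule in `to_tag` (drop hyphens, replace other non-word characters with underscores, strip trailing underscores) is an assumption inferred from the element names in the sample, not the agency's documented rule, and the namespace attribute is left out for brevity. Both function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def to_tag(header):
    """Turn a column header into an element name (assumed rule)."""
    name = header.replace("-", "")       # "zemljišta-objekta" -> "zemljištaobjekta"
    name = re.sub(r"\W", "_", name)      # spaces and brackets -> "_"
    return name.rstrip("_")              # drop the "_" left by a closing bracket


def table_to_xml(group, header, rows):
    """Wrap each data row in an <instance> element under <XML>/<group>."""
    root = ET.Element("XML")
    parent = ET.SubElement(root, group)
    for row in rows:
        instance = ET.SubElement(parent, "instance")
        for column, value in zip(header, row):
            ET.SubElement(instance, to_tag(column)).text = value
    return ET.tostring(root, encoding="unicode")
```

For example, `to_tag("Površina zemljišta-objekta (m2)")` gives `Površina_zemljištaobjekta__m2`, which matches the element name used in the sample output.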