Data Mining vs Screen-Scraping

Data mining isn’t screen-scraping. I know that some people in the room may disagree with that statement, but they’re actually two almost completely different concepts.

In a nutshell, you might state it this way: screen-scraping allows you to get information, whereas data mining allows you to analyze information. That’s a pretty big simplification, so I’ll elaborate a bit.

The term “screen-scraping” comes from the old mainframe terminal days, when people worked on computers with green and black screens containing only text. Screen-scraping was used to extract characters from the screens so that they could be analyzed. Fast-forwarding to the web world of today, screen-scraping now most commonly refers to extracting information from web sites. That is, computer programs can “crawl” or “spider” through web sites, pulling out data. People often do this to build things like comparison shopping engines, archive web pages, or simply download text to a spreadsheet so that it can be filtered and analyzed.
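To make the “pulling out data” part concrete, here is a minimal sketch in Python. It is only an illustration under some loud assumptions: the HTML snippet, the class names (`product`, `name`, `price`), and the `ProductScraper` helper are all made up for this example, and a real scraper would download the page from a site rather than read a hard-coded string.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would fetch this from a web site.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Widget</span> <span class="price">9.99</span></li>
  <li class="product"><span class="name">Gadget</span> <span class="price">24.50</span></li>
</ul>
"""


class ProductScraper(HTMLParser):
    """Collects (name, price) rows from the assumed markup above."""

    def __init__(self):
        super().__init__()
        self.rows = []       # extracted (name, price) tuples
        self._field = None   # which span we are currently inside, if any
        self._current = {}   # fields gathered for the current <li>

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside a span we care about.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and "name" in self._current and "price" in self._current:
            self.rows.append((self._current["name"], float(self._current["price"])))
            self._current = {}


def scrape(html):
    parser = ProductScraper()
    parser.feed(html)
    return parser.rows


rows = scrape(SAMPLE_HTML)
for name, price in rows:
    print(name, price)
```

Notice that nothing here analyzes anything. The output is just rows of text and numbers, ready to be dropped into a spreadsheet or handed to some other tool.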

Data mining, on the other hand, is defined by Wikipedia as the “practice of automatically searching large stores of data for patterns.” In other words, you already have the data, and you’re now analyzing it to learn useful things about it. Data mining often involves lots of complex algorithms based on statistical methods. It has nothing to do with how you got the data in the first place. In data mining you only care about analyzing what’s already there.
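By way of contrast, here is an equally small sketch of the “searching for patterns” side, again in Python. It assumes the data is already sitting in memory (the `BASKETS` list is invented for illustration) and simply counts which pairs of items show up together often. Real data mining tools use far more sophisticated statistical methods; the `frequent_pairs` function and its `min_support` parameter are hypothetical names chosen for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, already-collected data: each set is one shopping basket.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
]


def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


patterns = frequent_pairs(BASKETS, min_support=3)
print(patterns)
```

The code never asks where the baskets came from. They could have been scraped, exported from a database, or typed in by hand; the analysis is the same either way.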

The difficulty is that people who don’t know the term “screen-scraping” will try Googling for anything that resembles it. We include a number of these terms on our web site to help such folks; for example, we created pages entitled Text Data Mining, Automated Data Collection, Website Data Extraction, and even Web Site Ripper (I suppose “scraping” is sort of like “ripping”). So it presents a bit of a problem: we don’t necessarily want to perpetuate a misconception (i.e., screen-scraping = data mining), but we also have to use terminology that people will actually use.