03-20-2015, 11:36 PM
03-21-2015, 12:38 AM
I figured it was something like that
Is there a way to stop scrape.php after a certain date, similar to organize.php?
Thanks for your hard work Vladimir this is awesome!
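For reference, an until-style cutoff like organize.php's date options boils down to a simple date comparison. A minimal sketch in Python (the helper name and signature here are illustrative, not part of the project):

```python
from datetime import date

def in_range(day, since=None, until=None):
    """Return True if `day` falls inside the optional [since, until] window."""
    if since is not None and day < since:
        return False
    if until is not None and day > until:
        return False
    return True
```

A scraper loop could then skip (or stop at) any dump whose date fails this check.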
03-21-2015, 04:48 AM
I have it up and running, and the first results are pretty good, way better than expected. However, actually getting it running is a pain; here are my notes:
apt-get install php5-curl
curl -sS https://getcomposer.org/installer | php
mv settings-dist.json settings.json
(insert twitter api keys)
#### dump-scraper is working now #####
python dumpmon-scraper.py -s 2015-03-19
#######################################
apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose python-sklearn
#pip install scipy
mkdir train
mkdir data/raw/training
####
manually sort out 20 hash/plain/trash into data/raw/training/[hash/plain/trash]
####
php organize.php --train
####
python classify.py
####
php organize.php -s 2015-03-05 -u 2015-03-20
####
php extract.php -s 2015-03-05 -u 2015-03-20
####
find data/processed/plain/ -name "*.txt" -exec cat {} \; | sort -u > pastebin.dict.txt
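The last step (find | cat | sort -u) can also be done with a short Python script, which is handy on Windows where those tools aren't available. A hedged stdlib-only sketch (the function name is mine, not part of the project):

```python
import os

def build_wordlist(src_dir, out_file):
    """Collect unique lines from every .txt file under src_dir, sorted,
    and write them to out_file -- equivalent to find | cat | sort -u."""
    lines = set()
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            if name.endswith(".txt"):
                with open(os.path.join(root, name), errors="ignore") as fh:
                    lines.update(line.rstrip("\n") for line in fh)
    with open(out_file, "w") as fh:
        fh.write("\n".join(sorted(lines)) + "\n")
```

Run it over data/processed/plain/ to produce the same pastebin.dict.txt wordlist.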
03-21-2015, 12:00 PM
I know setting it up is a pain. I'm working on creating a single executable file, so you won't have to install all the dependencies. Sadly I'm having some trouble with the Twitter library; I hope I can fix it.
If there are any Python developers around, their help would be very appreciated.
FYI I'm going to drop the idea of a GUI: since I want to create a cross-platform application, it would be a lot of trouble for such a little improvement.
Any ideas and suggestions are more than welcome!
03-21-2015, 05:03 PM
I don't really care much about a GUI; the current setup is actually fine. However, it needs a lot more polishing, like sanity checks and prerequisite checks.
Also, I'm working on something that will streamline the learning process a bit.
Why is the training dir in data/raw/training/ not /data/training/ ?
03-21-2015, 06:15 PM
I made trainer.py and committed it to your repo.
04-02-2015, 08:36 AM
Hello guys
I have just compiled the win/linux binaries, can you please try them?
https://github.com/tampe125/dump-scraper....1.0-alpha
Now there is a single entry point; you simply have to type:
dumpscraper [command] [options]
available commands are:
scrape (twitter scraping)
classify -s [since] -u [until] (calculate the score and organize dumps)
extract -s [since] -u [until] (extract useful info)
the training part has been improved:
training -d/--getdata will display an interactive way to manually classify dumps
training -s/--getscore calculates the score for training data
There are no backwards-incompatible changes, so you can keep all your previous dumps.
I compiled them in Ubuntu 14.04 and Windows 7 32 bit, let me know if it works.
Please be aware that this is my first time compiling Python, so things could simply be broken
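The single-entry-point design above maps naturally onto argparse subcommands. A sketch of how such a dispatcher might look (this is my own illustration of the command layout described in the post, not the actual dump-scraper code):

```python
import argparse

def build_parser():
    """Hypothetical CLI mirroring the commands listed above:
    scrape, classify/extract with -s/-u date ranges, and training."""
    parser = argparse.ArgumentParser(prog="dumpscraper")
    sub = parser.add_subparsers(dest="command")
    sub.add_parser("scrape", help="twitter scraping")
    for name in ("classify", "extract"):
        p = sub.add_parser(name)
        p.add_argument("-s", "--since")
        p.add_argument("-u", "--until")
    tr = sub.add_parser("training")
    group = tr.add_mutually_exclusive_group()
    group.add_argument("-d", "--getdata", action="store_true",
                       help="interactively classify dumps")
    group.add_argument("-s", "--getscore", action="store_true",
                       help="calculate the score for training data")
    return parser
```

Each subcommand would then dispatch to the corresponding scrape/classify/extract routine based on args.command.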
04-02-2015, 06:35 PM
Windows 7 x64 Ultimate
Working fine so far. The hardest part was the Twitter App
06-27-2015, 07:22 PM
I followed the guide to set up the Twitter App and configure the scraper, but it always throws an error: "Twitter error: Could not authenticate you."
06-27-2015, 07:51 PM
Dumpmon, the Twitter account, has been suspended; that may be why.