Sunday, April 17, 2011

When statistics fool you

Today's victim is JM. That is, JM is the stock I have chosen to play with tonight. What I bring up today is neither new nor really that exciting (had it been this easy to predict the next day's stock price, I would already own many tropical islands). But as a statistician you are sometimes tempted to examine the simplest of ideas to see whether they hold up. I'm testing one such idea tonight. It says that the next day's stock price can be predicted by modelling the stock prices of the n most recent days. That is, we have a naive model that looks like this:
X(t+1) = a1*X(t) + a2*X(t-1) + ... + an*X(t-n+1) + noise
X is our stock price and t is today's index, i.e. t+1 is tomorrow. In short, X(t-1) is yesterday's stock price. I then use linear regression to estimate the parameters a1..an. The model was tested by the book and all requirements on model quality are met. The result is shown in the figure to the right. I fitted the model on the first 200 days and test it on the days that follow. It looks pretty good, doesn't it? The model explains 98% of the variation in the stock price and uses only historical prices. How can it look so good? Because when I evaluate this model I use single-step evaluation, which means I always use observed data to predict the next day. Since tomorrow's price is rarely far from today's, almost any model of this kind scores well when evaluated that way.

So what happens if I want to predict the stock price n days ahead? Then I have to use my predicted stock prices as input to the next prediction. How does that go? See the graph to the right.

Note that this is the same model as above. It doesn't look nearly as good. Why? Because I have built a function that guesses right on average. That is, when I include my predicted values as input, I also include their error. The next graph illustrates which values the stock price can take according to the model. It is based on a 95% confidence interval. Here you can see that for a given day there are rather large differences between the highest and the lowest price. Look at day 110, for example. There the prediction lies between 35 and 45, which is a substantial difference when yesterday's price was 40! What will it be? A 12.5% gain or a 12.5% loss? Well, the model can't help you there.
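To make the error build-up concrete, here is a short derivation for the two-day-ahead case. It is only a sketch under assumptions that are not in the post: the coefficients a_1..a_n are treated as known exactly, n is at least 2, and the noise terms e(t) are independent with the same variance \sigma^2.

```latex
\begin{align*}
\hat{X}(t+1) &= a_1 X(t) + a_2 X(t-1) + \dots + a_n X(t-n+1) \\
\hat{X}(t+2) &= a_1 \hat{X}(t+1) + a_2 X(t) + \dots + a_n X(t-n+2) \\
X(t+1) - \hat{X}(t+1) &= e(t+1) \\
X(t+2) - \hat{X}(t+2) &= e(t+2) + a_1\, e(t+1) \\
\operatorname{Var}\bigl[X(t+2) - \hat{X}(t+2)\bigr] &= \bigl(1 + a_1^2\bigr)\,\sigma^2
\end{align*}
```

The one-day error is just the noise, but the two-day error also carries the previous day's error scaled by a_1. If a_1 is close to 1, as is common for price series, the error variance roughly doubles after only two steps, and it keeps growing with every further step.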

My whole point today has been that even though we can do fantastic things with statistics, we have to be careful. The model I built is just one example of how badly things can go if you don't think things through. Sometime in the future, when I have more time, I'll go through a somewhat more reasonable model that can be trusted more.

Happy investing!

Saturday, April 16, 2011

Converting mkv to mp4 without reencoding

I use my Xbox for watching movies and listening to music, and it works really well. Unfortunately Xbox does not yet support the MKV video container format which means that I have a lot of cool movies that need converting. So I surfed the net to find a solution and of course, someone had written a script. I rewrote parts of it to suit my own needs. So here it is.

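#!/bin/sh

# Converts an MKV file to MP4 without re-encoding the video.
# Needs mkvinfo, mkvextract, xxd, GNU sed, faac, mplayer and MP4Box.
[ ! -r "$1" ] && echo "Error! Cannot read $1 !" && exit 1
#[ ! -x "./neroAacEnc" ] && echo "Error! neroAacEnc does not exist or is not executable!" && exit 1
# Strip the directory and the .mkv suffix so the result is movie.mp4, not movie.mkv.mp4
bname=$(basename "$1" .mkv)
dname=$(dirname "$1")
mkvinfo "$1" | grep -i track
echo "Which track is video? (generally 1 or 2)"
read vidtrack
echo "Which track is audio? (generally 1 or 2)"
read audtrack
echo "Which track is subtitles? (generally 3 or 4)"
read subtrack
echo "What is the video frames per second (fps)?"
read vidfps
# Extract the raw video and audio streams (subtitle extraction is left commented out)
mkvextract tracks "$1" "${vidtrack}:video.h264" "${audtrack}:audio.ac3"
#mkvextract tracks "$1" "${vidtrack}:video.h264" "${audtrack}:audio.ac3" "${subtrack}:subtitles.srt"
# Patch the first H.264 header from High profile level 5.1 (hex 33) to level 4.1 (hex 29).
# This only matches when the four bytes fall inside a single xxd group.
xxd -g4 video.h264 | sed '0,/67640033/s//67640029/' | xxd -r > video2.h264
mv video2.h264 video.h264
# mplayer decodes the AC3 audio into the FIFO audiodump.wav while the encoder reads from it
mkfifo audiodump.wav
#neroAacEnc -ignorelength -q 0.20 -if audiodump.wav -of audio.m4a & mplayer audio.ac3 -vc null -vo null -ao pcm:fast
# faac's -q is a percentage (default 100), unlike neroAacEnc's 0..1 scale
faac -q 100 -o audio.m4a audiodump.wav & mplayer audio.ac3 -vc null -vo null -ao pcm:fast
# Let the background encoder finish before audio.m4a is used
wait
rm audiodump.wav
MP4Box -fps "$vidfps" -add video.h264 -add audio.m4a "${dname}/${bname}.mp4"
rm audio.m4a audio.ac3 video.h264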


Cheers,
Dr. Mike

Saturday, March 5, 2011

Ubuntu 10.10 won't suspend fix!

The other day when I upgraded from Ubuntu 10.04 to 10.10 I noticed something horrible! My system was no longer able to suspend. To me this is one big fat no-no, as I use suspend on a daily basis. Luckily I found a solution to the problem in Ubuntu's release notes for 10.10:
When the XHCI module is loaded for USB 3.0 operation the system cannot suspend. Manually unloading XHCI will allow suspend to complete normally. To avoid future suspend problems, the workaround is to add SUSPEND_MODULES="xhci-hcd" to /etc/pm/config.d/unload_module then the system can suspend normally.
You can find more information in the Maverick Meerkat Release Notes, which also list other known issues with this release.
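For reference, this is what the file named in the release notes would contain after applying the workaround. Take it as a sketch: as far as I know, pm-utils reads the files in /etc/pm/config.d as plain shell variable assignments, and you need root privileges to create or edit the file.

```shell
# Contents of /etc/pm/config.d/unload_module
# Modules listed here are unloaded before suspend and loaded again on resume.
SUSPEND_MODULES="xhci-hcd"
```

If the file already sets SUSPEND_MODULES, add xhci-hcd to the existing value instead of adding a second assignment.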

Tuesday, June 1, 2010

Automatic and regular backup of the MySQL database

I've recently set up a few databases for my company using MySQL. It all resides on one server and as such I would like to back up all the databases on that very server for error recovery purposes.

Now the first thing we need to know is how to make a backup of all the databases residing in the server. This is easily accomplished with an application called mysqldump whose sole purpose is to back up your databases. It's also part of the community edition of MySQL so you probably already have it installed if you are currently running MySQL.

So let's see how it works in the example below!

mysqldump -u root --single-transaction --all-databases > alldatabases.sql

This will effectively dump all your databases in SQL format to the file alldatabases.sql which you could later import in case you lose your data.

Now this is enough to back up your databases once, but you probably want to do this every day. So I wrote a small bash script to do the job. It's listed below and keeps 10 days of full backups.


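#!/bin/bash

BACKUPDIR=/var/Backup/database
# Stop if the directory is missing, so we never delete files somewhere else
cd "$BACKUPDIR" || exit 1

echo "Backing up database...."
# Run at the lowest CPU priority; --single-transaction dumps InnoDB tables from a consistent snapshot
nice -n 19 mysqldump -u root -pmysecret --single-transaction \
--all-databases > "alldatabases-$(date +%F).sql"

# Count after the dump so that exactly 10 files remain once the oldest is removed
FILECOUNT=$(ls | wc -l)

if [ "$FILECOUNT" -gt 10 ]
then
OLDFILE=$(ls -tr | head -n 1)
rm -f "$OLDFILE"
echo "Removed oldest file $OLDFILE"
else
echo "We don't have more than 10 backups yet: Not removing anything."
fi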


Note that you will have to replace the username and password in the mysqldump call to match your own setup. It might also be a good idea to set the BACKUPDIR variable to something more useful.
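If you want to convince yourself that the pruning step keeps the right files before pointing it at real backups, you can try it on dummy files in a scratch directory. The sketch below is not part of the script above. It makes two assumptions of its own: the file names and dates are invented, and the oldest file is picked by name rather than by modification time, which works because dates in the date +%F format sort chronologically.

```shell
# Create 12 empty dummy dumps with made-up dates in a scratch directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
for day in 01 02 03 04 05 06 07 08 09 10 11 12; do
    : > "alldatabases-2010-06-$day.sql"
done

# Remove the oldest file (first in name order) until only 10 remain.
while [ "$(ls | wc -l)" -gt 10 ]; do
    rm -f "$(ls | head -n 1)"
done

kept=$(ls | wc -l)
oldest_kept=$(ls | head -n 1)
echo "Kept $kept files, oldest is $oldest_kept"
```

Picking by name has the small advantage that it does not depend on file timestamps, which can change if the backups are ever copied to another disk.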

Tuesday, May 25, 2010

About selecting a number of columns from a Text file

Today I had an interesting issue. I received a really big file with lots of columns in it. I only needed a few of them though, so I set out to write a quick Perl script to do the job, but then I thought to myself: "I'd better Google before I do this since it seems like a common enough problem." Of course someone had already written a program like that. It's called cut and can be used from your terminal. I needed columns 1 to 22 from the file so I just issued
cut -f 1-22 myfile.txt > mynewfile.txt
This made the file "mynewfile.txt" contain columns 1 to 22. You can also specify the field separator with the -d option. If you don't specify one, the program assumes TAB as the field separator.
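As a small illustration of choosing the separator yourself, here is a sketch that runs cut on a comma-separated file. The sample data and variable names are made up for the example.

```shell
# Build a small comma-separated sample file (hypothetical data).
sample=$(mktemp)
printf 'id,name,city,country\n1,Anna,Lund,Sweden\n2,Bo,Oslo,Norway\n' > "$sample"

# -d sets the separator and -f picks the columns: here columns 1 to 2 plus column 4.
selected=$(cut -d ',' -f 1-2,4 "$sample")
echo "$selected"

rm -f "$sample"
```

Column lists can mix ranges and single numbers, and cut always prints the selected columns in their original order.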

Thursday, May 20, 2010

Converting an mpeg2 file to a DVD

I got an interesting question from my brother the other day. He wanted to know whether I could help him convert his wedding movie, in mpeg format, to a DVD. The reason was that he and his friends had old DVD players that did not support anything other than the plain DVD format. So this is just a small description of how I did it.

I used dvdauthor to solve this problem. Dvdauthor does just what it says: it helps you author DVDs. There is a GUI for it, but I used it from the command line since I only needed the basic functionality.

Now, of course I couldn't apply dvdauthor directly since it requires the mpeg files to be in VOB format. So the first thing I did was to convert my mpeg file to VOB format with the following line:
ffmpeg -i videofile.mpeg -target dvd videofile.vob
Now we have a workable format in the file called "videofile.vob". This is the file we will let dvdauthor work with. In order for dvdauthor to be able to do its job we need to let it know the basic structure of the DVD we want to create. We will describe this structure using an XML file. In principle you can just open up your favourite editor and enter the following:

<dvdauthor>
<vmgm />
<titleset>
<titles>
<pgc>
<vob file="videofile.vob" />
</pgc>
</titles>
</titleset>
</dvdauthor>

and then save the file as dvd.xml. This will give you only one chapter, i.e. no chapter marks to skip between, which can be rather limiting, but I didn't need anything fancy. So basically we can call dvdauthor now like this:
dvdauthor -o dvd -x dvd.xml
If you look in your directory now, you will have a new folder called dvd containing the AUDIO_TS and VIDEO_TS folders. You can now burn this to a dvd with growisofs like this:
growisofs -dvd-compat -Z yourdvddevice -dvd-video ./dvd/
Well, that did it for me. I hope you'll find it useful. Thank you and good night. :)
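If you do want chapters after all, dvdauthor accepts a chapters attribute on the vob element. The sketch below writes such a dvd.xml from the shell. The chapter times are invented for the example, and the file name videofile.vob is simply the one from the ffmpeg step above.

```shell
# Write a dvd.xml with chapter marks at 0, 10, 20 and 30 minutes (made-up times).
cat > dvd.xml <<'EOF'
<dvdauthor>
<vmgm />
<titleset>
<titles>
<pgc>
<vob file="videofile.vob" chapters="0,10:00,20:00,30:00" />
</pgc>
</titles>
</titleset>
</dvdauthor>
EOF
```

The dvdauthor and growisofs calls stay the same as before.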