From 44bb2856a59be53ef5ede154a39c54a59b1cc6d0 Mon Sep 17 00:00:00 2001
From: Amaury Pouly <amaury.pouly@gmail.com>
Date: Fri, 11 Nov 2016 15:40:56 +0100
Subject: nwztools/database: add database of information on Sony NWZ linux
 players

There must be an evil genius in Sony's Walkman division. Someone who made sure
that each model is close enough to the previous one so that little code is needed
but different enough so that an educated guess is not enough.

Each linux-based Sony player has a model ID (mid) which is a 32-bit integer.
I was able to extract a list of all model IDs and the corresponding name of
the player (see README). This gives us 1) a nice list of all players (because
NWZ-A729 vs NWZ-A729B, really Sony?) 2) an easy way to find the name of a player
programmatically. It seems that the lower 8 bits of the model ID give the storage
size but don't bet your life on it. The remaining bytes seem to follow some kind
of pattern but there are exceptions.
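
A minimal sketch of that split, assuming only what is stated above (a 32-bit
model ID whose lower 8 bits appear to encode the storage size). The helper name
split_mid and the sample value are made up for illustration; nothing here
decodes the upper bytes, since their pattern is not reliable.

```python
def split_mid(mid):
    """Split a 32-bit model ID into (upper 24 bits, lower 8 bits).

    The lower byte *seems* to encode the storage size; treat it as a hint,
    not as a guarantee.
    """
    if not 0 <= mid <= 0xFFFFFFFF:
        raise ValueError("model ID must fit in 32 bits")
    return mid >> 8, mid & 0xFF


# 0x12345678 is a made-up value, not a real model ID.
prefix, storage_code = split_mid(0x12345678)
print(hex(prefix), hex(storage_code))
```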

From this list, I was able to build a list of all Sony's series (up to fairly
recent ones). The only safe way to build that is by hand, with a list of series,
each series having a list of model IDs. The notion of series is very important
because all models in a series share the same firmware.
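
A rough sketch of how the two lists relate, assuming the comma-separated,
one-record-per-line formats described in the README (<mid>,<name> for models
and <codename>,<name>,<mid1>,<mid2>,... for series). The function names and
the sample records are hypothetical and are not taken from gen_db.py.

```python
def parse_models(lines):
    """Parse '<mid>,<name>' records into a dict keyed by integer model ID."""
    models = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        mid, name = line.split(",", 1)
        # Keyed by model ID because names are not unique.
        models[int(mid, 16)] = name
    return models


def parse_series(lines):
    """Parse '<codename>,<name>,<mid1>,...' records into a list of tuples."""
    series = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        codename, name, *mids = line.split(",")
        series.append((codename, name, [int(m, 16) for m in mids]))
    return series


def cross_check(models, series):
    """Return (IDs used by a series but unknown, IDs in no series)."""
    referenced = {mid for _, _, mids in series for mid in mids}
    unknown = sorted(referenced - set(models))
    orphans = sorted(set(models) - referenced)
    return unknown, orphans


# Made-up records, only to exercise the functions.
models = parse_models(["1000000,PLAYER-A1", "1000001,PLAYER-A2", "2000000,PLAYER-B1"])
series = parse_series(["seriesa,Series A,1000000,1000001"])
unknown, orphans = cross_check(models, series)
print(unknown, [hex(m) for m in orphans])
```

Both returned lists should be empty for a consistent database; here the
made-up PLAYER-B1 is deliberately left out of every series.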

A very important concept on Sony's players is the NVP, an area of the flash
that stores data associated with keys. The README contains more information but
basically this is where the model ID, the destination, the boot flags, the
firmware upgrade flags, the boot image, the DRM keys, and a lot of other stuff
are recorded.
Of course Sony decided to slightly tweak the index of the keys regularly over time
which means that each series has a potentially different map, and we need this map
to talk to the NVP driver. Fortunately, Sony distributes the kernel for all its
players and they contain a kernel header with this information. I wrote a script
to unpack kernel sources and parse this header, producing a bunch of nw-*.txt
files, included in this commit. This map is very specific though: it maps Sony's
3-letter names (bti) to indexes (1). This is not very useful without the
description (bti = boot image) and its size (262144). This information is harder
to come by, and is only stored in one place: in the icx_nvp_emmc.ko drivers, found
on the device. Fortunately, Sony distributes a number of firmware upgrades that
contain the rootfs, which once extracted contains this driver. The driver is a
standard ELF file with symbols. I wrote a parsing tool (nvptool) that is able
to extract this information from the drivers. Using that, I produced a bunch
of nodes-nw*.txt files. A reasonable assumption is that the meaning and size of
a node do not change over time (bti is always the boot image and is always
262144 bytes), so by merging a few of those files, we can get a complete picture
(note that some nodes that existed in older players do not exist anymore so
we really need to merge several ones from different generations).

The advantage of storing all this information in plain text files is that it
is now easy to parse and to convert into whatever format we want to use.
I wrote a python script that parses all this mess and produces a C file and
header with all this information (nwz_db.{c,h}).

Change-Id: Id790581ddd527d64418fe9e4e4df8e0546117b80
---
 utils/nwztools/database/README | 62 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)
 create mode 100644 utils/nwztools/database/README

(limited to 'utils/nwztools/database/README')

diff --git a/utils/nwztools/database/README b/utils/nwztools/database/README
new file mode 100644
index 0000000000..62d5ca66e8
--- /dev/null
+++ b/utils/nwztools/database/README
@@ -0,0 +1,62 @@
+This file explains how the database was created and how to update it.
+
+Model list
+==========
+
+The model list (models.txt) was extracted from Sony's mptapp on target. This is
+most probably the only reliable way of getting model IDs. It cannot be done
+automatically but it is easy to locate the list using a tool like IDA. It is
+basically a long list of the following structure:
+  struct model_info_t
+  {
+    const char *name;
+    uint32_t mid;
+  };
+Once identified, it is easy to copy it to a file and grep/sed/awk it to produce
+the textual list. It depends on which tool you use. I decided to keep this list
+because it is an easy format to produce and parse. For consistency, I decided
+to use upper case for the model name and lower case for mid. Keep this when
+you modify the list to keep the diff minimal.
+
+IMPORTANT NOTE: some players have more than one model ID (i.e. same name)!!
+
+FORMAT (models.txt): list of pairs <mid>,<name> where <name> is the upper-case
+human-readable name of the player and <mid> is the lower-case hex value of the model ID.
+
+Series list
+===========
+
+The original series list was semi-automatically generated. Unfortunately, Sony
+does not use a 100% regular naming scheme. It is thus simpler to modify it by
+hand for newer models. To keep consistency, the generator script will make sure
+that the series list only refers to devices in the model list and that every
+device in the model list is referred to.
+
+FORMAT (series.txt): list of <codename>,<name>,<mid1>,<mid2>,... where <codename>
+is the (Rockbox-only) codename of the series (that should match what other tools
+use, like upgtools), always in lower case; where <name> is the human-readable name
+of the series, and <midX> is the list of models in the series (given by model IDs
+because names are not unique).
+
+Advice on tooling
+=================
+
+The format of the file was carefully chosen to be easy to use and produce. It
+avoids using spaces as separators because they break easily. The "," separator
+is a good match in this case and shouldn't pose a problem. In most tools, changing
+the separator is easy. For example with awk, you can use the
+  -F ","
+option, or set it in the preamble with
+  BEGIN { FS = ","; }
+Other tools have similar constructs.
+
+NVPs
+====
+
+See nvps/README
+
+gen_db.py
+=========
+
+This script generates the database (nwz_db.{c,h}) from the various textual files.
+The output must NOT be touched by hand.
-- 
cgit v1.2.3